About Me
I am currently a Research Fellow at Nanyang Technological University (NTU), working with Prof. Tianwei Zhang. I received my Ph.D. from Huazhong University of Science and Technology (HUST).
Research Interests
I explore the security boundaries and improve the safety control of AI systems. Currently, I focus on hardening the full stack of agent system development and deployment, from the model's internal intelligence to its interaction with the external environment.
Safe Intelligence
Enhancing models' internal ability to align with safety constraints and enable self-correction (e.g., Agentic RL).
Secure Architecture
Developing systematic frameworks for external control and security models for agent tool access and data interaction.
Red Teaming
Validating system resilience through adversarial testing, prompt injection attacks, and penetration testing.
News
Selected Publications
† Corresponding author
VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority
International Conference on Machine Learning (ICML 2026)
VideoSEAL asks why long-video agents get more accurate without getting more grounded: a reward-hacking failure in multi-turn agentic RL. Under outcome-only GRPO, training-set accuracy improves while evidence-seeking behavior does not, because credit can flow to answer shortcuts rather than exploration actions. We trace this to two pressures: (1) reward pressure during training, where outcome-only rewards encourage the agent to speculate from insufficient evidence rather than reinforce evidence-seeking actions; and (2) prompt pressure at inference, where longer search traces saturate context and push planners toward speculative commitment instead of verification. The fix decouples an exploring planner from a frozen inspector with answer authority and abstention, enabling search-budget scaling and reducing semantic hallucination from 62.1% to 11.3% on LVBench.
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution
arXiv 2026
This work shows that persistent personal agents like OpenClaw can suffer unintended memory pollution even without prompt injection: both user-attended foreground tasks and unattended background tasks may absorb ordinary external content into persistent memory. Because user-facing conversations and noisy tool-call results share the same session context, such content can lose provenance, be saved into long-term memory even without clear user awareness, and later steer user-facing behavior.
Secure Transfer Learning: Training Clean Model Against Backdoor in Pre-trained Encoder and Downstream Dataset
IEEE Symposium on Security and Privacy (Oakland'25)
This work studies how to train clean models when both pre-trained models and fine-tuning datasets may contain unknown backdoor poisoning.
Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling
Empirical Methods in Natural Language Processing (EMNLP'25 Main)
This work proposes an activation-guided framework to generate transferable prompt injection attacks against LLMs using gradient-free optimization.
Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability
IEEE Symposium on Security and Privacy (Oakland'24)
This work investigates why mildly robust models generate more transferable adversarial examples than both naturally trained and highly robust models.
Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics
IEEE Symposium on Security and Privacy (Oakland'24)
This work proposes a backdoor detection method based on topological evolution dynamics that is effective against both traditional and advanced backdoor attacks.
Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization
AAAI 2025
This work proposes a dynamic maximin optimization framework to improve the generalization of universal adversarial perturbations across models and samples.
...and more. See my Google Scholar for the full list.
Experience
Huazhong University of Science and Technology
Ph.D. Student, School of Cyber Science and Engineering
Wuhan, China
GPA: 89.99/100
Ant Group, Security Department
Research Intern
Investigated adversarial vulnerabilities in safety-aligned Multimodal LLMs and developed jailbreaking techniques.
Tencent AI Lab
Algorithm Intern
Built a knowledge-enhanced agent and researched RAG poisoning.
Service & Honors
Academic Service
- Reviewer (2025): NeurIPS, ICLR, CVPR, AAAI, ICML, ICCV, ACM MM
- Reviewer (2024): NeurIPS, CVPR, ECCV, ICPR, ACM MM
- Journal Reviewer: IEEE TDSC, IEEE TNNLS, IEEE TIFS
Honors
- Outstanding Doctoral Graduate, HUST (2025)
- China National Scholarship (2021)
- Merit Master Student, HUST (2021)
- Merit Master Student, HUST (2021)