About Me

I am currently a Research Fellow at Nanyang Technological University (NTU), working with Prof. Tianwei Zhang. I received my Ph.D. from Huazhong University of Science and Technology (HUST).

Research Interests

I explore the security boundaries of AI systems and work to strengthen their safety controls. Currently, I focus on hardening the full stack of agent system development and deployment, from the model's internal intelligence to its interaction with the external environment.

Safe Intelligence

Enhancing models' internal ability to adhere to safety constraints and to self-correct (e.g., via agentic RL).

Secure Architecture

Developing systematic frameworks for external control, along with security models for agent tool access and data interaction.

Red Teaming

Validating system resilience through adversarial testing, prompt injection attacks, and penetration testing.

News

May 2026 VideoSEAL is now online and accepted to ICML 2026, studying evidence misalignment as a reward-hacking failure in agentic RL for long-video understanding.
Apr 2026 Two papers were accepted to ICML 2026. Congrats to Chenhao and Jianrong!
Apr 2026 ReasoningBomb was accepted to ACM CCS 2026 in the first cycle. Congrats to Xiaogeng!
Mar 2026 📄 Our paper on how OpenClaw heartbeat mechanisms inherently enable silent memory pollution is now online!
Feb 2026 ReasoningBomb is out! Check out our work on reasoning vulnerabilities.
Nov 2025 Two papers accepted to AAAI 2026: one in the main track and one in the AI for Social Impact (AISI) track.
Sep 2025 Started as Research Fellow at Nanyang Technological University (NTU), Singapore.
Jun 2025 Graduated from HUST with the Outstanding Doctoral Graduate award.

Selected Publications


VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

Chenhao Qiu, Yechao Zhang, Xin Luo, Shien Song, Xusheng Liu

International Conference on Machine Learning (ICML 2026)

VideoSEAL asks why long-video agents become more accurate without becoming more grounded, and traces the gap to a reward-hacking failure in multi-turn agentic RL. Under outcome-only GRPO, training-set accuracy improves while evidence-seeking behavior does not, because credit can flow to answer shortcuts rather than to exploration actions. We identify two pressures behind this: (1) reward pressure during training, where outcome-only rewards encourage the agent to speculate from insufficient evidence rather than reinforcing evidence-seeking actions; and (2) prompt pressure at inference, where longer search traces saturate the context and push planners toward speculative commitment instead of verification. Our fix decouples an exploring planner from a frozen inspector that holds answer authority and can abstain, which enables search-budget scaling and reduces semantic hallucination on LVBench from 62.1% to 11.3%.

Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng, Jiawen Zhang, Xiaogeng Liu, Chaowei Xiao, Tianwei Zhang

arXiv 2026

This work shows that persistent personal agents such as OpenClaw can suffer unintended memory pollution even without prompt injection: both user-attended foreground tasks and unattended background tasks may absorb ordinary external content into persistent memory. Because user-facing conversations and noisy tool-call results share the same session context, such content can lose its provenance, be saved into long-term memory without the user's clear awareness, and later steer user-facing behavior.

Secure Transfer Learning: Training Clean Model Against Backdoor in Pre-trained Encoder and Downstream Dataset

Yechao Zhang, Yuxuan Zhou, Tianyu Li, Minghui Li, Shengshan Hu, Wei Luo, Leo Yu Zhang

IEEE Symposium on Security and Privacy (Oakland'25)

This work studies how to train clean models when both pre-trained models and fine-tuning datasets may contain unknown backdoor poisoning.

Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling

Minghui Li, Hao Zhang, Yechao Zhang, Wei Wan, Shengshan Hu, Pei Xiaobing, Jing Wang

Empirical Methods in Natural Language Processing (EMNLP'25 Main)

This work proposes an activation-guided framework to generate transferable prompt injection attacks against LLMs using gradient-free optimization.

Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability

Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

IEEE Symposium on Security and Privacy (Oakland'24)

This work investigates why mildly robust models generate more transferable adversarial examples than both naturally trained and highly robust models.

Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics

Xiaoxing Mo*, Yechao Zhang*, Leo Yu Zhang, Wei Luo, Nan Sun, Shengshan Hu, Shang Gao, Yang Xiang

IEEE Symposium on Security and Privacy (Oakland'24)

This work proposes a backdoor detection method based on topological evolution dynamics that is effective against both traditional and advanced backdoor attacks.

Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization

Yechao Zhang, Yingzhe Xu, Junyu Shi, Leo Yu Zhang, Shengshan Hu, Minghui Li, Yanjun Zhang

AAAI Conference on Artificial Intelligence (AAAI'25)

This work proposes a dynamic maximin optimization framework to improve the generalization of universal adversarial perturbations across models and samples.

...and more. See my Google Scholar for the full list.

Experience

2020 - 2025

Huazhong University of Science and Technology

Ph.D. Student, School of Cyber Science and Engineering

Wuhan, China

GPA: 89.99/100

Sep 2024 - Dec 2024

Ant Group, Security Department

Research Intern

Investigated adversarial vulnerabilities in safety-aligned multimodal LLMs and developed jailbreaking techniques.

Apr 2024 - Aug 2024

Tencent AI Lab

Algorithm Intern

Built a knowledge-enhanced agent and researched RAG poisoning.

Service & Honors

Academic Service

  • Reviewer (2025): NeurIPS, ICLR, CVPR, AAAI, ICML, ICCV, ACM MM
  • Reviewer (2024): NeurIPS, CVPR, ECCV, ICPR, ACM MM
  • Journal Reviewer: IEEE TDSC, IEEE TNNLS, IEEE TIFS

Honors

  • Outstanding Doctoral Graduate, HUST (2025)
  • China National Scholarship (2021)
  • Merit Master Student, HUST (2021)