Yechao Zhang - Academic Profile

About Me

I am currently a Research Fellow at Nanyang Technological University (NTU), working with Prof. Tianwei Zhang. I received my Ph.D. from Huazhong University of Science and Technology (HUST).

Research Interests

I study the security and safety of real-world AI agent systems. My interests span the full stack of agent development and deployment, from models' internal safety behavior to the external controls that govern their actions. I organize this agenda around three directions:

Model Alignment

Building a model's internal alignment, so that acting on human intent comes from its own reasoning and training (e.g., agentic RL, OPD).

Agent Oversight

Enforcing external control over an agent through its harness, overseeing its reasoning and actions (e.g., CoT monitoring, sandboxing).

Red Teaming

Finding where agents fail before attackers do, by adversarially stress-testing real-world systems to surface their failure modes ahead of deployment.

News

Jul 2026 MemGhost is now online, studying stealth memory injection in persistent personal agents (OpenClaw Hermes).

May 2026 VideoSEAL is now online and has been accepted to ICML 2026. It studies evidence misalignment as a reward-hacking failure in agentic RL for long-video understanding.

Apr 2026 Two papers were accepted to ICML 2026. Congrats to Chenhao and Jianrong!

Apr 2026 ReasoningBomb was accepted to ACM CCS 2026 in the first cycle. Congrats to Xiaogeng!

Mar 2026 Our paper on how OpenClaw heartbeat mechanisms inherently enable silent memory pollution is now online!

Feb 2026 ReasoningBomb is out! Check out our work on reasoning vulnerabilities.

Nov 2025 Two papers were accepted to AAAI 2026: one in the main track and one in the AI for Social Impact (AISI) track.

Sep 2025 Started as a Research Fellow at Nanyang Technological University (NTU), Singapore.

Jun 2025 Graduated from HUST with the Outstanding Doctoral Graduate award.

Selected Publications

^† Corresponding author

When Claws Remember but Do Not Tell: Stealthy Memory Injection in Persistent Personal Agents

Yechao Zhang, Shiqian Zhao, Jiawen Zhang, Jie Zhang, Gelei Deng, Xiaogeng Liu, Chaowei Xiao, Tianwei Zhang

arXiv 2026

This paper studies stealth memory injection, where a remote black-box adversary can use a single email payload to make persistent personal agents save poisoned memory while keeping the user-facing response quiet. It introduces WhisperBench, a 108-case full-cycle benchmark, and MemGhost, a one-shot payload generation framework for testing how poisoned memory later steers agent behavior.

PDF

VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

Chenhao Qiu, Yechao Zhang^†, Xin Luo, Shien Song, Xusheng Liu

International Conference on Machine Learning (ICML 2026)

VideoSEAL asks why long-video agents get more accurate without getting more grounded, and identifies this as a reward-hacking failure in multi-turn agentic RL. Under outcome-only GRPO, training-set accuracy improves while evidence-seeking behavior does not, because credit can flow to answer shortcuts rather than exploration actions. We trace this to two pressures: (1) reward pressure during training, where outcome-only rewards encourage the agent to speculate from insufficient evidence rather than reinforcing evidence-seeking actions; and (2) prompt pressure at inference, where longer search traces saturate context and push planners toward speculative commitment instead of verification. The fix decouples an exploring planner from a frozen inspector that holds answer authority and can abstain, enabling search-budget scaling and reducing semantic hallucination from 62.1% to 11.3% on LVBench.

PDF Code

Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng, Jiawen Zhang, Xiaogeng Liu, Chaowei Xiao, Tianwei Zhang

arXiv 2026

This work shows that persistent personal agents like OpenClaw can suffer unintended memory pollution even without prompt injection: both user-attended foreground tasks and unattended background tasks may absorb ordinary external content into persistent memory. Because user-facing conversations and noisy tool-call results share the same session context, such content can lose provenance, be saved into long-term memory even without clear user awareness, and later steer user-facing behavior.

PDF

Secure Transfer Learning: Training Clean Model Against Backdoor in Pre-trained Encoder and Downstream Dataset

Yechao Zhang, Yuxuan Zhou, Tianyu Li, Minghui Li, Shengshan Hu, Wei Luo, Leo Yu Zhang

IEEE Symposium on Security and Privacy (Oakland'25)

This work studies how to train clean models when both pre-trained models and fine-tuning datasets may contain unknown backdoor poisoning.

PDF

Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling

Minghui Li, Hao Zhang, Yechao Zhang^†, Wei Wan, Shengshan Hu, Pei Xiaobing, Jing Wang

Empirical Methods in Natural Language Processing (EMNLP'25 Main)

This work proposes an activation-guided framework to generate transferable prompt injection attacks against LLMs using gradient-free optimization.

PDF

Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability

Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

IEEE Symposium on Security and Privacy (Oakland'24)

This work investigates why mildly robust models generate more transferable adversarial examples than both naturally trained and highly robust models.

PDF Code

Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics

Xiaoxing Mo*, Yechao Zhang*, Leo Yu Zhang, Wei Luo, Nan Sun, Shengshan Hu, Shang Gao, Yang Xiang

IEEE Symposium on Security and Privacy (Oakland'24)

This work proposes a backdoor detection method based on topological evolution dynamics that is effective against both traditional and advanced backdoor attacks.

PDF

Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization

Yechao Zhang, Yingzhe Xu, Junyu Shi, Leo Yu Zhang, Shengshan Hu, Minghui Li, Yanjun Zhang

AAAI 2025

This work proposes a dynamic maximin optimization framework to improve the generalization of universal adversarial perturbations across models and samples.

PDF Code

...and more. See my Google Scholar for the full list.

Experience

Sep 2024 - Dec 2024

Ant Group, Security Department

Research Intern

Apr 2024 - Aug 2024

Tencent AI Lab

Algorithm Intern