ZeroLeaks Documentation
AI red-teaming platform for testing system prompt extraction and injection vulnerabilities.
ZeroLeaks Documentation
ZeroLeaks is an AI red-teaming platform that tests how well your AI systems protect their configuration. It uses a multi-agent architecture based on TAP (Tree of Attacks with Pruning) methodology to systematically probe for system prompt extraction and injection vulnerabilities.
What you can do
Run your first scan
Test a system prompt for extraction and injection vulnerabilities in under 5 minutes.
Scan types
Full, extraction, and injection scans -- all running in sandbox mode with tool execution testing.
AgentGuard
Test deployed AI agents for tool hijacking, multi-turn grooming, and data leakage.
zeroleaks package
Open-source scanner with multi-agent TAP architecture, 100+ probes, and CLI.
Shield SDK
Add runtime prompt security to any LLM application with a single npm package.
Architecture overview
ZeroLeaks uses multiple specialized AI agents that coordinate attacks against your system:
- Strategist selects the attack strategy based on target analysis
- Attacker generates attack prompts across 19 categories
- Evaluator analyzes target responses for information leakage
- Mutator refines attacks based on evaluation feedback
Each scan runs 30 adaptive turns with automatic conversation resets, category rotation, and Best-of-N prompt mutations. The result is a security score (0-100), vulnerability classification, and actionable hardening recommendations.