Gauntlet
Frontier models run the gauntlet of prompt-injection attacks — ranked by how well they resist direct jailbreaks and indirect, tool-based injection. Higher robustness is better.
| Model | Robustness | ASR | Out tok | Avg cost |
|---|---|---|---|---|
| Loading leaderboard… | ||||
Robustness = 100 − attack-success-rate (ASR). The chart plots robustness (%) against efficiency (cost per run or output tokens, reversed so more efficient is to the right); each line connects one model across reasoning efforts. The Indirect track runs hand-built agentic scenarios where a malicious instruction hidden in tool output tries to hijack the model; the Agentic track replays ZeroLeaks' Sandbox attack corpus (~70 probes: tool-hijacking, authority spoofing, protocol exploits) against mock agent tools, counting any dangerous tool call as a breach. Cost uses standard list prices. GPT-5.5 and Grok 4.3 are served via Azure.
