Wrap your Anthropic client with Shield protection for automatic prompt hardening, injection detection, and output sanitization.

Anthropic Provider

The shieldAnthropic wrapper adds transparent security to your existing Anthropic client. It intercepts every messages.create call to harden the system prompt, detect injections in user messages, and sanitize leaked content from responses.

Usage

import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

How It Works

On every call to messages.create, Shield:

Clones the params object (never mutates your original)
Hardens the system field if it is a string (unless harden: false)
Scans every user message for injection patterns, supporting both string content and content block arrays (unless detect: false)
Calls the original Anthropic API
Sanitizes the first text block in the response for leaked system prompt fragments (unless sanitize: false)

Options

Option	Type	Default	Description
`systemPrompt`	`string`	�	The system prompt to protect (used for output sanitization)
`harden`	`HardenOptions \| false`	`{}`	Hardening options, or `false` to disable
`detect`	`DetectOptions \| false`	`{}`	Detection options, or `false` to disable
`sanitize`	`SanitizeOptions \| false`	`{}`	Sanitization options, or `false` to disable
`streamingSanitize`	`"buffer" \| "chunked" \| "passthrough"`	`"buffer"`	`"buffer"`: full buffer. `"chunked"`: 8KB chunks. `"passthrough"`: skip sanitization.
`streamingChunkSize`	`number`	`8192`	Chunk size for `"chunked"` mode
`throwOnLeak`	`boolean`	`false`	When `true`, throw `LeakDetectedError` instead of redacting
`onDetection`	`"block" \| "warn"`	`"block"`	`"block"` throws an error, `"warn"` calls the callback only
`onInjectionDetected`	`(result) => void`	�	Callback when injection is detected
`onLeakDetected`	`(result) => void`	�	Callback when output leak is detected

Content Block Support

Anthropic messages can contain content blocks (arrays of {type, text} objects) instead of plain strings. Shield extracts text from these blocks for detection:

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  system: "You are a support agent...",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Help me with this document:" },
        { type: "text", text: documentContent },
      ],
    },
  ],
  max_tokens: 1024,
});

Both text blocks are scanned for injection patterns.

Anthropic Provider

Anthropic Provider

Usage

How It Works

Options

Content Block Support

On this page