ZeroLeaks
Shield SDKProvider Wrappers

Anthropic Provider

Wrap your Anthropic client with Shield protection for automatic prompt hardening, injection detection, and output sanitization.

Anthropic Provider

The shieldAnthropic wrapper adds transparent security to your existing Anthropic client. It intercepts every messages.create call to harden the system prompt, detect injections in user messages, and sanitize leaked content from responses.

Usage

import Anthropic from "@anthropic-ai/sdk";
import { shieldAnthropic } from "@zeroleaks/shield/anthropic";

const client = shieldAnthropic(new Anthropic(), {
  systemPrompt: "You are a support agent...",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  system: "You are a support agent...",
  messages: [{ role: "user", content: userInput }],
  max_tokens: 1024,
});

How It Works

On every call to messages.create, Shield:

  1. Clones the params object (never mutates your original)
  2. Hardens the system field if it is a string (unless harden: false)
  3. Scans every user message for injection patterns, supporting both string content and content block arrays (unless detect: false)
  4. Calls the original Anthropic API
  5. Sanitizes the first text block in the response for leaked system prompt fragments (unless sanitize: false)

Options

OptionTypeDefaultDescription
systemPromptstringThe system prompt to protect (used for output sanitization)
hardenHardenOptions | false{}Hardening options, or false to disable
detectDetectOptions | false{}Detection options, or false to disable
sanitizeSanitizeOptions | false{}Sanitization options, or false to disable
streamingSanitize"buffer" | "chunked" | "passthrough""buffer""buffer": full buffer. "chunked": 8KB chunks. "passthrough": skip sanitization.
streamingChunkSizenumber8192Chunk size for "chunked" mode
throwOnLeakbooleanfalseWhen true, throw LeakDetectedError instead of redacting
onDetection"block" | "warn""block""block" throws an error, "warn" calls the callback only
onInjectionDetected(result) => voidCallback when injection is detected
onLeakDetected(result) => voidCallback when output leak is detected

Content Block Support

Anthropic messages can contain content blocks (arrays of {type, text} objects) instead of plain strings. Shield extracts text from these blocks for detection:

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  system: "You are a support agent...",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Help me with this document:" },
        { type: "text", text: documentContent },
      ],
    },
  ],
  max_tokens: 1024,
});

Both text blocks are scanned for injection patterns.

On this page