Version: 1.0.x

Anthropic LLM

The Anthropic LLM provider enables your agent to use Anthropic's Claude language models for text-based conversations and processing. It also supports vision input, allowing your agent to analyze and respond to images alongside text with supported models.

Installation

Install the Anthropic-enabled VideoSDK Agents package:

pip install "videosdk-plugins-anthropic"

Importing

from videosdk.plugins.anthropic import AnthropicLLM

Authentication

The Anthropic plugin requires an Anthropic API key.

Set ANTHROPIC_API_KEY in your .env file.
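
If you need to verify the key is available programmatically, a minimal sketch using only the standard library (the key value below is a placeholder for illustration; in practice it comes from your .env file or deployment secrets, never from source code):

```python
import os

# Placeholder for illustration only; normally loaded from .env by the SDK.
os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-placeholder")

# Confirm the key is visible to the process before starting the agent.
api_key = os.environ["ANTHROPIC_API_KEY"]
```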

Example Usage

from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.agents import Pipeline

llm = AnthropicLLM(
    model="claude-sonnet-4-20250514",
    temperature=0.7,
    max_tokens=1024,
)

pipeline = Pipeline(llm=llm)
note

When using a .env file for credentials, don't pass them as arguments to model instances. The SDK automatically reads environment variables, so omit api_key and other credential parameters from your code.

Configuration Options

Core

  • model — The Claude model to use. Default: "claude-sonnet-4-20250514".
  • api_key — Your Anthropic API key. Falls back to the ANTHROPIC_API_KEY environment variable.
  • base_url — Optional custom base URL for the Claude API. Default: None.
  • temperature — Sampling temperature. Default: 0.7.
  • tool_choice — Tool selection mode: "auto", "required", "none", or a dict {"type": "function", "function": {"name": "my_tool"}} to force a specific tool. Default: "auto".
  • max_tokens — Maximum tokens in the response. Default: 1024.
  • top_p — Nucleus sampling probability mass (float, optional).
  • top_k — Top-k sampling parameter (int, optional).
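
To illustrate how these options combine, here is a hypothetical set of keyword arguments as they would be passed to AnthropicLLM(**llm_kwargs). The tool name lookup_order is invented for the example:

```python
# Hypothetical configuration; "lookup_order" is an invented tool name.
llm_kwargs = dict(
    model="claude-sonnet-4-20250514",
    temperature=0.7,
    max_tokens=1024,
    top_p=0.9,   # nucleus sampling (optional)
    top_k=40,    # top-k sampling (optional)
    # Force Claude to call one specific tool:
    tool_choice={"type": "function", "function": {"name": "lookup_order"}},
)
```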

Tool calling

  • parallel_tool_calls — Whether Claude may issue multiple tool calls in a single response turn. When False, Claude is forced to call tools one at a time (optional).

Prompt caching

  • caching — Set to "ephemeral" to enable Anthropic prompt caching. When enabled, the SDK automatically applies cache markers to the system prompt, tool schemas, and the most recent conversation turns. Cache hits reduce input token costs. Default: None (disabled).

Extended thinking

  • thinking — Dict that enables extended thinking (Claude's internal reasoning pass before answering). Example: {"type": "enabled", "budget_tokens": 4096}. When set, max_tokens must be greater than budget_tokens. Default: None (disabled).

Prompt Caching

Prompt caching reduces costs when the same system prompt, tool schemas, or recent conversation turns are reused across requests. Set caching="ephemeral" to let the SDK handle marker placement automatically.

from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.agents import Pipeline

llm = AnthropicLLM(
    model="claude-sonnet-4-20250514",
    temperature=0.7,
    max_tokens=1024,
    caching="ephemeral",       # cache system prompt + tools + last turns
    parallel_tool_calls=True,  # let Claude call multiple tools at once
)

pipeline = Pipeline(llm=llm)

When caching is active, LLMResponse.metadata["usage"] will include two additional keys:

  • cache_creation_tokens — tokens written to the cache on this request (charged at a higher rate, amortised over future hits)
  • cache_read_tokens — tokens read from the cache on this request (charged at a lower rate)
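
For example, assuming usage is the plain dict stored in LLMResponse.metadata["usage"], you could log cache effectiveness like this (the helper name and the input_tokens key are illustrative, not part of the SDK):

```python
def cache_summary(usage: dict) -> str:
    """Summarize prompt-cache activity from a usage dict.

    Illustrative helper; assumes the two cache keys documented above
    are present (defaulting to 0 when caching was inactive).
    """
    created = usage.get("cache_creation_tokens", 0)
    read = usage.get("cache_read_tokens", 0)
    return f"cache: wrote {created} tokens, read {read} tokens"

# Example shape of a usage dict on a cache hit (values are illustrative):
usage = {"input_tokens": 42, "cache_creation_tokens": 0, "cache_read_tokens": 1536}
```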
note

Prompt caching requires a minimum cacheable block size (1024 tokens for Sonnet/Opus, 2048 for Haiku). Very short system prompts or tool lists may not qualify for caching.
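
If you want a quick sanity check before enabling caching, the thresholds above can be sketched as a lookup (the mapping and helper are illustrative, not part of the SDK):

```python
# Minimum cacheable block sizes quoted above, keyed by model family.
# Illustrative helper, not part of the SDK.
MIN_CACHEABLE_TOKENS = {"sonnet": 1024, "opus": 1024, "haiku": 2048}

def min_cacheable_tokens(model: str) -> int:
    """Return the minimum cacheable block size for a model name."""
    for family, threshold in MIN_CACHEABLE_TOKENS.items():
        if family in model:
            return threshold
    return 2048  # conservative fallback for unknown models
```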

Extended Thinking

Extended thinking gives Claude additional reasoning time before producing its final answer. This can improve accuracy on complex multi-step tasks.

from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.agents import Pipeline

llm = AnthropicLLM(
    model="claude-sonnet-4-20250514",
    thinking={"type": "enabled", "budget_tokens": 4096},
    max_tokens=8192,  # must be greater than budget_tokens
)

pipeline = Pipeline(llm=llm)
warning

max_tokens must be strictly greater than budget_tokens inside the thinking dict. If budget_tokens is 4096, set max_tokens to at least 4097 (typically much higher to leave room for the final answer).

Additional Resources

Got a question? Ask us on Discord.