Anthropic LLM
The Anthropic LLM provider enables your agent to use Anthropic's Claude language models for text-based conversations and processing. It also supports vision input, allowing your agent to analyze and respond to images alongside text with supported models.
Installation
Install the Anthropic-enabled VideoSDK Agents package:
pip install "videosdk-plugins-anthropic"
Importing
from videosdk.plugins.anthropic import AnthropicLLM
Authentication
The Anthropic plugin requires an Anthropic API key.
Set ANTHROPIC_API_KEY in your .env file.
Example Usage
from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.agents import Pipeline
llm = AnthropicLLM(
model="claude-sonnet-4-20250514",
temperature=0.7,
max_tokens=1024,
)
pipeline = Pipeline(llm=llm)
When using a .env file for credentials, don't pass them as arguments to model instances. The SDK automatically reads environment variables, so omit api_key and other credential parameters from your code.
Configuration Options
Core
- model — The Claude model to use. Default: "claude-sonnet-4-20250514".
- api_key — Your Anthropic API key. Falls back to the ANTHROPIC_API_KEY environment variable.
- base_url — Optional custom base URL for the Claude API. Default: None.
- temperature — Sampling temperature. Default: 0.7.
- tool_choice — Tool selection mode: "auto", "required", "none", or a dict {"type": "function", "function": {"name": "my_tool"}} to force a specific tool. Default: "auto".
- max_tokens — Maximum tokens in the response. Default: 1024.
- top_p — Nucleus sampling probability mass (float, optional).
- top_k — Top-k sampling parameter (int, optional).
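The dict form of tool_choice is the least obvious of these options. A minimal sketch of the expected shape, using a hypothetical helper and tool name for illustration (my_tool is a placeholder, and describe_tool_choice is not part of the SDK):

```python
# Forcing a specific tool via the dict form of tool_choice.
# The shape follows the format shown in the options list above.
forced = {"type": "function", "function": {"name": "my_tool"}}


def describe_tool_choice(tool_choice):
    """Summarize a tool_choice value (illustrative helper, not SDK API)."""
    if isinstance(tool_choice, str):
        # One of the string modes: "auto", "required", "none".
        return f"mode:{tool_choice}"
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        # Dict form: force the named tool to be called.
        return f"force:{tool_choice['function']['name']}"
    raise ValueError("unsupported tool_choice value")


print(describe_tool_choice("auto"))  # mode:auto
print(describe_tool_choice(forced))  # force:my_tool
```

The string modes cover most cases; reach for the dict form only when a specific turn must invoke one known tool.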
Tool calling
- parallel_tool_calls — Allow (True) or disallow (False) multiple tool calls in a single response turn. When False, Claude is forced to call tools one at a time. Optional.
Prompt caching
- caching — Set to "ephemeral" to enable Anthropic prompt caching. When enabled, the SDK automatically applies cache markers to the system prompt, tool schemas, and the most recent conversation turns. Cache hits reduce input token costs. Default: None (disabled).
Extended thinking
- thinking — Dict that enables extended thinking (Claude's internal reasoning pass before answering). Example: {"type": "enabled", "budget_tokens": 4096}. When set, max_tokens must be greater than budget_tokens. Default: None (disabled).
Prompt Caching
Prompt caching reduces costs when the same system prompt, tool schemas, or recent conversation turns are reused across requests. Set caching="ephemeral" to let the SDK handle marker placement automatically.
from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.agents import Pipeline
llm = AnthropicLLM(
model="claude-sonnet-4-20250514",
temperature=0.7,
max_tokens=1024,
caching="ephemeral", # cache system prompt + tools + last turns
parallel_tool_calls=True, # let Claude call multiple tools at once
)
pipeline = Pipeline(llm=llm)
When caching is active, LLMResponse.metadata["usage"] will include two additional keys:
- cache_creation_tokens — tokens written to the cache on this request (charged at a higher rate, amortized over future hits)
- cache_read_tokens — tokens read from the cache on this request (charged at a lower rate)
Prompt caching requires a minimum cacheable block size (1024 tokens for Sonnet/Opus, 2048 for Haiku). Very short system prompts or tool lists may not qualify for caching.
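To see why cache hits pay off, a back-of-the-envelope cost sketch. The multipliers below (cache writes at roughly 1.25x the base input rate, cache reads at roughly 0.1x) are illustrative assumptions; check Anthropic's current pricing before relying on them:

```python
def cached_request_cost(base_rate_per_mtok, cache_creation_tokens,
                        cache_read_tokens, regular_input_tokens):
    """Estimate input cost for one request, splitting tokens by cache status.

    Assumed multipliers (illustrative, not authoritative):
    cache writes ~1.25x the base input rate, cache reads ~0.1x.
    """
    write_cost = cache_creation_tokens * base_rate_per_mtok * 1.25 / 1_000_000
    read_cost = cache_read_tokens * base_rate_per_mtok * 0.10 / 1_000_000
    regular_cost = regular_input_tokens * base_rate_per_mtok / 1_000_000
    return write_cost + read_cost + regular_cost


# First request writes the cached prefix; later requests mostly read it.
first = cached_request_cost(3.0, cache_creation_tokens=2000,
                            cache_read_tokens=0, regular_input_tokens=500)
later = cached_request_cost(3.0, cache_creation_tokens=0,
                            cache_read_tokens=2000, regular_input_tokens=500)
print(f"first: ${first:.4f}, later: ${later:.4f}")  # later is much cheaper
```

The write premium on the first request is quickly amortized once the same prefix is read on subsequent turns.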
Extended Thinking
Extended thinking gives Claude additional reasoning time before producing its final answer. This can improve accuracy on complex multi-step tasks.
from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.agents import Pipeline
llm = AnthropicLLM(
model="claude-sonnet-4-20250514",
thinking={"type": "enabled", "budget_tokens": 4096},
max_tokens=8192, # must be greater than budget_tokens
)
pipeline = Pipeline(llm=llm)
max_tokens must be strictly greater than budget_tokens inside the thinking dict. If budget_tokens is 4096, set max_tokens to at least 4097 (typically much higher to leave room for the final answer).
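The constraint above is easy to enforce before constructing the model. A minimal validation sketch (validate_thinking_config is a hypothetical helper, not part of the SDK):

```python
def validate_thinking_config(thinking, max_tokens):
    """Check the documented constraint: max_tokens > budget_tokens."""
    if thinking is None or thinking.get("type") != "enabled":
        return  # extended thinking disabled; nothing to check
    budget = thinking["budget_tokens"]
    if max_tokens <= budget:
        raise ValueError(
            f"max_tokens ({max_tokens}) must be greater than "
            f"budget_tokens ({budget})"
        )


# Passes: 8192 > 4096 leaves room for the final answer.
validate_thinking_config({"type": "enabled", "budget_tokens": 4096}, 8192)
```

Running such a check at startup turns a rejected API request into an immediate, readable error.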
Additional Resources
- Anthropic documentation