Cartesia STT

The Cartesia STT provider enables your agent to use Cartesia's advanced speech-to-text models for high-accuracy, real-time audio transcription.

Installation

Install the Cartesia-enabled VideoSDK Agents package:

pip install "videosdk-plugins-cartesia"

Importing

from videosdk.plugins.cartesia import CartesiaSTT

Authentication

The Cartesia plugin requires a Cartesia API key.

Set CARTESIA_API_KEY in your .env file.
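
A missing key is easier to diagnose at startup than in the middle of a session. The following is a minimal sketch of a fail-fast check, assuming the .env file has already been loaded into the process environment (for example by a dotenv loader or your process manager). `require_cartesia_key` is a hypothetical helper written for illustration and is not part of the VideoSDK SDK.

```python
import os


def require_cartesia_key(env=None):
    """Return the Cartesia API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; add it to your .env file "
            "or export it before starting the agent."
        )
    return key
```

Calling `require_cartesia_key()` once before building the pipeline surfaces a configuration mistake immediately.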

Example Usage

from videosdk.plugins.cartesia import CartesiaSTT
from videosdk.agents import CascadingPipeline

# Initialize the Cartesia STT model
stt = CartesiaSTT(
    # Pass api_key only when CARTESIA_API_KEY is NOT set in .env;
    # if the environment variable is set, remove this argument.
    api_key="your-cartesia-api-key",
    language="en-US",
    model="ink-whisper",
)

# Add the STT model to the cascading pipeline
pipeline = CascadingPipeline(stt=stt)

Note: When using an environment variable for credentials, do not pass api_key as an argument to the model instance. The SDK automatically reads the environment variable.
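
One way to follow this rule in code that must run both with and without the environment variable is to build the constructor arguments conditionally. This is a sketch only: `cartesia_stt_kwargs` is a hypothetical helper, and the default values for `language` and `model` are taken from the Configuration Options listed on this page.

```python
import os


def cartesia_stt_kwargs(api_key=None, language="en", model="ink-whisper"):
    """Build keyword arguments for CartesiaSTT (hypothetical helper).

    api_key is included only when CARTESIA_API_KEY is not set in the
    environment, so the SDK can read the variable itself.
    """
    kwargs = {"language": language, "model": model}
    # Only fall back to the explicit key when the environment has none.
    if not os.environ.get("CARTESIA_API_KEY") and api_key:
        kwargs["api_key"] = api_key
    return kwargs
```

With the plugin installed, the result can be unpacked into the constructor, for example `CartesiaSTT(**cartesia_stt_kwargs(api_key="your-cartesia-api-key"))`.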

Configuration Options

  • api_key: (str) Your Cartesia API key. Can also be set via the CARTESIA_API_KEY environment variable.
  • model: (str) The Cartesia STT model to use (e.g., "ink-whisper"). Defaults to "ink-whisper".
  • language: (str) Language code for transcription (default: "en").

Additional resources

The following resources provide more information about using Cartesia with VideoSDK Agents.
