Cartesia STT

The Cartesia STT provider enables your agent to use Cartesia's advanced speech-to-text models for high-accuracy, real-time audio transcription.

Installation

Install the Cartesia plugin for VideoSDK Agents:

pip install "videosdk-plugins-cartesia"

Importing

from videosdk.plugins.cartesia import CartesiaSTT

Authentication

The Cartesia plugin requires a Cartesia API key.

Set CARTESIA_API_KEY in your .env file.
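
A missing key is easier to diagnose at startup than at the first transcription request. The following is a minimal sketch of a fail-fast check, not part of the plugin: `require_cartesia_key` is a hypothetical helper name, and it assumes your .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os


def require_cartesia_key(env=None):
    """Return the Cartesia API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the plugin does not provide it.
    """
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; add it to your .env file or export it."
        )
    return key
```

Calling `require_cartesia_key()` once before building the pipeline surfaces a configuration problem immediately. The key does not need to be passed on to CartesiaSTT, since the SDK reads the environment variable itself.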

Example Usage

from videosdk.plugins.cartesia import CartesiaSTT
from videosdk.agents import CascadingPipeline

# Initialize the Cartesia STT model
stt = CartesiaSTT(
    # Pass api_key only if CARTESIA_API_KEY is NOT set in your .env file;
    # otherwise remove this line and the SDK reads the environment variable.
    api_key="your-cartesia-api-key",
    language="en",
    model="ink-whisper",
)

# Add the STT model to the cascading pipeline
pipeline = CascadingPipeline(stt=stt)

Note

When using an environment variable for credentials, don't pass the api_key as an argument to the model instance. The SDK automatically reads the environment variable.

Configuration Options

api_key: (str) Your Cartesia API key. Can also be set via the CARTESIA_API_KEY environment variable.
model: (str) The Cartesia STT model to use (e.g., "ink-whisper"). Defaults to "ink-whisper".
language: (str) Language code for transcription (e.g., "en"). Defaults to "en".
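
To change the model or language per deployment without editing code, the options can be collected from the environment first. This is a sketch under stated assumptions: `stt_kwargs_from_env` and the variable names `CARTESIA_STT_MODEL` and `CARTESIA_STT_LANGUAGE` are hypothetical and are not read by the plugin itself. The fallback values mirror the defaults listed above.

```python
import os

# Defaults as documented in the configuration options.
DEFAULT_MODEL = "ink-whisper"
DEFAULT_LANGUAGE = "en"


def stt_kwargs_from_env(env=None):
    """Build keyword arguments for CartesiaSTT from (hypothetical) env variables.

    api_key is deliberately left out so the SDK can read CARTESIA_API_KEY itself.
    """
    env = os.environ if env is None else env
    return {
        "model": env.get("CARTESIA_STT_MODEL", DEFAULT_MODEL),
        "language": env.get("CARTESIA_STT_LANGUAGE", DEFAULT_LANGUAGE),
    }
```

With the plugin installed, the result can be unpacked into the constructor: `stt = CartesiaSTT(**stt_kwargs_from_env())`.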

Additional resources

The following resources provide more information about using Cartesia with VideoSDK Agents.

Python package: The videosdk-plugins-cartesia package on PyPI.
GitHub repo: View the source or contribute to the VideoSDK Cartesia STT plugin.
Cartesia docs: Cartesia STT docs.
