Cartesia STT
The Cartesia STT provider enables your agent to use Cartesia's advanced speech-to-text models for high-accuracy, real-time audio transcription.
Installation
Install the Cartesia-enabled VideoSDK Agents package:
pip install "videosdk-plugins-cartesia"
Importing
from videosdk.plugins.cartesia import CartesiaSTT
Authentication
The Cartesia plugin requires a Cartesia API key.
Set CARTESIA_API_KEY in your .env file.
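A missing key is easier to diagnose if it is checked at startup. The following is a minimal sketch, assuming the .env file has already been loaded into the process environment (for example with python-dotenv's load_dotenv). The helper name require_env is hypothetical and is not part of the VideoSDK SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail fast with an actionable message instead of a later auth error.
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "before starting the agent."
        )
    return value
```

Calling require_env("CARTESIA_API_KEY") before building the pipeline surfaces a missing or empty key immediately.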
Example Usage
from videosdk.plugins.cartesia import CartesiaSTT
from videosdk.agents import CascadingPipeline
# Initialize the Cartesia STT model
stt = CartesiaSTT(
    # Omit api_key when CARTESIA_API_KEY is set in .env; the SDK reads it automatically
    api_key="your-cartesia-api-key",
    language="en-US",
    model="ink-whisper",
)
# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)
When using an environment variable for credentials, don't pass the api_key as an argument to the model instance. The SDK automatically reads the environment variable.
Configuration Options
- api_key: (str) Your Cartesia API key. Can also be set via the CARTESIA_API_KEY environment variable.
- model: (str) The Cartesia STT model to use (e.g., "ink-whisper"). Defaults to "ink-whisper".
- language: (str) Language code for transcription (default: "en").
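The options above can be gathered in one place before the model is constructed. The sketch below is an illustration under stated assumptions: CartesiaSTTOptions and to_kwargs are hypothetical names, not part of the plugin, and the defaults simply mirror the ones listed above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CartesiaSTTOptions:
    """Hypothetical container for the documented options (not part of the plugin)."""

    api_key: Optional[str] = None  # None means: rely on CARTESIA_API_KEY
    model: str = "ink-whisper"     # documented default
    language: str = "en"           # documented default

    def to_kwargs(self) -> Dict[str, Any]:
        """Build constructor keyword arguments, omitting api_key when unset."""
        kwargs: Dict[str, Any] = {"model": self.model, "language": self.language}
        if self.api_key is not None:
            kwargs["api_key"] = self.api_key
        return kwargs
```

With the plugin installed, the result can be passed as CartesiaSTT(**CartesiaSTTOptions().to_kwargs()). Because api_key is left out when unset, the SDK falls back to the environment variable.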
Additional resources
The following resources provide more information about using Cartesia with VideoSDK Agents.
- Python package: The videosdk-plugins-cartesia package on PyPI.
- GitHub repo: View the source or contribute to the VideoSDK Cartesia STT plugin.
- Cartesia docs: Cartesia STT docs.