Sarvam AI STT
The Sarvam AI STT provider enables your agent to use Sarvam AI's speech-to-text models for transcription. The provider applies Voice Activity Detection (VAD) to the incoming audio and sends each buffered chunk for transcription once a period of silence signals the end of speech.
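For intuition, here is a minimal, hypothetical sketch of this kind of amplitude-based silence segmentation. It is not the plugin's internal implementation; the numpy usage, 10 ms frame size, and function name are illustrative assumptions, and the threshold and duration values mirror the silence_threshold and silence_duration options documented below.

import numpy as np

SILENCE_THRESHOLD = 0.01  # normalized amplitude below which a frame counts as silence
SILENCE_DURATION = 0.8    # seconds of continuous silence that end a speech segment
SAMPLE_RATE = 16000
FRAME_SIZE = 160          # 10 ms frames at 16 kHz

def segment_speech(samples: np.ndarray):
    """Yield speech chunks, each ended by SILENCE_DURATION of quiet audio."""
    buffer, silent_for, has_speech = [], 0.0, False
    frame_seconds = FRAME_SIZE / SAMPLE_RATE
    for start in range(0, len(samples), FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        if np.max(np.abs(frame)) < SILENCE_THRESHOLD:
            silent_for += frame_seconds
        else:
            silent_for, has_speech = 0.0, True
        buffer.append(frame)
        if has_speech and silent_for >= SILENCE_DURATION:
            yield np.concatenate(buffer)  # this chunk would be sent for transcription
            buffer, silent_for, has_speech = [], 0.0, False
    if has_speech:
        yield np.concatenate(buffer)      # flush any trailing speech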
Installation
Install the Sarvam AI-enabled VideoSDK Agents package:
pip install "videosdk-plugins-sarvamai"
Importing
from videosdk.plugins.sarvamai import SarvamAISTT
Example Usage
from videosdk.plugins.sarvamai import SarvamAISTT
from videosdk.agents import CascadingPipeline
# Initialize the Sarvam AI STT model
stt = SarvamAISTT(
    # When SARVAMAI_API_KEY is set in .env, don't pass the api_key parameter
    api_key="your-sarvam-ai-api-key",
    model="saarika:v2",
    language="en-IN",
)
# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)
note
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key and other credential parameters from your code.
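For example, with SARVAMAI_API_KEY set in your .env file, the initialization above reduces to a sketch like this (the values shown are the documented defaults):

from videosdk.plugins.sarvamai import SarvamAISTT

stt = SarvamAISTT(
    model="saarika:v2",  # api_key is read from the SARVAMAI_API_KEY environment variable
    language="en-IN",
)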
Configuration Options
api_key: (str) Your Sarvam AI API key. Can also be set via the SARVAMAI_API_KEY environment variable.
model: (str) The Sarvam AI model to use (default: "saarika:v2").
language: (str) Language code for transcription (default: "en-IN").
input_sample_rate: (int) The sample rate of the audio from the source, in Hz (default: 48000).
output_sample_rate: (int) The sample rate to which the audio is resampled before being sent for transcription, in Hz (default: 16000).
silence_threshold: (float) The normalized amplitude threshold for silence detection (default: 0.01).
silence_duration: (float) The duration of silence, in seconds, that ends a speech segment and triggers transcription (default: 0.8).