Gladia STT
The Gladia STT provider enables your agent to use Gladia's fast and accurate speech-to-text models for real-time audio transcription with support for multiple languages and code-switching.
Installation
Install the Gladia-enabled VideoSDK Agents package:
pip install "videosdk-plugins-gladia"
Authentication
The Gladia plugin requires a Gladia API key.
Set GLADIA_API_KEY in your .env file.
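It can help to check that the key is actually available before the agent starts. The sketch below is one way to do that with the standard library only. It assumes the .env file has already been loaded into the process environment, and it assumes an explicitly passed key should win over the environment variable. `resolve_gladia_api_key` is a hypothetical helper name, not part of the plugin.

```python
import os
from typing import Mapping, Optional


def resolve_gladia_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to GLADIA_API_KEY."""
    # Use the real process environment unless a mapping is injected (handy for tests).
    source = os.environ if env is None else env
    key = explicit or source.get("GLADIA_API_KEY", "")
    if not key.strip():
        # Fail early with a clear message instead of at connection time.
        raise RuntimeError(
            "No Gladia API key found: set GLADIA_API_KEY or pass api_key explicitly."
        )
    return key.strip()
```

Calling `resolve_gladia_api_key()` at startup raises a clear error when the variable is missing, rather than letting the failure surface later during transcription.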
Importing
from videosdk.plugins.gladia import GladiaSTT
Example Usage
from videosdk.plugins.gladia import GladiaSTT
from videosdk.agents import CascadingPipeline
# Initialize the Gladia STT model
stt = GladiaSTT(
    # Pass api_key only if GLADIA_API_KEY is not set in your .env file
    api_key="your-gladia-api-key",
    languages=["en"],
    code_switching=True,
    receive_partial_transcripts=True,
)
# Add stt to a cascading pipeline
pipeline = CascadingPipeline(stt=stt)
Note
When GLADIA_API_KEY is set in your .env file, you do not need to pass api_key to GladiaSTT; the SDK reads it automatically.
Configuration Options
api_key: (str, optional) Your Gladia API key. Can also be set via the GLADIA_API_KEY environment variable.
model: (str, optional) The model to use. Defaults to "solaria-1".
languages: (List[str], optional) A list of language codes to detect (e.g., ["en", "fr"]). Defaults to ["en"].
code_switching: (bool, optional) Enables automatic language switching between the provided languages. Defaults to True.
input_sample_rate: (int, optional) The sample rate of the incoming audio. Defaults to 48000.
output_sample_rate: (int, optional) The sample rate Gladia should process. Defaults to 16000.
encoding: (str, optional) The audio encoding format. Defaults to "wav/pcm".
bit_depth: (int, optional) The bit depth of the audio. Defaults to 16.
channels: (int, optional) The number of audio channels. Defaults to 1 (mono).
receive_partial_transcripts: (bool, optional) Set to True to receive interim transcription results for lower latency. Defaults to False.
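To make the audio options concrete, the sketch below collects the documented defaults in a plain dictionary, merges in overrides, and computes the raw PCM data rate that a given sample rate, bit depth, and channel count imply. `DOCUMENTED_DEFAULTS`, `build_stt_kwargs`, and `pcm_bytes_per_second` are hypothetical names introduced for illustration, not part of the plugin. The resulting dictionary is assumed to be usable as keyword arguments, for example `GladiaSTT(**kwargs)`.

```python
from typing import Any, Dict

# Defaults copied from the option list above; api_key is left to the environment.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "model": "solaria-1",
    "languages": ["en"],
    "code_switching": True,
    "input_sample_rate": 48000,
    "output_sample_rate": 16000,
    "encoding": "wav/pcm",
    "bit_depth": 16,
    "channels": 1,
    "receive_partial_transcripts": False,
}


def build_stt_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults, rejecting unknown option names."""
    allowed = set(DOCUMENTED_DEFAULTS) | {"api_key"}
    unknown = set(overrides) - allowed
    if unknown:
        raise TypeError(f"Unknown option(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


def pcm_bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    if bit_depth % 8 != 0:
        raise ValueError("bit_depth must be a multiple of 8")
    return sample_rate * (bit_depth // 8) * channels


kwargs = build_stt_kwargs(languages=["en", "fr"], receive_partial_transcripts=True)
rate = pcm_bytes_per_second(
    kwargs["output_sample_rate"], kwargs["bit_depth"], kwargs["channels"]
)
```

With the defaults, 16-bit mono audio at 16000 Hz works out to 32000 bytes per second, while the 48000 Hz input works out to 96000 bytes per second.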