Google STT
The Google STT provider enables your agent to use Google's advanced speech-to-text models for high-accuracy, real-time audio transcription.
Installation
Install the Google-enabled VideoSDK Agents package:
pip install "videosdk-plugins-google"
Importing
from videosdk.plugins.google import GoogleSTT
Setup Credentials
To use Google STT, you need to set up your Google Cloud credentials. You can do this by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
Alternatively, you can pass the path to the key file directly to the GoogleSTT constructor via the api_key parameter.
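Before handing a key file path to either mechanism, it can help to confirm that the file exists and looks like a service account key. The following is a minimal sketch, not part of the plugin: the helper name check_service_account_key is hypothetical, and the three fields it checks are ones that Google service account key files normally contain.

```python
import json
import os


def check_service_account_key(path: str) -> dict:
    """Load a key file and verify it looks like a service account key.

    Hypothetical helper for illustration; the plugin does not provide it.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Key file not found: {path}")

    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)

    # Fields that service account key files normally carry.
    required = ("type", "client_email", "private_key")
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"Key file is missing fields: {missing}")

    if data["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {data['type']!r}")

    return data


# Usage (path is a placeholder):
# info = check_service_account_key("/path/to/your/keyfile.json")
# print(info["client_email"])
```

Running a check like this at startup surfaces a wrong path or a malformed file before the first transcription request is made.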
Example Usage
from videosdk.plugins.google import GoogleSTT
from videosdk.agents import CascadingPipeline

# Initialize the Google STT model
stt = GoogleSTT(
    # If GOOGLE_APPLICATION_CREDENTIALS is set, you can omit api_key
    api_key="/path/to/your/keyfile.json",
    languages="en-US",
    model="latest_long",
    interim_results=True,
    punctuate=True,
)

# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)
Note
When using an environment variable for credentials, don't pass the api_key as an argument to the model instance. The SDK automatically reads the environment variable.
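One way to follow this rule in code is to build the constructor arguments conditionally. The sketch below assumes nothing about the plugin beyond the api_key parameter described on this page; credential_kwargs is a hypothetical helper, and the optional environ argument exists only to make it easy to test.

```python
import os

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def credential_kwargs(explicit_path=None, environ=None):
    """Return the keyword arguments to use for credentials.

    Hypothetical helper: returns {} when the environment variable is set
    (so api_key is not passed), otherwise {"api_key": explicit_path}.
    """
    env = os.environ if environ is None else environ

    if env.get(ENV_VAR):
        # The environment variable is set, so api_key is left out.
        return {}

    if explicit_path:
        return {"api_key": explicit_path}

    raise RuntimeError(
        f"Set {ENV_VAR} or provide a path to a service account key file."
    )


# Usage (requires the videosdk-plugins-google package):
# from videosdk.plugins.google import GoogleSTT
# stt = GoogleSTT(**credential_kwargs("/path/to/your/keyfile.json"), languages="en-US")
```

This keeps a single call site that works both in deployments that set the environment variable and in local runs that point at a key file.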
Configuration Options
api_key: (str) Path to your Google Cloud service account JSON file. This can also be set via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
languages: (Union[str, list[str]]) Language code or a list of language codes for transcription (default: "en-US").
model: (str) The Google STT model to use (e.g., "latest_long", "telephony") (default: "latest_long").
sample_rate: (int) The target audio sample rate in Hz for transcription (default: 16000). The input audio at 48000Hz will be resampled to this rate.
interim_results: (bool) Enable real-time partial transcription results (default: True).
punctuate: (bool) Add punctuation to transcription (default: True).
min_confidence_threshold: (float) The minimum confidence level for a transcription result to be considered valid (default: 0.1).
location: (str) The Google Cloud location to use for the STT service (default: "global").
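To make a few of these options concrete, here is a minimal sketch of what they imply. It is not the plugin's implementation: all three function names are hypothetical, and whether the plugin treats the confidence threshold as inclusive is an assumption.

```python
from typing import List, Tuple, Union


def normalize_languages(languages: Union[str, List[str]]) -> List[str]:
    """Accept a single language code or a list and always return a list."""
    if isinstance(languages, str):
        return [languages]
    return list(languages)


def filter_by_confidence(
    results: List[Tuple[str, float]],
    min_confidence_threshold: float = 0.1,
) -> List[Tuple[str, float]]:
    """Keep (text, confidence) pairs at or above the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [
        (text, confidence)
        for text, confidence in results
        if confidence >= min_confidence_threshold
    ]


def resampled_length(num_samples: int, src_rate: int = 48000, dst_rate: int = 16000) -> int:
    """Approximate sample count after resampling from src_rate to dst_rate."""
    return num_samples * dst_rate // src_rate


# A 20 ms frame at 48000 Hz holds 960 samples; at 16000 Hz it holds 320.
frame_16k = resampled_length(960)
```

With the default sample_rate, the 48000 Hz input is reduced by a factor of three, which is why the frame in the example shrinks from 960 to 320 samples.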