
Google STT

The Google STT provider enables your agent to use Google's advanced speech-to-text models for high-accuracy, real-time audio transcription.

Installation

Install the Google-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"

Importing

from videosdk.plugins.google import GoogleSTT

Setup Credentials

Google STT requires Google Cloud credentials. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"

Alternatively, you can pass the path to the key file directly to the GoogleSTT constructor via the api_key parameter.
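Before wiring either option into an agent, it can help to confirm that the key file is readable and is actually a service account key. The following is a minimal sketch using only the standard library; check_key_file is a hypothetical helper, not part of the VideoSDK or Google APIs, and it relies only on the "type" and "client_email" fields that Google service account key files contain.

```python
import json


def check_key_file(path: str) -> str:
    """Return the key's client_email if the file looks like a service account key.

    Hypothetical pre-flight helper for illustration only.
    """
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)

    # Service account key files declare their kind in the "type" field.
    if data.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")

    return data.get("client_email", "")
```

Calling check_key_file("/path/to/your/keyfile.json") at startup surfaces a wrong path or a wrong credential type before any audio is sent.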

Example Usage

from videosdk.plugins.google import GoogleSTT
from videosdk.agents import CascadingPipeline

# Initialize the Google STT model
stt = GoogleSTT(
    # If GOOGLE_APPLICATION_CREDENTIALS is set, you can omit api_key
    api_key="/path/to/your/keyfile.json",
    languages="en-US",
    model="latest_long",
    interim_results=True,
    punctuate=True,
)

# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)

Note: When you supply credentials through the environment variable, do not also pass api_key to the model instance. The SDK reads the environment variable automatically.
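One way to follow this rule in code is to decide in a single place whether api_key should be passed at all. The sketch below is an illustration only: stt_credential_kwargs is a hypothetical helper, and giving the environment variable priority over an explicit path is an assumption made here to match the note, not documented SDK behavior.

```python
import os
from typing import Optional


def stt_credential_kwargs(key_file: Optional[str] = None) -> dict:
    """Build the credential-related keyword arguments for GoogleSTT.

    Hypothetical helper: returns no api_key when the environment variable
    is set, so the SDK can pick the credentials up on its own.
    """
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return {}
    if key_file:
        return {"api_key": key_file}
    raise RuntimeError(
        "Set GOOGLE_APPLICATION_CREDENTIALS or provide a key file path"
    )


# Intended use (requires the videosdk-plugins-google package):
# stt = GoogleSTT(**stt_credential_kwargs("/path/to/your/keyfile.json"),
#                 languages="en-US")
```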

Configuration Options

  • api_key: (str) Path to your Google Cloud service account JSON file. This can also be set via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
  • languages: (Union[str, list[str]]) Language code or a list of language codes for transcription (default: "en-US").
  • model: (str) The Google STT model to use (e.g., "latest_long", "telephony") (default: "latest_long").
  • sample_rate: (int) The target audio sample rate in Hz for transcription (default: 16000). Input audio arriving at 48000 Hz is resampled to this rate.
  • interim_results: (bool) Enable real-time partial transcription results (default: True).
  • punctuate: (bool) Add punctuation to transcription (default: True).
  • min_confidence_threshold: (float) The minimum confidence level for a transcription result to be considered valid (default: 0.1).
  • location: (str) The Google Cloud location to use for the STT service (default: "global").
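The defaults listed above can be collected in one place, and the effect of min_confidence_threshold can be pictured as a simple filter. This is a sketch under stated assumptions: the Result class and keep_confident function are hypothetical stand-ins, and treating a result whose confidence equals the threshold as valid is an assumption, since the options list does not say whether the comparison is inclusive.

```python
from dataclasses import dataclass
from typing import List

# Documented defaults from the configuration options list.
GOOGLE_STT_DEFAULTS = {
    "languages": "en-US",
    "model": "latest_long",
    "sample_rate": 16000,
    "interim_results": True,
    "punctuate": True,
    "min_confidence_threshold": 0.1,
    "location": "global",
}


@dataclass
class Result:
    """Hypothetical stand-in for a transcription result."""
    text: str
    confidence: float


def keep_confident(
    results: List[Result],
    threshold: float = GOOGLE_STT_DEFAULTS["min_confidence_threshold"],
) -> List[Result]:
    """Keep results at or above the threshold (inclusive by assumption)."""
    return [r for r in results if r.confidence >= threshold]
```

Overriding a single option then becomes a dictionary merge, for example {**GOOGLE_STT_DEFAULTS, "model": "telephony"}, which can be unpacked into the constructor.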
