Google STT

The Google STT provider enables your agent to use Google's advanced speech-to-text models for high-accuracy, real-time audio transcription.

Installation

Install the Google-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"

Importing

from videosdk.plugins.google import GoogleSTT

Setup Credentials

To use Google STT, you need to set up your Google Cloud credentials. You can do this by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file.

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
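
Before starting the agent, it can help to confirm that the variable is set and points to a readable JSON file. The following is a minimal sketch using only the Python standard library. The helper name `check_google_credentials` is hypothetical and is not part of the VideoSDK or Google SDKs.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def check_google_credentials(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the key file path from GOOGLE_APPLICATION_CREDENTIALS, or raise.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    path = Path(raw)
    if not path.is_file():
        raise FileNotFoundError(f"Key file not found: {path}")

    # Service account key files are JSON, so fail early if the file is malformed.
    with path.open(encoding="utf-8") as fh:
        json.load(fh)
    return path
```

Calling `check_google_credentials()` at startup surfaces a missing or malformed key file before the first transcription request is made.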

Alternatively, you can pass the path to the key file directly to the GoogleSTT constructor via the api_key parameter.

Example Usage

from videosdk.plugins.google import GoogleSTT
from videosdk.agents import CascadingPipeline

# Initialize the Google STT model
stt = GoogleSTT(
    # If GOOGLE_APPLICATION_CREDENTIALS is set, you can omit api_key
    api_key="/path/to/your/keyfile.json",
    languages="en-US",
    model="latest_long",
    interim_results=True,
    punctuate=True
)

# Add the STT instance to a cascading pipeline
pipeline = CascadingPipeline(stt=stt)

Note

When using an environment variable for credentials, don't pass the api_key as an argument to the model instance. The SDK automatically reads the environment variable.
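
One way to follow this rule in code is to build the constructor arguments conditionally. This is only a sketch: `build_stt_kwargs` is a hypothetical helper, and it uses just the parameters documented on this page. The resulting dictionary is meant to be unpacked into the constructor.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_stt_kwargs(
    key_file: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for GoogleSTT (hypothetical helper).

    api_key is included only when GOOGLE_APPLICATION_CREDENTIALS is not set.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {
        "languages": "en-US",
        "model": "latest_long",
        "interim_results": True,
        "punctuate": True,
    }
    if not env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        if key_file is None:
            raise ValueError(
                "Set GOOGLE_APPLICATION_CREDENTIALS or provide a key file path"
            )
        # No environment variable, so pass the key file path explicitly.
        kwargs["api_key"] = key_file
    return kwargs
```

With the plugin installed, this could be used as `stt = GoogleSTT(**build_stt_kwargs("/path/to/your/keyfile.json"))`.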

Configuration Options

api_key: (str) Path to your Google Cloud service account JSON file. This can also be set via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
languages: (Union[str, list[str]]) Language code or a list of language codes for transcription (default: "en-US").
model: (str) The Google STT model to use (e.g., "latest_long", "telephony") (default: "latest_long").
sample_rate: (int) The target audio sample rate in Hz for transcription (default: 16000). Input audio at 48000 Hz is resampled to this rate.
interim_results: (bool) Enable real-time partial transcription results (default: True).
punctuate: (bool) Add punctuation to transcription (default: True).
min_confidence_threshold: (float) The minimum confidence level for a transcription result to be considered valid (default: 0.1).
location: (str) The Google Cloud location to use for the STT service (default: "global").
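
To illustrate what min_confidence_threshold implies, the sketch below drops results whose confidence falls under the threshold. The `Transcript` type and `filter_by_confidence` function are hypothetical stand-ins, not the plugin's internal implementation, and the inclusive comparison is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Hypothetical transcription result, used for illustration only."""
    text: str
    confidence: float


def filter_by_confidence(
    results: List[Transcript],
    min_confidence_threshold: float = 0.1,
) -> List[Transcript]:
    """Keep results whose confidence is at least the threshold."""
    return [r for r in results if r.confidence >= min_confidence_threshold]
```

A higher threshold discards more uncertain results at the cost of occasionally dropping correct ones, so the value is worth tuning against real audio.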
