
Google TTS

The Google TTS plugin enables your agent to use Google's text-to-speech models for generating natural-sounding voice output. It supports low-latency gRPC streaming with Chirp 3 HD voices, and it can alternatively use the Vertex AI endpoint (without streaming).

Installation

pip install "videosdk-plugins-google"

Authentication

Set your Google API key as an environment variable:

export GOOGLE_API_KEY="your-google-api-key"

You can obtain an API key from Google AI Studio.
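A quick way to confirm the key is visible to your Python process before starting the agent is sketched below. The helper name resolve_google_api_key is hypothetical and not part of the plugin; it only mirrors the rule described above (an explicit key, otherwise the GOOGLE_API_KEY environment variable).

```python
import os


def resolve_google_api_key(explicit_key=None, env=None):
    """Hypothetical helper: return the key the process would use.

    Prefers an explicitly supplied key, otherwise reads GOOGLE_API_KEY.
    This is an illustration, not the plugin's internal lookup.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError(
            "No Google API key found: export GOOGLE_API_KEY or pass one explicitly."
        )
    return key
```

Calling resolve_google_api_key() at startup fails fast with a readable message if the variable was not exported in the current shell.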

Example Usage

from videosdk.plugins.google import GoogleTTS, GoogleVoiceConfig
from videosdk.agents import CascadingPipeline

# Configure voice settings
voice_config = GoogleVoiceConfig(
    languageCode="en-US",
    name="en-US-Chirp3-HD-Aoede",
    ssmlGender="FEMALE",
)

# Initialize the Google TTS model
tts = GoogleTTS(
    # Pass api_key only if GOOGLE_API_KEY is NOT set in your environment or .env file
    # api_key="your-google-api-key",
    speed=1.0,
    pitch=0.0,
    voice_config=voice_config,
    custom_pronunciations=[{"tomato": "təˈmeɪtoʊ"}],  # Optional IPA overrides
)

# Add tts to cascading pipeline
pipeline = CascadingPipeline(tts=tts)

Vertex AI

To use the Vertex AI endpoint instead of an API key, authenticate using Application Default Credentials (ADC) and set your project ID:

export GOOGLE_CLOUD_PROJECT="my-gcp-project"

from videosdk.plugins.google import GoogleTTS, VertexAIConfig

tts = GoogleTTS(
    vertexai=True,
    vertexai_config=VertexAIConfig(location="us-central1"),
    streaming=False,  # Streaming cannot be used with Vertex AI
)
Note
  • streaming=True (the default) requires a Chirp 3 HD voice (e.g. en-US-Chirp3-HD-Aoede) and cannot be combined with vertexai=True.
  • Vertex AI requires a GCP project ID via VertexAIConfig(project_id="..."), the GOOGLE_CLOUD_PROJECT env variable, or a GOOGLE_APPLICATION_CREDENTIALS service-account file.
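The two streaming constraints in the note can be checked up front with a small pre-flight function. This is a minimal sketch, not plugin code: check_tts_options is a hypothetical name, and detecting a Chirp 3 HD voice by the "Chirp3-HD" substring is an assumption based on the example voice name en-US-Chirp3-HD-Aoede.

```python
def check_tts_options(voice_name, streaming=True, vertexai=False):
    """Hypothetical pre-flight check for the constraints in the note above.

    Returns a list of human-readable problems; an empty list means the
    combination is consistent with the documented rules.
    """
    problems = []
    if streaming and vertexai:
        problems.append("streaming=True cannot be combined with vertexai=True")
    # Assumption: Chirp 3 HD voices are recognizable by "Chirp3-HD" in the name.
    if streaming and "Chirp3-HD" not in voice_name:
        problems.append(f"streaming=True requires a Chirp 3 HD voice, got {voice_name!r}")
    return problems
```

For example, check_tts_options("en-US-News-N", streaming=False, vertexai=True) reports no problems, whereas the same voice with the default streaming=True is flagged.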

Configuration Options

  • api_key: (str) Your Google Cloud TTS API key. Can also be set via the GOOGLE_API_KEY environment variable.
  • speed: (float) The speaking rate of the generated audio (default: 1.0).
  • pitch: (float) The pitch of the generated audio. Can be between -20.0 and 20.0 (default: 0.0).
  • response_format: (str) The format of the audio response. Currently only supports "pcm" (default: "pcm").
  • voice_config: (GoogleVoiceConfig) Configuration for the voice to be used.
    • languageCode: (str) The language code of the voice (e.g., "en-US", "en-GB") (default: "en-US").
    • name: (str) The name of the voice to use (e.g., "en-US-Chirp3-HD-Aoede", "en-US-News-N") (default: "en-US-Chirp3-HD-Aoede").
    • ssmlGender: (str) The gender of the voice ("MALE", "FEMALE", "NEUTRAL") (default: "FEMALE").
  • custom_pronunciations: (list[dict] | dict | None) IPA pronunciation overrides for specific words (e.g., [{"tomato": "təˈmeɪtoʊ"}]). Defaults to None.
  • streaming: (bool) Use gRPC StreamingSynthesize for lower-latency audio generation. Only compatible with Chirp 3 HD voices and cannot be combined with vertexai=True (default: True).
  • vertexai: (bool) Use the Vertex AI TTS endpoint with Application Default Credentials (ADC) instead of an API key (default: False).
  • vertexai_config: (VertexAIConfig) Project and region settings for Vertex AI.
    • project_id: (str | None) Your GCP project ID. If not set, falls back to the GOOGLE_CLOUD_PROJECT environment variable or the service-account file referenced by GOOGLE_APPLICATION_CREDENTIALS (default: None).
    • location: (str) GCP region for the TTS endpoint (default: "us-central1").
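The project_id fallback described above can be illustrated with a short standalone sketch. The function resolve_project_id is hypothetical, and the lookup order shown (explicit value, then GOOGLE_CLOUD_PROJECT, then the service-account file) is an assumption; the plugin may resolve these differently. It relies on the fact that Google service-account JSON key files carry a project_id field.

```python
import json
import os


def resolve_project_id(project_id=None, env=None):
    """Hypothetical illustration of the project_id fallback chain.

    Assumed order: explicit argument, GOOGLE_CLOUD_PROJECT, then the
    "project_id" field of the file named by GOOGLE_APPLICATION_CREDENTIALS.
    """
    env = os.environ if env is None else env
    if project_id:
        return project_id
    if env.get("GOOGLE_CLOUD_PROJECT"):
        return env["GOOGLE_CLOUD_PROJECT"]
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if cred_path and os.path.isfile(cred_path):
        with open(cred_path, encoding="utf-8") as fh:
            return json.load(fh).get("project_id")
    return None  # Nothing configured; Vertex AI would have no project to use
```

Running resolve_project_id() in the deployment environment shows which project, if any, would be picked up before vertexai=True is enabled.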

Additional Resources

The following resources provide more information about using Google with VideoSDK Agents SDK.
