
Google Gemini (Live API)

The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.

Installation

Install the Gemini-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"
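To confirm that the install succeeded before wiring up an agent, a quick check along these lines can help. This is a minimal sketch using only the standard library; the helper name plugin_installed is hypothetical and not part of the SDK, and the module path is taken from the import statement used on this page.

```python
import importlib.util


def plugin_installed(module_name="videosdk.plugins.google"):
    """Return True if the module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "videosdk") is missing.
        return False


if __name__ == "__main__":
    if plugin_installed():
        print("videosdk-plugins-google is importable")
    else:
        print('Not found. Run: pip install "videosdk-plugins-google"')
```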

Importing

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

Example Usage

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    # Omit api_key entirely if GOOGLE_API_KEY is already set in your .env file
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
        voice="Leda",
        response_modalities=["AUDIO"],
    ),
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)

Note

When you store credentials in a .env file, do not also pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
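The rule in the note can be sketched as a small helper that decides whether to pass api_key at all. This is an illustrative sketch, not SDK code: the function gemini_model_kwargs and its env parameter are hypothetical, and it assumes GOOGLE_API_KEY is the variable the SDK reads, as the example above indicates.

```python
import os


def gemini_model_kwargs(model="gemini-2.0-flash-live-001", api_key=None, env=None):
    """Build keyword arguments for the model constructor (hypothetical helper).

    If GOOGLE_API_KEY is present in the environment, api_key is left out
    so the SDK can pick the key up on its own.
    """
    env = os.environ if env is None else env
    kwargs = {"model": model}
    if env.get("GOOGLE_API_KEY"):
        # Credential comes from the environment: do not pass it explicitly.
        return kwargs
    if api_key is None:
        raise ValueError("Set GOOGLE_API_KEY or pass api_key explicitly")
    kwargs["api_key"] = api_key
    return kwargs


if __name__ == "__main__":
    # Simulated environments keep the example independent of your shell.
    print(gemini_model_kwargs(env={"GOOGLE_API_KEY": "from-dotenv"}))
    print(gemini_model_kwargs(api_key="your-google-api-key", env={}))
```

With the plugin installed, the result could be unpacked into the constructor, for example GeminiRealtime(**gemini_model_kwargs(), config=...).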

Configuration Options

  • model: The Gemini model to use (e.g., "gemini-2.0-flash-live-001"). Other supported models include "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
  • api_key: Your Google API key (can also be set via the GOOGLE_API_KEY environment variable).
  • config: A GeminiLiveConfig object for advanced options:
    • voice: (str or None) The voice to use for audio output (e.g., "Puck").
    • language_code: (str or None) The language code for the conversation (e.g., "en-US").
    • temperature: (float or None) Sampling temperature for response randomness.
    • top_p: (float or None) Nucleus sampling probability.
    • top_k: (float or None) Top-k sampling for response diversity.
    • candidate_count: (int or None) Number of candidate responses to generate.
    • max_output_tokens: (int or None) Maximum number of tokens in the output.
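    • presence_penalty: (float or None) Penalty applied to tokens that have already appeared in the output, which encourages the model to introduce new topics.
    • frequency_penalty: (float or None) Penalty that grows with how often a token has already been used, which discourages repetition.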
    • response_modalities: (List[str] or None) List of enabled output modalities (e.g., ["TEXT", "AUDIO"]).
    • output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
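Because every GeminiLiveConfig option above accepts None, one way to assemble a configuration is to collect only the options you actually set and reject misspelled names early. The sketch below is an assumption-laden illustration: live_config_kwargs and GEMINI_LIVE_CONFIG_FIELDS are hypothetical names, and the field list is copied from the options documented above rather than read from the SDK.

```python
# Field names copied from the option list above (not introspected from the SDK).
GEMINI_LIVE_CONFIG_FIELDS = frozenset({
    "voice",
    "language_code",
    "temperature",
    "top_p",
    "top_k",
    "candidate_count",
    "max_output_tokens",
    "presence_penalty",
    "frequency_penalty",
    "response_modalities",
    "output_audio_transcription",
})


def live_config_kwargs(**options):
    """Validate option names and drop entries that are None (hypothetical helper)."""
    unknown = sorted(set(options) - GEMINI_LIVE_CONFIG_FIELDS)
    if unknown:
        raise TypeError(f"Unknown GeminiLiveConfig option(s): {', '.join(unknown)}")
    return {name: value for name, value in options.items() if value is not None}


kwargs = live_config_kwargs(
    voice="Puck",
    language_code="en-US",
    temperature=0.7,
    top_p=None,  # unset options are dropped
    response_modalities=["AUDIO"],
)
print(kwargs)
```

With the plugin installed, the resulting dictionary could be passed as GeminiLiveConfig(**kwargs), assuming the constructor accepts these names as keyword arguments, as the example usage suggests for voice and response_modalities.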

Tip

Explore ready-made scripts for Gemini (Live API) with the VideoSDK AI Agent SDK: Gemini (Live API) Example Script.
