Google Gemini (LiveAPI)

The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.

Installation

Install the Gemini-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"

Authentication

The Google plugin requires a Gemini API key.

Set GOOGLE_API_KEY in your .env file.
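A minimal sketch of what the .env entry looks like and how to confirm the key is visible to your process (this assumes you load the file with python-dotenv; skip the explicit load if your setup already loads environment variables):

# .env
# GOOGLE_API_KEY=your-google-api-key

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # read variables from .env into the process environment

# Fail early if the key is missing
assert os.environ.get("GOOGLE_API_KEY"), "GOOGLE_API_KEY is not set"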

Importing

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

Example Usage

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    # When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
        response_modalities=["AUDIO"]
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
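For example, with GOOGLE_API_KEY set in your .env file, the model from the example above can be created without any explicit credentials (a minimal sketch):

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# GOOGLE_API_KEY is picked up from the environment, so no api_key argument is passed
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    config=GeminiLiveConfig(
        voice="Leda",
        response_modalities=["AUDIO"]
    )
)

pipeline = RealTimePipeline(model=model)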

Vision Support

Google Gemini Live can also accept a video stream directly from the VideoSDK room. To enable this, simply turn on your camera and set the vision flag to True in the session context. Once that's done, start your agent as usual; no additional changes are required in the pipeline.

pipeline = RealTimePipeline(model=model)

session = AgentSession(
    agent=my_agent,
    pipeline=pipeline,
    context={
        "meetingId": "your_actual_meeting_id_here",  # Replace with actual meeting ID
        "name": "AI Voice Agent",
        "videosdk_auth": "your_videosdk_auth_token_here",  # Replace with actual token
        "vision": True
    }
)
  • vision (bool, session context) – when True, forwards the video stream from the VideoSDK room to Gemini's Live API (defaults to False).

See it in Action

Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.

Configuration Options

  • model: The Gemini model to use (e.g., "gemini-2.0-flash-live-001"). Other supported models include: "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
  • api_key: Your Google API key (can also be set via environment variable)
  • config: A GeminiLiveConfig object for advanced options:
    • voice: (str or None) The voice to use for audio output (e.g., "Puck").
    • language_code: (str or None) The language code for the conversation (e.g., "en-US").
    • temperature: (float or None) Sampling temperature for response randomness.
    • top_p: (float or None) Nucleus sampling probability.
    • top_k: (float or None) Top-k sampling for response diversity.
    • candidate_count: (int or None) Number of candidate responses to generate.
    • max_output_tokens: (int or None) Maximum number of tokens in the output.
    • presence_penalty: (float or None) Penalty for introducing new topics.
    • frequency_penalty: (float or None) Penalty for repeating tokens.
    • response_modalities: (List[str] or None) List of enabled output modalities (e.g., ["TEXT", "AUDIO"]).
    • output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
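As a fuller illustration of the options above, a GeminiLiveConfig might be tuned like this (the values are illustrative only, not recommendations):

config = GeminiLiveConfig(
    voice="Puck",
    language_code="en-US",
    temperature=0.7,
    top_p=0.9,
    max_output_tokens=1024,
    response_modalities=["AUDIO"]
)

model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    config=config
)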

Additional Resources

The following resources provide more information about using Google with the VideoSDK Agents SDK.
