Version: 1.0.x

Google Gemini (LiveAPI)

The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.

Installation

Install the Gemini-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"

Authentication

The Google plugin requires a Gemini API key.

Set GOOGLE_API_KEY in your .env file.
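
As a minimal sketch of a startup check, the snippet below raises a clear error when GOOGLE_API_KEY is missing or blank. The helper name require_env is hypothetical and not part of the VideoSDK SDK. It reads only the process environment; loading the .env file itself is assumed to happen elsewhere, for example through a dotenv loader or your process manager.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    # Use the real process environment unless a mapping is injected (handy for tests).
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it")
    return value
```

Calling require_env("GOOGLE_API_KEY") before constructing the model surfaces a missing key at startup instead of later in the run.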

Importing

from videosdk.agents.plugins import GeminiRealtime, GeminiLiveConfig

Example Usage

from videosdk.agents.plugins import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import Pipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-3.1-flash-live-preview",
    # Omit api_key when GOOGLE_API_KEY is set in .env; pass it only if the variable is not set
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Options include Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
        response_modalities=["AUDIO"],
    ),
)

# Create the pipeline with the model
pipeline = Pipeline(llm=model)

Note: When using a .env file for credentials, do not pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.

Vertex AI Integration

You can also use Google's Gemini models through Vertex AI. This requires a different authentication and configuration setup.

Authentication for Vertex AI

For Vertex AI, you need to set up Google Cloud credentials. Create a service account, download the JSON key file, and set the path to this file in your environment.

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"

You should also configure your project ID and location. These can be set as environment variables or directly in the code. If not set, the project_id is inferred from the credentials file and the location defaults to us-central1.

export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="your-gcp-location"

Example Usage with Vertex AI

To use Vertex AI, set vertexai=True when initializing GeminiRealtime. You can configure the project and location using VertexAIConfig, which will take precedence over environment variables.

from videosdk.agents.plugins import GeminiRealtime, VertexAIConfig
from videosdk.agents import Pipeline

# Initialize the Gemini real-time model with Vertex AI configuration
model = GeminiRealtime(
    model="gemini-live-2.5-flash-native-audio",
    vertexai=True,
    vertexai_config=VertexAIConfig(
        project_id="videosdk",
        location="us-central1",
    ),
)

# Create the pipeline with the model
pipeline = Pipeline(llm=model)
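
The resolution order described above (explicit VertexAIConfig values first, then environment variables, then the credentials file for the project and us-central1 for the location) can be sketched as follows. This illustrates the documented precedence and is not the SDK's actual implementation. resolve_vertex_settings is a hypothetical helper, and it assumes the service account key file carries a project_id field, as standard Google Cloud key files do.

```python
import json
import os
from typing import Mapping, Optional, Tuple

DEFAULT_LOCATION = "us-central1"


def resolve_vertex_settings(
    project_id: Optional[str] = None,
    location: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], str]:
    """Resolve (project_id, location) following the documented precedence.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    source = os.environ if env is None else env

    # 1. Explicit values win; 2. otherwise fall back to environment variables.
    project = project_id or source.get("GOOGLE_CLOUD_PROJECT")
    resolved_location = (
        location or source.get("GOOGLE_CLOUD_LOCATION") or DEFAULT_LOCATION
    )

    # 3. Last resort for the project: read it from the service account key file.
    key_path = source.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not project and key_path and os.path.isfile(key_path):
        with open(key_path, encoding="utf-8") as fh:
            project = json.load(fh).get("project_id")

    return project, resolved_location
```

Logging the resolved pair at startup is a cheap way to confirm which project and region the agent will actually use.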

Vision Support

Google Gemini Live can also accept a video stream directly from the VideoSDK room. To enable this, turn on your camera and set vision=True in the room options of the job context. Then start your agent as usual; no additional changes are required in the pipeline.

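from videosdk.agents import AgentSession, JobContext, Pipeline, RoomOptions

# model is defined as in the examples above; my_agent is your agent instance
pipeline = Pipeline(llm=model)

session = AgentSession(
    agent=my_agent,
    pipeline=pipeline,
)

# vision=True forwards the room's video stream to Gemini
job_context = JobContext(
    room_options=RoomOptions(
        room_id="YOUR_ROOM_ID",
        name="Agent",
        vision=True,
    )
)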
  • vision (bool, room options) – when True, forwards the video stream from the VideoSDK room to Gemini's Live API (defaults to False).

Session Resumption

Gemini Live connections have a limited lifetime (~10 minutes). By default, the SDK resumes the session when the server disconnects, so conversation context is preserved and no manual reconnection is needed.

# Default: resumption enabled
config = GeminiLiveConfig(voice="Puck")

# Disable resumption
config = GeminiLiveConfig(
    voice="Puck",
    session_resumption=None,
)

session_resumption values and their behavior:

  • Default (SessionResumptionConfig(handle=None)) – Resumption enabled; the SDK stores and reuses the session handle automatically.
  • None – Resumption disabled.
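
Conceptually, automatic resumption amounts to remembering the latest handle the server issues and presenting it on the next connection. The sketch below simulates that bookkeeping in plain Python. ResumableSession and its methods are hypothetical names used for illustration only; they are not VideoSDK or Gemini classes, and the real SDK handles this internally.

```python
from typing import List, Optional


class ResumableSession:
    """Toy model of handle-based session resumption (illustrative only)."""

    def __init__(self, resumption_enabled: bool = True) -> None:
        self.resumption_enabled = resumption_enabled
        self.handle: Optional[str] = None
        self.connect_log: List[Optional[str]] = []

    def on_resumption_update(self, new_handle: str) -> None:
        # Keep only the most recent handle issued by the server.
        if self.resumption_enabled:
            self.handle = new_handle

    def connect(self) -> Optional[str]:
        # Present the stored handle, if any; None means a fresh session.
        presented = self.handle if self.resumption_enabled else None
        self.connect_log.append(presented)
        return presented


session = ResumableSession()
session.connect()                    # first connection: no handle yet
session.on_resumption_update("h-1")  # server issues a handle mid-session
session.on_resumption_update("h-2")  # a newer handle replaces the old one
resumed_with = session.connect()     # reconnect after a server disconnect
```

With resumption disabled, the toy session never stores a handle, so every connection starts fresh, which mirrors the session_resumption=None setting.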

See it in Action

Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.

Configuration Options

  • model: The Gemini model to use (e.g., "gemini-3.1-flash-live-preview"). Other supported models include: "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
  • api_key: Your Google API key (can also be set via the GOOGLE_API_KEY environment variable)
  • service_account_path: (str or None) Path to a Google service account JSON file (alternative to api_key).
  • vertexai: (bool) Whether to use the Vertex AI backend instead of the Gemini API (default: False).
  • vertexai_config: (VertexAIConfig or None) Configuration for Vertex AI (e.g., project_id, location).
  • config: A GeminiLiveConfig object for advanced options:
    • voice: (str or None) The voice to use for audio output (e.g., "Puck", "Charon", "Kore", "Fenrir", "Aoede") (default: "Puck").
    • language_code: (str or None) The language code for the conversation (e.g., "en-US") (default: "en-US").
    • temperature: (float or None) Sampling temperature for response randomness (default: None).
    • top_p: (float or None) Nucleus sampling probability (default: None).
    • top_k: (float or None) Top-k sampling for response diversity (default: None).
    • candidate_count: (int or None) Number of candidate responses to generate (default: 1).
    • max_output_tokens: (int or None) Maximum number of tokens in the output (default: None).
    • presence_penalty: (float or None) Penalty for introducing new topics. Range -2.0 to 2.0 (default: None).
    • frequency_penalty: (float or None) Penalty for repeating tokens. Range -2.0 to 2.0 (default: None).
    • response_modalities: (List[str] or None) List of enabled output modalities (e.g., ["TEXT"] or ["AUDIO"]; only one modality at a time) (default: ["AUDIO"]).
    • input_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio input transcription.
    • output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
    • thinking_config: (ThinkingConfig or None) Configuration for the model's "thinking" behavior. Defaults to thinking disabled (thinking_budget=0) for low-latency voice; only applied to native-audio models. Pass None to let the model decide its own thinking budget.
    • realtime_input_config: (RealtimeInputConfig or None) Configuration for realtime input handling. Defaults to aggressive low-latency VAD; pass None to use Gemini server defaults.
    • context_window_compression: (ContextWindowCompressionConfig or None) Configuration for context window compression. Defaults to sliding-window compression (extends sessions past the 15-min cap); pass None to disable.
    • session_resumption: (SessionResumptionConfig or None) Session resumption on server disconnect. Enabled by default; set to None to disable.
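
Several of the constraints listed above lend themselves to a pre-flight check before the values are handed to GeminiLiveConfig. The helper below is a hypothetical sketch (check_live_options is not part of the SDK) that validates a plain dictionary against the documented rules: penalties between -2.0 and 2.0, and exactly one response modality, either "TEXT" or "AUDIO".

```python
from typing import Any, Dict, List


def check_live_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the options; an empty list means none found.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    problems: List[str] = []

    # Penalties are documented as ranging from -2.0 to 2.0 (None means unset).
    for key in ("presence_penalty", "frequency_penalty"):
        value = options.get(key)
        if value is not None and not -2.0 <= value <= 2.0:
            problems.append(f"{key} must be between -2.0 and 2.0, got {value}")

    # Only one output modality may be enabled at a time.
    modalities = options.get("response_modalities", ["AUDIO"])
    if modalities is not None:
        if len(modalities) != 1 or modalities[0] not in ("TEXT", "AUDIO"):
            problems.append('response_modalities must be ["TEXT"] or ["AUDIO"]')

    return problems
```

If the returned list is empty, the same dictionary can be unpacked into the config, for example GeminiLiveConfig(**options), assuming it uses only the parameter names listed above.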

Additional Resources

The following resources provide more information about using Google with the VideoSDK Agents SDK.
