Google Gemini (Live API)
The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.
Installation
Install the Gemini-enabled VideoSDK Agents package:
pip install "videosdk-plugins-google"
Authentication
The Google plugin requires a Gemini API key.
Set GOOGLE_API_KEY in your .env file.
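For example, your .env file might contain (the key value here is a placeholder):

```shell
GOOGLE_API_KEY=your-google-api-key
```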
Importing
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
Example Usage
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline
# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    # When GOOGLE_API_KEY is set in .env, omit the api_key parameter
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, or Zephyr
        response_modalities=["AUDIO"],
    ),
)
# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
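With GOOGLE_API_KEY set in your environment, the initialization above reduces to the following sketch (the model name and voice are illustrative choices, not requirements):

```python
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# GOOGLE_API_KEY is read from the environment; no api_key argument needed
model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
)
pipeline = RealTimePipeline(model=model)
```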
Vertex AI Integration
You can also use Google's Gemini models through Vertex AI. This requires a different authentication and configuration setup.
Authentication for Vertex AI
For Vertex AI, you need to set up Google Cloud credentials. Create a service account, download the JSON key file, and set the path to this file in your environment.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
You should also configure your project ID and location. These can be set as environment variables or directly in the code. If not set, the project_id is inferred from the credentials file and the location defaults to us-central1.
export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="your-gcp-location"
Example Usage with Vertex AI
To use Vertex AI, set vertexai=True when initializing GeminiRealtime. You can configure the project and location using VertexAIConfig, which will take precedence over environment variables.
from videosdk.plugins.google import GeminiRealtime, VertexAIConfig
from videosdk.agents import RealTimePipeline
# Initialize the Gemini real-time model with Vertex AI configuration
model = GeminiRealtime(
    model="gemini-live-2.5-flash-native-audio",
    vertexai=True,
    vertexai_config=VertexAIConfig(
        project_id="videosdk",
        location="us-central1",
    ),
)
# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
Vision Support
Google Gemini Live can also accept a video stream directly from the VideoSDK room. To enable this, turn on your camera and set the vision flag to True in the room options. Then start your agent as usual; no additional changes are required in the pipeline.
from videosdk.agents import AgentSession, JobContext, RoomOptions, RealTimePipeline

pipeline = RealTimePipeline(model=model)

session = AgentSession(
    agent=my_agent,
    pipeline=pipeline,
)

job_context = JobContext(
    room_options=RoomOptions(
        room_id="YOUR_ROOM_ID",
        name="Agent",
        vision=True,
    )
)
vision (bool, RoomOptions): when True, forwards the video stream from the VideoSDK room to Gemini's Live API (defaults to False).
See it in Action
Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.
Configuration Options
- model: The Gemini model to use (e.g., "gemini-2.5-flash-native-audio-preview-12-2025"). Other supported models include "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
- api_key: Your Google API key (can also be set via environment variable).
- config: A GeminiLiveConfig object for advanced options:
  - voice: (str or None) The voice to use for audio output (e.g., "Puck").
  - language_code: (str or None) The language code for the conversation (e.g., "en-US").
  - temperature: (float or None) Sampling temperature for response randomness.
  - top_p: (float or None) Nucleus sampling probability.
  - top_k: (float or None) Top-k sampling for response diversity.
  - candidate_count: (int or None) Number of candidate responses to generate.
  - max_output_tokens: (int or None) Maximum number of tokens in the output.
  - presence_penalty: (float or None) Penalty for introducing new topics.
  - frequency_penalty: (float or None) Penalty for repeating tokens.
  - response_modalities: (List[str] or None) Enabled output modalities, e.g. ["TEXT"] or ["AUDIO"] (one at a time).
  - output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
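As a sketch, several of the options above can be combined in a single GeminiLiveConfig. The values below are illustrative, not recommendations:

```python
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

config = GeminiLiveConfig(
    voice="Puck",                   # one of the supported voices
    language_code="en-US",
    temperature=0.7,                # lower values give more deterministic replies
    top_p=0.95,
    max_output_tokens=1024,
    response_modalities=["AUDIO"],  # "TEXT" or "AUDIO", one at a time
)

model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=config,
)
```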
Additional Resources
The following resources provide more information about using Google with VideoSDK Agents SDK.
- Plugin quickstart: Quickstart for the Gemini Realtime API plugin.
- Gemini docs: Gemini Live API documentation.
Got a question? Ask us on Discord.

