Google Gemini (Live API)
The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.
Installation
Install the Gemini-enabled VideoSDK Agents package:
```bash
pip install "videosdk-plugins-google"
```
Authentication
The Google plugin requires a Gemini API key. Set `GOOGLE_API_KEY` in your `.env` file.
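For reference, the `.env` file is a plain key=value list in your project root; the value below is a placeholder:

```
GOOGLE_API_KEY=your-google-api-key
```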
Importing
```python
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
```
Example Usage
```python
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    # When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
        response_modalities=["AUDIO"]
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
```
When using .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
Vision Support
Google Gemini Live can also accept a video stream directly from the VideoSDK room. To enable this, turn on your camera and set the `vision` flag to `True` in the session context. Once that's done, start your agent as usual; no additional changes are required in the pipeline.
```python
pipeline = RealTimePipeline(model=model)

session = AgentSession(
    agent=my_agent,
    pipeline=pipeline,
    context={
        "meetingId": "your_actual_meeting_id_here",  # Replace with actual meeting ID
        "name": "AI Voice Agent",
        "videosdk_auth": "your_videosdk_auth_token_here",  # Replace with actual token
        "vision": True
    }
)
```
- `vision`: (bool, session context) When `True`, forwards the video stream from VideoSDK's room to Gemini's Live API (defaults to `False`).
See it in Action
Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.
Configuration Options
- `model`: The Gemini model to use (e.g., `"gemini-2.0-flash-live-001"`). Other supported models include `"gemini-2.5-flash-preview-native-audio-dialog"` and `"gemini-2.5-flash-exp-native-audio-thinking-dialog"`.
- `api_key`: Your Google API key (can also be set via the `GOOGLE_API_KEY` environment variable).
- `config`: A `GeminiLiveConfig` object for advanced options:
  - `voice`: (str or None) The voice to use for audio output (e.g., `"Puck"`).
  - `language_code`: (str or None) The language code for the conversation (e.g., `"en-US"`).
  - `temperature`: (float or None) Sampling temperature for response randomness.
  - `top_p`: (float or None) Nucleus sampling probability.
  - `top_k`: (float or None) Top-k sampling for response diversity.
  - `candidate_count`: (int or None) Number of candidate responses to generate.
  - `max_output_tokens`: (int or None) Maximum number of tokens in the output.
  - `presence_penalty`: (float or None) Penalty for introducing new topics.
  - `frequency_penalty`: (float or None) Penalty for repeating tokens.
  - `response_modalities`: (List[str] or None) Enabled output modalities (e.g., `["TEXT", "AUDIO"]`).
  - `output_audio_transcription`: (`AudioTranscriptionConfig` or None) Configuration for audio output transcription.
Additional Resources
The following resources provide more information about using Google Gemini with the VideoSDK Agents SDK.

- Python package: The `videosdk-plugins-google` package on PyPI.
- Plugin quickstart: Quickstart for the Gemini Realtime API plugin.
- GitHub repo: View the source or contribute to the VideoSDK Gemini realtime plugin.
- Gemini docs: Gemini Live API documentation.
Got a question? Ask us on Discord.