Google Gemini (Live API)
The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.
Installation
Install the Gemini-enabled VideoSDK Agents package:
pip install "videosdk-plugins-google"
Importing
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
Example Usage
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    # Omit api_key when GOOGLE_API_KEY is set in .env; the SDK reads it automatically
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Options: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr
        response_modalities=["AUDIO"],
    ),
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
note
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
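The rule in the note can be sketched as a small helper that decides whether to pass api_key at all. This is only an illustration: credential_kwargs is a hypothetical name that is not part of the SDK, and it assumes the .env file has already been loaded into the process environment.

```python
import os
from typing import Dict, Mapping, Optional


def credential_kwargs(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return the credential arguments to pass to the model.

    If GOOGLE_API_KEY is already set in the environment, return an empty dict
    so that the SDK picks the key up on its own. Otherwise fall back to the
    explicitly supplied key.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY"):
        return {}  # Environment wins: do not pass api_key in code
    if explicit_key:
        return {"api_key": explicit_key}
    raise RuntimeError("Set GOOGLE_API_KEY or supply an explicit API key")
```

With this in place, the model from the example could be created as GeminiRealtime(model="gemini-2.0-flash-live-001", **credential_kwargs()), so the same code works whether or not the key lives in the environment.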
Configuration Options
model: The Gemini model to use (e.g., "gemini-2.0-flash-live-001"). Other supported models include "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
api_key: Your Google API key (can also be set via the GOOGLE_API_KEY environment variable).
config: A GeminiLiveConfig object for advanced options:
    voice: (str or None) The voice to use for audio output (e.g., "Puck").
    language_code: (str or None) The language code for the conversation (e.g., "en-US").
    temperature: (float or None) Sampling temperature for response randomness.
    top_p: (float or None) Nucleus sampling probability.
    top_k: (float or None) Top-k sampling for response diversity.
    candidate_count: (int or None) Number of candidate responses to generate.
    max_output_tokens: (int or None) Maximum number of tokens in the output.
    presence_penalty: (float or None) Penalty applied to tokens that have already appeared, which encourages the model to introduce new topics.
    frequency_penalty: (float or None) Penalty for repeating tokens.
    response_modalities: (List[str] or None) List of enabled output modalities (e.g., ["TEXT", "AUDIO"]).
    output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
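Since every GeminiLiveConfig option accepts None, one way to work with them is to collect only the values you actually want to set and pass them as keyword arguments. The following is a minimal sketch: build_live_config_kwargs is a hypothetical helper rather than part of the SDK, the numeric values are arbitrary illustrations, and the guarded import only exists so the snippet also runs where the plugin is not installed.

```python
from typing import Any, Dict, List, Optional


def build_live_config_kwargs(
    voice: Optional[str] = None,
    language_code: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_output_tokens: Optional[int] = None,
    response_modalities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep only the options that were explicitly set."""
    options = {
        "voice": voice,
        "language_code": language_code,
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
        "response_modalities": response_modalities,
    }
    # Drop unset options so the config falls back to its own defaults
    return {name: value for name, value in options.items() if value is not None}


kwargs = build_live_config_kwargs(
    voice="Puck",
    language_code="en-US",
    temperature=0.7,  # Illustrative value, not a recommendation
    response_modalities=["AUDIO"],
)

try:
    from videosdk.plugins.google import GeminiLiveConfig

    config = GeminiLiveConfig(**kwargs)
except ImportError:
    config = None  # Plugin not installed; kwargs can still be inspected
```

The resulting config object can then be passed as the config argument of GeminiRealtime, as in the usage example.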
tip
Explore ready-made scripts for Gemini (Live API) with the VideoSDK AI Agent SDK: Gemini (Live API) Example Script.