Google Gemini (LiveAPI)

The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.

Installation

Install the Gemini-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"

Authentication

The Google plugin requires a Gemini API key.

Set GOOGLE_API_KEY in your .env file.
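A minimal sketch of what the .env entry looks like and how to confirm the key is visible to your process (this assumes you load the file with python-dotenv; skip the explicit load if your setup already loads environment variables):

# .env
# GOOGLE_API_KEY=your-google-api-key

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # read variables from .env into the process environment

# Fail early if the key is missing
assert os.environ.get("GOOGLE_API_KEY"), "GOOGLE_API_KEY is not set"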

Importing

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

Example Usage

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    # When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
        response_modalities=["AUDIO"]
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
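For example, with GOOGLE_API_KEY set in your .env file, the model from the example above can be created without any explicit credentials (a minimal sketch):

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# GOOGLE_API_KEY is picked up from the environment, so no api_key argument is passed
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    config=GeminiLiveConfig(
        voice="Leda",
        response_modalities=["AUDIO"]
    )
)

pipeline = RealTimePipeline(model=model)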

Vision Support

Google Gemini Live can also accept a video stream directly from the VideoSDK room. To enable this, simply turn on your camera and set the vision flag to True in the session context. Once that's done, start your agent as usual; no additional changes are required in the pipeline.

pipeline = RealTimePipeline(model=model)

session = AgentSession(
    agent=my_agent,
    pipeline=pipeline,
    context={
        "meetingId": "your_actual_meeting_id_here",  # Replace with actual meeting ID
        "name": "AI Voice Agent",
        "videosdk_auth": "your_videosdk_auth_token_here",  # Replace with actual token
        "vision": True
    }
)
  • vision (bool, session context) – when True, forwards the video stream from the VideoSDK room to Gemini's Live API (defaults to False).

See it in Action

Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.

Configuration Options

  • model: The Gemini model to use (e.g., "gemini-2.0-flash-live-001"). Other supported models include: "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
  • api_key: Your Google API key (can also be set via environment variable)
  • config: A GeminiLiveConfig object for advanced options:
    • voice: (str or None) The voice to use for audio output (e.g., "Puck").
    • language_code: (str or None) The language code for the conversation (e.g., "en-US").
    • temperature: (float or None) Sampling temperature for response randomness.
    • top_p: (float or None) Nucleus sampling probability.
    • top_k: (float or None) Top-k sampling for response diversity.
    • candidate_count: (int or None) Number of candidate responses to generate.
    • max_output_tokens: (int or None) Maximum number of tokens in the output.
    • presence_penalty: (float or None) Penalty for introducing new topics.
    • frequency_penalty: (float or None) Penalty for repeating tokens.
    • response_modalities: (List[str] or None) List of enabled output modalities (e.g., ["TEXT", "AUDIO"]).
    • output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
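As a fuller illustration of the options above, a GeminiLiveConfig might be tuned like this (the values are illustrative only, not recommendations):

config = GeminiLiveConfig(
    voice="Puck",
    language_code="en-US",
    temperature=0.7,
    top_p=0.9,
    max_output_tokens=1024,
    response_modalities=["AUDIO"]
)

model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    config=config
)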

Additional Resources

The following resources provide more information about using Google with the VideoSDK Agents SDK.
