
Google Gemini (Live API)

The Google Gemini (Live API) provider allows your agent to leverage Google's Gemini models for real-time, multimodal AI interactions.

Installation

Install the Gemini-enabled VideoSDK Agents package:

pip install "videosdk-plugins-google"

Authentication

The Google plugin requires a Gemini API key.

Set GOOGLE_API_KEY in your .env file.
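For example, the relevant line in your .env file would look like this (the value shown is a placeholder):

GOOGLE_API_KEY=your-google-api-key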

Importing

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

Example Usage

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model
model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    # Omit the api_key parameter when GOOGLE_API_KEY is set in your .env file
    api_key="your-google-api-key",
    config=GeminiLiveConfig(
        voice="Leda",  # Supported voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
        response_modalities=["AUDIO"]
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
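For example, with GOOGLE_API_KEY set in your .env file, the initialization above reduces to:

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

# api_key is omitted - the SDK reads GOOGLE_API_KEY from the environment
model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(
        voice="Leda",
        response_modalities=["AUDIO"]
    )
)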

Vertex AI Integration

You can also use Google's Gemini models through Vertex AI. This requires a different authentication and configuration setup.

Authentication for Vertex AI

For Vertex AI, you need to set up Google Cloud credentials. Create a service account, download the JSON key file, and set the path to this file in your environment.

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"

You should also configure your project ID and location. These can be set as environment variables or directly in the code. If not set, the project_id is inferred from the credentials file and the location defaults to us-central1.

export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="your-gcp-location"
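With these environment variables (and GOOGLE_APPLICATION_CREDENTIALS) set, a minimal Vertex AI initialization can omit the explicit configuration object. This is a sketch that assumes the environment-based resolution described above:

from videosdk.plugins.google import GeminiRealtime

# Project and location are resolved from GOOGLE_CLOUD_PROJECT and
# GOOGLE_CLOUD_LOCATION (or inferred from the credentials file)
model = GeminiRealtime(
    model="gemini-live-2.5-flash-native-audio",
    vertexai=True
)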

Example Usage with Vertex AI

To use Vertex AI, set vertexai=True when initializing GeminiRealtime. You can configure the project and location using VertexAIConfig, which will take precedence over environment variables.

from videosdk.plugins.google import GeminiRealtime, VertexAIConfig
from videosdk.agents import RealTimePipeline

# Initialize the Gemini real-time model with Vertex AI configuration
model = GeminiRealtime(
    model="gemini-live-2.5-flash-native-audio",
    vertexai=True,
    vertexai_config=VertexAIConfig(
        project_id="videosdk",
        location="us-central1"
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)

Vision Support

Google Gemini Live can also accept a video stream directly from the VideoSDK room. To enable this, turn on your camera and set the vision flag to True in the room options. Once that's done, start your agent as usual; no additional changes are required in the pipeline.

from videosdk.agents import AgentSession, JobContext, RealTimePipeline, RoomOptions

pipeline = RealTimePipeline(model=model)

session = AgentSession(
    agent=my_agent,  # your Agent instance, defined elsewhere
    pipeline=pipeline,
)

job_context = JobContext(
    room_options=RoomOptions(
        room_id="YOUR_ROOM_ID",
        name="Agent",
        vision=True
    )
)
  • vision (bool, room options) – when True, forwards the video stream from the VideoSDK room to Gemini's Live API (defaults to False).

See it in Action

Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.

Configuration Options

  • model: The Gemini model to use (e.g., "gemini-2.5-flash-native-audio-preview-12-2025"). Other supported models include: "gemini-2.5-flash-preview-native-audio-dialog" and "gemini-2.5-flash-exp-native-audio-thinking-dialog".
  • api_key: Your Google API key (can also be set via environment variable)
  • config: A GeminiLiveConfig object for advanced options (a combined example follows this list):
    • voice: (str or None) The voice to use for audio output (e.g., "Puck").
    • language_code: (str or None) The language code for the conversation (e.g., "en-US").
    • temperature: (float or None) Sampling temperature for response randomness.
    • top_p: (float or None) Nucleus sampling probability.
    • top_k: (float or None) Top-k sampling for response diversity.
    • candidate_count: (int or None) Number of candidate responses to generate.
    • max_output_tokens: (int or None) Maximum number of tokens in the output.
    • presence_penalty: (float or None) Penalty for introducing new topics.
    • frequency_penalty: (float or None) Penalty for repeating tokens.
    • response_modalities: (List[str] or None) List of enabled output modalities (e.g., ["TEXT"] or ["AUDIO"]; one at a time).
    • output_audio_transcription: (AudioTranscriptionConfig or None) Configuration for audio output transcription.
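Putting several of these options together, a more fully specified setup might look like the following sketch (all values are illustrative, not recommendations):

from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(
        voice="Puck",                  # any supported voice
        language_code="en-US",         # conversation language
        temperature=0.7,               # sampling randomness
        top_p=0.95,                    # nucleus sampling
        max_output_tokens=1024,        # cap on response length
        response_modalities=["AUDIO"]  # "TEXT" or "AUDIO", one at a time
    )
)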
