
AI Agent with Unity - Quick Start

This guide demonstrates how to integrate a real-time AI agent with a Unity application. The agent, powered by Google's Gemini Live API, acts as a high-energy game show host, guiding the user through a number-guessing game via voice.

Prerequisites

Before you begin, ensure you have the following:

  • Unity 2022.3 LTS or later.
  • A VideoSDK Account for generating an auth token. If you don't have one, sign up at the VideoSDK Dashboard.
  • Python 3.12+ for the AI agent.
  • A Google API Key for the Gemini Live API.
important

You need a VideoSDK auth token and a Google API key for the Gemini Live API. Generate the token from the VideoSDK dashboard and the API key from Google AI Studio.

Project Structure

Unity-quickstart/
├── Unity/
│   ├── Assets/
│   │   └── Scripts/
│   │       └── GameManager.cs
│   ├── ProjectSettings/
│   └── Packages/
├── agent-unity.py
└── README.md

You will be working with the following files:

  • Unity/Assets/Scripts/GameManager.cs: The main script for the Unity application, handling meeting logic and UI.
  • agent-unity.py: The Python AI agent that joins the meeting.

1. Unity Frontend

Step 1: Install VideoSDK Package

  1. Open Unity’s Package Manager by selecting from the top bar: Window -> Package Manager.

  2. Click the + button in the top left corner and select Add package from git URL.

  3. Paste the following URL and click Add:

https://github.com/videosdk-live/videosdk-rtc-unity-sdk.git


  4. Add the com.unity.nuget.newtonsoft-json package by following the instructions provided here.

Step 2: Platform Setup

Android Setup

To integrate the VideoSDK into your Android project, follow these steps:

  1. Add the following repository configuration to your settingsTemplate.gradle file:

settingsTemplate.gradle
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.PREFER_SETTINGS)
    repositories {
        **ARTIFACTORYREPOSITORY**
        google()
        mavenCentral()
        jcenter()
        maven {
            url = uri("https://maven.aliyun.com/repository/jcenter")
        }
        flatDir {
            dirs "${project(':unityLibrary').projectDir}/libs"
        }
    }
}
  2. Add the VideoSDK Android SDK dependency to your mainTemplate.gradle file:

mainTemplate.gradle
dependencies {
    implementation 'live.videosdk:rtc-android-sdk:0.3.1'
}
  3. If your project sets android.useAndroidX=true, also set android.enableJetifier=true in the gradleTemplate.properties file to migrate your project to AndroidX and avoid duplicate class conflicts:

gradleTemplate.properties
android.enableJetifier=true
android.useAndroidX=true
android.suppressUnsupportedCompileSdk=34

iOS Setup

  1. Build for iOS: In Unity, export the project for iOS.
  2. Open in Xcode: Navigate to the generated Xcode project and open it.
  3. Configure Frameworks:
    • Select the Unity-iPhone target.
    • Go to the General tab.
    • Under Frameworks, Libraries, and Embedded Content, add VideoSDK and its required frameworks.


Step 3: Create a Meeting Room

Create a static roomId using the VideoSDK API. Both the Unity app and the AI agent will use this ID to connect to the same meeting.

curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"

Replace YOUR_JWT_TOKEN_HERE with your VideoSDK auth token. Copy the roomId from the response for the next steps.
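If you prefer to script room creation, the same API call can be made from Python. This is a minimal sketch using only the standard library; it mirrors the endpoint and headers of the curl command above, and the helper names (build_room_request, extract_room_id) are illustrative, not part of any SDK:

```python
import json
import urllib.request

API_URL = "https://api.videosdk.live/v2/rooms"

def build_room_request(token: str) -> urllib.request.Request:
    """Build the POST request that creates a meeting room."""
    return urllib.request.Request(
        API_URL,
        method="POST",
        headers={
            "Authorization": token,
            "Content-Type": "application/json",
        },
    )

def extract_room_id(response_body: bytes) -> str:
    """Pull the roomId field out of the JSON response."""
    return json.loads(response_body)["roomId"]

# Usage (performs a real API call):
# with urllib.request.urlopen(build_room_request("YOUR_JWT_TOKEN_HERE")) as resp:
#     room_id = extract_room_id(resp.read())
```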

Step 4: Configure Unity Project

Update GameManager.cs with your VideoSDK auth token and the roomId you just created.

Unity/Assets/Scripts/GameManager.cs
// ... existing code ...
public class GameManager : MonoBehaviour
{
    // ... existing code ...
    private readonly string _token = "YOUR_VIDEOSDK_AUTH_TOKEN";
    private readonly string _staticMeetingId = "YOUR_MEETING_ID";
    // ... existing code ...
}

Step 5: Set Up Platform-Specific Permissions

For Android:

Add the following permissions to your AndroidManifest.xml file:

AndroidManifest.xml
<uses-permission android:name="android.permission.CAMERA"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS"/>

For iOS:

Ensure your Info.plist includes descriptions for camera and microphone usage:

Info.plist
<key>NSCameraUsageDescription</key>
<string>Camera access is required for video calls.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is required for audio calls.</string>

2. Python AI Agent

Step 1: Configure Environment and Credentials

Create a .env file in the Unity-quickstart directory to store your API keys.

.env
# Google API Key for Gemini Live API
GOOGLE_API_KEY="your_google_api_key_here"

# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN="your_videosdk_auth_token_here"
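To fail fast on missing credentials, you can validate the environment before starting the agent. Below is a minimal sketch using only the standard library; it assumes the .env file has already been loaded into the process environment (for example by the framework or by python-dotenv), and the helper name is illustrative:

```python
import os

REQUIRED_VARS = ("GOOGLE_API_KEY", "VIDEOSDK_AUTH_TOKEN")

def missing_credentials(env=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Usage: call before constructing the agent session.
# missing = missing_credentials()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```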

Step 2: Create the Python AI Agent

The Python agent joins the same meeting room to interact with the user. Update agent-unity.py with your roomId.

agent-unity.py
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging

logging.getLogger().setLevel(logging.INFO)

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win $1,000,000.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Welcome to the VideoSDK AI Agent game show! I'm your host, and we're about to play for $1,000,000. Are you ready to play?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        config=GeminiLiveConfig(
            voice="Leda",
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )

    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)

def make_context() -> JobContext:
    room_options = RoomOptions(
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
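The transcription handler in the agent receives a dict with role and text keys. If you want to log or persist transcripts rather than just print them, the formatting can be factored into a small, testable helper; this is a sketch, and the function name is illustrative:

```python
def format_transcript(data: dict) -> str:
    """Render a realtime transcription event (a dict with 'role' and 'text' keys) as one log line."""
    return f"[TRANSCRIPT][{data.get('role')}]: {data.get('text')}"

# Example:
# format_transcript({"role": "user", "text": "Is it 50?"})
```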

3. Run the Application

Step 1: Run the Python AI Agent

Open a terminal, navigate to the Unity-quickstart directory, and run the agent.

# Install Python dependencies
pip install videosdk-agents "videosdk-plugins-google"

# Run the AI agent
python agent-unity.py

Step 2: Run the Unity Application

  1. Open the Unity/ project in Unity Hub.
  2. Once the project is loaded, press the Play button in the Unity Editor to start the application.
  3. Click the Join Meeting button in the app to connect to the session.

Once you join the meeting, the AI agent will greet you and start the game.

Got a question? Ask us on Discord.