AI Agent with Unity - Quick Start
This guide demonstrates how to integrate a real-time AI agent with a Unity application. The agent, powered by Google's Gemini Live API, acts as a high-energy game show host, guiding the user through a number-guessing game via voice.
Prerequisites
Before you begin, ensure you have the following:
- Unity 2022.3 LTS or later.
- A VideoSDK Account for generating an auth token. If you don't have one, sign up at the VideoSDK Dashboard.
- Python 3.12+ for the AI agent.
- A Google API Key for the Gemini Live API.
Generate your VideoSDK auth token from the VideoSDK Dashboard and your Google API key from Google AI Studio.
Project Structure
Unity-quickstart/
├── Unity/
│   ├── Assets/
│   │   └── Scripts/
│   │       └── GameManager.cs
│   ├── ProjectSettings/
│   └── Packages/
├── agent-unity.py
└── README.md
You will be working with the following files:
- Unity/Assets/Scripts/GameManager.cs: The main script for the Unity application, handling meeting logic and UI.
- agent-unity.py: The Python AI agent that joins the meeting.
1. Unity Frontend
Step 1: Install VideoSDK Package
- Open Unity's Package Manager from the top bar: Window -> Package Manager.
- Click the + button in the top-left corner and select Add package from git URL.
- Paste the following URL and click Add:
https://github.com/videosdk-live/videosdk-rtc-unity-sdk.git
- Add the com.unity.nuget.newtonsoft-json package by following Unity's official instructions.
Step 2: Platform Setup
Android Setup
To integrate the VideoSDK into your Android project, follow these steps:
- Add the following repository configuration to your settingsTemplate.gradle file:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.PREFER_SETTINGS)
    repositories {
        **ARTIFACTORYREPOSITORY**
        google()
        mavenCentral()
        jcenter()
        maven {
            url = uri("https://maven.aliyun.com/repository/jcenter")
        }
        flatDir {
            dirs "${project(':unityLibrary').projectDir}/libs"
        }
    }
}
- Install the VideoSDK Android SDK in your mainTemplate.gradle file:
dependencies {
    implementation 'live.videosdk:rtc-android-sdk:0.3.1'
}
- If your project has set android.useAndroidX=true, then set android.enableJetifier=true in the gradleTemplate.properties file to migrate your project to AndroidX and avoid duplicate class conflicts.
android.enableJetifier=true
android.useAndroidX=true
android.suppressUnsupportedCompileSdk=34
iOS Setup
- Build for iOS: In Unity, export the project for iOS.
- Open in Xcode: Navigate to the generated Xcode project and open it.
- Configure Frameworks:
  - Select the Unity-iPhone target.
  - Go to the General tab.
  - Under Frameworks, Libraries, and Embedded Content, add VideoSDK and its required frameworks.
Step 3: Create a Meeting Room
Create a static roomId using the VideoSDK API. Both the Unity app and the AI agent will use this ID to connect to the same meeting.
curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"
Replace YOUR_JWT_TOKEN_HERE with your VideoSDK auth token. Copy the roomId from the response for the next steps.
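If you prefer to script this step, the same request can be made from Python. The following is a minimal sketch using the requests library; it assumes requests is installed and that the response JSON contains a roomId field, as described above.
# Minimal sketch: create a meeting room from Python instead of curl.
# Assumes the requests package is installed (pip install requests) and that
# the response JSON contains a "roomId" field.
import requests

VIDEOSDK_AUTH_TOKEN = "YOUR_JWT_TOKEN_HERE"  # your VideoSDK auth token

response = requests.post(
    "https://api.videosdk.live/v2/rooms",
    headers={
        "Authorization": VIDEOSDK_AUTH_TOKEN,
        "Content-Type": "application/json",
    },
)
response.raise_for_status()
print("roomId:", response.json()["roomId"])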
Step 4: Configure Unity Project
Update GameManager.cs with your VideoSDK auth token and the roomId you just created.
// ... existing code ...
public class GameManager : MonoBehaviour
{
    // ... existing code ...
    private readonly string _token = "YOUR_VIDEOSDK_AUTH_TOKEN";
    private readonly string _staticMeetingId = "YOUR_MEETING_ID";
    // ... existing code ...
}
Step 5: Set Up Platform-Specific Permissions
For Android:
Add the following permissions to your AndroidManifest.xml file:
<uses-permission android:name="android.permission.CAMERA"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS"/>
For iOS:
Ensure your Info.plist includes descriptions for camera and microphone usage:
<key>NSCameraUsageDescription</key>
<string>Camera access is required for video calls.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is required for audio calls.</string>
2. Python AI Agent
Step 1: Configure Environment and Credentials
Create a .env file in the Unity-quickstart directory to store your API keys.
# Google API Key for Gemini Live API
GOOGLE_API_KEY="your_google_api_key_here"
# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN="your_videosdk_auth_token_here"
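The agent reads these keys from the environment. If the .env file is not picked up automatically when you run the script, you can load it yourself with python-dotenv; this is a sketch assuming that extra package is installed, which is not listed in this quick start's dependencies.
# Minimal sketch: load the .env file manually with python-dotenv
# (an extra dependency, installed with: pip install python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is missing from .env"
assert os.getenv("VIDEOSDK_AUTH_TOKEN"), "VIDEOSDK_AUTH_TOKEN is missing from .env"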
Step 2: Create the Python AI Agent
The Python agent joins the same meeting room to interact with the user. Update agent-unity.py with your roomId.
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging

logging.getLogger().setLevel(logging.INFO)

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win 1,000,000$.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Welcome to the Videosdk's AI Agent game show! I'm your host, and we're about to play for 1,000,000$. Are you ready to play?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        config=GeminiLiveConfig(
            voice="Leda",
            response_modalities=["AUDIO"]
        )
    )
    pipeline = RealTimePipeline(model=model)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )

    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)

def make_context() -> JobContext:
    room_options = RoomOptions(
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
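If you would rather not hard-code the meeting ID, make_context() can read it from an environment variable instead. The sketch below reuses the same RoomOptions fields as above; the VIDEOSDK_ROOM_ID variable name is an assumption chosen for this example, not something the SDK defines.
# Optional variation: read the room ID from an environment variable.
# VIDEOSDK_ROOM_ID is a hypothetical variable name used only for illustration.
import os
from videosdk.agents import JobContext, RoomOptions

def make_context() -> JobContext:
    room_options = RoomOptions(
        room_id=os.environ["VIDEOSDK_ROOM_ID"],
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)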
3. Run the Application
Step 1: Run the Python AI Agent
Open a terminal, navigate to the Unity-quickstart directory, and run the agent.
# Install Python dependencies
pip install videosdk-agents "videosdk-plugins-google"
# Run the AI agent
python agent-unity.py
Step 2: Run the Unity Application
- Open the Unity/ project in Unity Hub.
- Once the project is loaded, press the Play button in the Unity Editor to start the application.
- Click the Join Meeting button in the app to connect to the session.
Once you join the meeting, the AI agent will greet you and start the game.
Got a question? Ask us on Discord.