AI Agent with React Native - Quick Start
VideoSDK empowers you to integrate an AI voice agent into your React Native app (Android/iOS) within minutes. The agent joins the same meeting room and interacts over voice using the Google Gemini Live API.
Prerequisites
- VideoSDK Developer Account (get token from the dashboard)
- Node.js and a working React Native environment (Android Studio and/or Xcode)
- Python 3.12+
- Google API Key with Gemini Live API access
You need a VideoSDK account to generate an auth token, and a Google API key for the Gemini Live API. Visit the VideoSDK dashboard to generate the token, and Google AI Studio to create the API key.
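For reference, a VideoSDK auth token is a JWT signed with your API secret. The sketch below is illustrative only (the claim names apikey, permissions, and version follow VideoSDK's auth documentation; in production, use a maintained JWT library rather than hand-rolling the signing):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_videosdk_token(api_key: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT with the claims VideoSDK's auth docs describe."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    payload = {
        "apikey": api_key,
        "permissions": ["allow_join"],
        "version": 2,
        "iat": now,
        "exp": now + ttl_seconds,
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"
```

Generate tokens on a server you control; never ship your API secret inside the mobile app.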
Project Structure
First, create an empty project folder (mkdir folder_name) at your preferred location for the React Native frontend. Your final project structure should look like this:
root
├── android/
├── ios/
├── App.js
├── constants.js
├── index.js
├── agent-react-native.py
└── .env
You will work on:
- android/ : Contains the Android-specific project files.
- ios/ : Contains the iOS-specific project files.
- App.js : The main React Native component, containing the UI and meeting logic.
- constants.js : Stores the token and meetingId for the frontend.
- index.js : The entry point of the React Native application, where VideoSDK is registered.
- agent-react-native.py : The Python agent that joins the meeting.
- .env : Environment variables file for the Python agent (stores API keys).
1. Building the React Native Frontend
Step 1: Create App and Install SDKs
Create a React Native app and install the VideoSDK RN SDK:
npx react-native init videosdkAiAgentRN
cd videosdkAiAgentRN
# Install VideoSDK and its WebRTC dependency
npm install "@videosdk.live/react-native-sdk" "@videosdk.live/react-native-webrtc"
Step 2: Configure the Project
Android Setup
Add the required permissions to android/app/src/main/AndroidManifest.xml:
<manifest
xmlns:android="http://schemas.android.com/apk/res/android"
>
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission
android:name="android.permission.BLUETOOTH"
android:maxSdkVersion="30" />
<uses-permission
android:name="android.permission.BLUETOOTH_ADMIN"
android:maxSdkVersion="30" />
<uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WAKE_LOCK" />
</manifest>
dependencies {
implementation project(':rnwebrtc')
}
include ':rnwebrtc'
project(':rnwebrtc').projectDir = new File(rootProject.projectDir, '../node_modules/@videosdk.live/react-native-webrtc/android')
Register the WebRTC module package in your MainApplication.kt:
import live.videosdk.rnwebrtc.WebRTCModulePackage
class MainApplication : Application(), ReactApplication {
override val reactNativeHost: ReactNativeHost =
object : DefaultReactNativeHost(this) {
override fun getPackages(): List<ReactPackage> {
val packages = PackageList(this).packages.toMutableList()
packages.add(WebRTCModulePackage())
return packages
}
// ...
}
}
In android/gradle.properties, add:
# Disables desugaring in the dexing artifact transform, which avoids a WebRTC runtime crash on some devices.
android.enableDexingArtifactTransform.desugaring=false
To keep WebRTC classes from being stripped in release builds, add this rule to android/app/proguard-rules.pro:
-keep class org.webrtc.** { *; }
Finally, set the minimum SDK version to 23 in your root android/build.gradle:
buildscript {
ext {
minSdkVersion = 23
}
}
iOS Setup
To update CocoaPods, you can reinstall the gem using the following command:
$ sudo gem install cocoapods
Then add the react-native-webrtc pod to your Podfile (ios/Podfile):
pod 'react-native-webrtc', :path => '../node_modules/@videosdk.live/react-native-webrtc'
Because react-native-webrtc doesn't support iOS versions earlier than 12.0, set the platform field in the Podfile to 12.0 or above. Update the line: platform :ios, '12.0'.
After updating the version, you need to install the pods by running the following command:
pod install
Add the following lines to your Info.plist file, located at (project folder/ios/projectname/Info.plist):
<key>NSCameraUsageDescription</key>
<string>Camera permission description</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone permission description</string>
Step 3: Register Service and Configure
Register the VideoSDK services in your root index.js file so they are initialized before the app component is registered.
import { AppRegistry } from "react-native";
import App from "./App";
import { name as appName } from "./app.json";
import { register } from "@videosdk.live/react-native-sdk";
register();
AppRegistry.registerComponent(appName, () => App);
Create a constants.js
file to store your token and meeting ID.
export const token = "YOUR_VIDEOSDK_AUTH_TOKEN";
export const meetingId = "YOUR_MEETING_ID";
export const name = "User Name";
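The meetingId in constants.js must reference a real room. One way to create one is the VideoSDK REST API's POST /v2/rooms endpoint; the server-side Python sketch below only builds the request (the endpoint and Authorization header follow VideoSDK's REST docs; the helper name is our own):

```python
import json
import urllib.request

VIDEOSDK_ROOMS_URL = "https://api.videosdk.live/v2/rooms"

def build_create_room_request(auth_token: str) -> urllib.request.Request:
    """Builds (but does not send) a POST request to create a new room."""
    # The v2 rooms endpoint expects the raw auth token in the Authorization header.
    return urllib.request.Request(
        VIDEOSDK_ROOMS_URL,
        data=b"{}",
        headers={"Authorization": auth_token, "Content-Type": "application/json"},
        method="POST",
    )

# To actually create a room (network call, requires a valid token):
# with urllib.request.urlopen(build_create_room_request(token)) as resp:
#     room_id = json.loads(resp.read())["roomId"]
```

Copy the returned roomId into both constants.js and the agent's RoomOptions.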
Step 4: Build UI and wire up MeetingProvider
import React from 'react';
import {
SafeAreaView,
TouchableOpacity,
Text,
View,
FlatList,
} from 'react-native';
import {
MeetingProvider,
useMeeting,
} from '@videosdk.live/react-native-sdk';
import { meetingId, token, name } from './constants';
const Button = ({ onPress, buttonText, backgroundColor }) => {
return (
<TouchableOpacity
onPress={onPress}
style={{
backgroundColor: backgroundColor,
justifyContent: 'center',
alignItems: 'center',
padding: 12,
borderRadius: 4,
}}>
<Text style={{ color: 'white', fontSize: 12 }}>{buttonText}</Text>
</TouchableOpacity>
);
};
function ControlsContainer({ join, leave, toggleMic }) {
return (
<View
style={{
padding: 24,
flexDirection: 'row',
justifyContent: 'space-between',
}}>
<Button
onPress={() => {
join();
}}
buttonText={'Join'}
backgroundColor={'#1178F8'}
/>
<Button
onPress={() => {
toggleMic();
}}
buttonText={'Toggle Mic'}
backgroundColor={'#1178F8'}
/>
<Button
onPress={() => {
leave();
}}
buttonText={'Leave'}
backgroundColor={'#FF0000'}
/>
</View>
);
}
function ParticipantView({ participantDisplayName }) {
return (
<View
style={{
backgroundColor: 'grey',
height: 300,
justifyContent: 'center',
alignItems: 'center',
marginVertical: 8,
marginHorizontal: 8,
}}>
<Text style={{ fontSize: 16 }}>Participant: {participantDisplayName}</Text>
</View>
);
}
function ParticipantList({ participants }) {
return participants.length > 0 ? (
<FlatList
data={participants}
renderItem={({ item }) => {
return <ParticipantView participantDisplayName={item.displayName} />;
}}
/>
) : (
<View
style={{
flex: 1,
backgroundColor: '#F6F6FF',
justifyContent: 'center',
alignItems: 'center',
}}>
<Text style={{ fontSize: 20 }}>Press Join button to enter meeting.</Text>
</View>
);
}
function MeetingView() {
const { join, leave, toggleMic, participants, meetingId } = useMeeting({});
const participantsList = [...participants.values()].map(participant => ({
displayName: participant.displayName,
}));
return (
<View style={{ flex: 1 }}>
{meetingId ? (
<Text style={{ fontSize: 18, padding: 12 }}>Meeting Id : {meetingId}</Text>
) : null}
<ParticipantList participants={participantsList} />
<ControlsContainer
join={join}
leave={leave}
toggleMic={toggleMic}
/>
</View>
);
}
export default function App() {
if (!meetingId || !token) {
return (
<SafeAreaView style={{ flex: 1, backgroundColor: '#F6F6FF' }}>
<View style={{ flex: 1, justifyContent: 'center', alignItems: 'center' }}>
<Text style={{ fontSize: 20, textAlign: 'center' }}>
Please add a valid Meeting ID and Token in the `constants.js` file.
</Text>
</View>
</SafeAreaView>
);
}
return (
<SafeAreaView style={{ flex: 1, backgroundColor: '#F6F6FF' }}>
<MeetingProvider
config={{
meetingId,
micEnabled: true,
webcamEnabled: false,
name,
}}
token={token}>
<MeetingView />
</MeetingProvider>
</SafeAreaView>
);
}
2. Building the Python Agent
Step 1: Configure the Agent
Create a .env
file to store your API keys securely for the Python agent:
# Google API Key for Gemini Live API
GOOGLE_API_KEY=your_google_api_key_here
# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token_here
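The VideoSDK and Google plugins read these keys from the environment at startup. A small fail-fast check (our own helper, not part of the SDK) makes a missing key obvious immediately; load the .env file first (e.g. with python-dotenv's load_dotenv()) if you don't export the variables in your shell:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if unset."""
    # A clear error here beats a cryptic auth failure deep inside the agent.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example (run before starting the agent):
# for key in ("GOOGLE_API_KEY", "VIDEOSDK_AUTH_TOKEN"):
#     require_env(key)
```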
Step 2: Create the Python Agent
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging
logging.getLogger().setLevel(logging.INFO)
class MyVoiceAgent(Agent):
def __init__(self):
super().__init__(
instructions="You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win $1,000,000.",
)
async def on_enter(self) -> None:
await self.session.say("Welcome to VideoSDK's AI Agent game show! I'm your host, and we're about to play for $1,000,000. Are you ready to play?")
async def on_exit(self) -> None:
await self.session.say("Goodbye!")
async def start_session(context: JobContext):
agent = MyVoiceAgent()
model = GeminiRealtime(
model="gemini-2.0-flash-live-001",
# When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
# api_key="AIXXXXXXXXXXXXXXXXXXXX",
config=GeminiLiveConfig(
voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr
response_modalities=["AUDIO"]
)
)
pipeline = RealTimePipeline(model=model)
session = AgentSession(
agent=agent,
pipeline=pipeline
)
def on_transcription(data: dict):
role = data.get("role")
text = data.get("text")
print(f"[TRANSCRIPT][{role}]: {text}")
pipeline.on("realtime_model_transcription", on_transcription)
await context.run_until_shutdown(session=session, wait_for_participant=True)
def make_context() -> JobContext:
room_options = RoomOptions(
# Static meeting ID - same as used in frontend
room_id="YOUR_MEETING_ID", # Replace it with your actual room_id
name="Gemini Agent",
playground=True,  # Set to False when joining from your own app instead of the playground
)
return JobContext(room_options=room_options)
if __name__ == "__main__":
job = WorkerJob(entrypoint=start_session, jobctx=make_context)
job.start()
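The realtime_model_transcription handler above just prints each event. If you want a reviewable conversation log instead, a small accumulator could replace it (a hypothetical helper of our own, assuming the {'role': ..., 'text': ...} event shape shown above):

```python
class TranscriptLog:
    """Collects {'role': ..., 'text': ...} transcription events into an ordered log."""

    def __init__(self):
        self.entries = []

    def on_transcription(self, data: dict) -> None:
        # Same event shape the pipeline emits; skip empty or whitespace-only text.
        role = data.get("role", "unknown")
        text = (data.get("text") or "").strip()
        if text:
            self.entries.append((role, text))

    def render(self) -> str:
        # One "[role] text" line per event, in arrival order.
        return "\n".join(f"[{role}] {text}" for role, text in self.entries)
```

Wire it up with pipeline.on("realtime_model_transcription", log.on_transcription) in place of the print-based handler.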
3. Run the Application
1) Start the React Native app
npm install
# Android
npm run android
# iOS (macOS only)
cd ios && pod install && cd ..
npm run ios
2) Start the Python Agent
pip install videosdk-agents
pip install "videosdk-plugins-google"
python agent-react-native.py
3) Connect and interact
- Join the meeting from the app and allow microphone permissions.
- When you join, the Python agent detects your participation and starts speaking.
- Talk to the agent in real time and play the number guessing game.
Troubleshooting
- Ensure the same room_id is set in both the RN app (constants.js) and the agent's RoomOptions.
- Verify microphone and camera permissions on the device/simulator.
- Confirm your VideoSDK token is valid and your Google API key is set.
- If audio is silent, check device output volume and that the agent is not in playground mode.
Next Steps
Clone the repo for a quick implementation.
Got a question? Ask us on Discord.