
AI Agent with React Native - Quick Start

VideoSDK empowers you to integrate an AI voice agent into your React Native app (Android/iOS) within minutes. The agent joins the same meeting room and interacts over voice using the Google Gemini Live API.

Prerequisites

  • VideoSDK Developer Account (get token from the dashboard)
  • Node.js and a working React Native environment (Android Studio and/or Xcode)
  • Python 3.12+
  • Google API Key with Gemini Live API access
important

You need a VideoSDK account to generate a token and a Google API key for the Gemini Live API. Visit the VideoSDK dashboard to generate the token and Google AI Studio to create the API key.
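If you don't yet have a meeting ID, you can create one with VideoSDK's REST API using your auth token. Below is a minimal Python sketch (standard library only); it assumes the `v2/rooms` room-creation endpoint, and the helper names are illustrative, not part of any SDK:

```python
import json
import urllib.request

API_URL = "https://api.videosdk.live/v2/rooms"  # VideoSDK room-creation endpoint

def auth_headers(token: str) -> dict:
    # VideoSDK expects the raw token in the Authorization header.
    return {"Authorization": token, "Content-Type": "application/json"}

def create_meeting(token: str) -> str:
    # POST with an empty JSON body; the response contains the new roomId.
    req = urllib.request.Request(API_URL, data=b"{}", headers=auth_headers(token), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["roomId"]

# Usage (requires a valid token):
#   meeting_id = create_meeting("YOUR_VIDEOSDK_AUTH_TOKEN")
```

The returned `roomId` is what goes into `constants.js` on the frontend and `RoomOptions` in the agent.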

Project Structure

First, create an empty folder (e.g., mkdir folder_name) at your preferred location for the React Native frontend. Your final project structure should look like this:

Directory Structure
  root
├── android/
├── ios/
├── App.js
├── constants.js
├── index.js
├── agent-react-native.py
└── .env

You will work on:

  • android/: Contains the Android-specific project files.
  • ios/: Contains the iOS-specific project files.
  • App.js: The main React Native component, containing the UI and meeting logic.
  • constants.js: To store token and meetingId for the frontend.
  • index.js: The entry point of the React Native application, where VideoSDK is registered.
  • agent-react-native.py: The Python agent that joins the meeting.
  • .env: Environment variables file for the Python agent (stores API keys).

1. Building the React Native Frontend

Step 1: Create App and Install SDKs

Create a React Native app and install the VideoSDK RN SDK:

npx react-native init videosdkAiAgentRN
cd videosdkAiAgentRN

# Install VideoSDK
npm install "@videosdk.live/react-native-sdk"

Step 2: Configure the Project

Android Setup

android/app/src/main/AndroidManifest.xml
<manifest
  xmlns:android="http://schemas.android.com/apk/res/android">

  <uses-permission android:name="android.permission.INTERNET" />
  <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
  <uses-permission
    android:name="android.permission.BLUETOOTH"
    android:maxSdkVersion="30" />
  <uses-permission
    android:name="android.permission.BLUETOOTH_ADMIN"
    android:maxSdkVersion="30" />
  <uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />

  <uses-permission android:name="android.permission.CAMERA" />
  <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
  <uses-permission android:name="android.permission.RECORD_AUDIO" />
  <uses-permission android:name="android.permission.WAKE_LOCK" />
</manifest>
android/app/build.gradle
dependencies {
    implementation project(':rnwebrtc')
}
android/settings.gradle
include ':rnwebrtc'
project(':rnwebrtc').projectDir = new File(rootProject.projectDir, '../node_modules/@videosdk.live/react-native-webrtc/android')
MainApplication.kt
import live.videosdk.rnwebrtc.WebRTCModulePackage

class MainApplication : Application(), ReactApplication {
    override val reactNativeHost: ReactNativeHost =
        object : DefaultReactNativeHost(this) {
            override fun getPackages(): List<ReactPackage> {
                val packages = PackageList(this).packages.toMutableList()
                packages.add(WebRTCModulePackage())
                return packages
            }
            // ...
        }
}
android/gradle.properties
# Fixes a WebRTC runtime problem on some devices.
android.enableDexingArtifactTransform.desugaring=false
android/build.gradle
buildscript {
    ext {
        minSdkVersion = 23
    }
}

iOS Setup

To update CocoaPods, you can reinstall the gem using the following command:

sudo gem install cocoapods
ios/Podfile
pod 'react-native-webrtc', :path => '../node_modules/@videosdk.live/react-native-webrtc'

You need to change the platform field in the Podfile to 12.0 or above, because react-native-webrtc doesn't support iOS versions earlier than 12.0. Update the line: platform :ios, '12.0'.

After updating the version, you need to install the pods by running the following command:

pod install

Add the following keys to your Info.plist file, located at ios/<project name>/Info.plist:

ios/MyApp/Info.plist
<key>NSCameraUsageDescription</key>
<string>Camera permission description</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone permission description</string>

Step 3: Register Service and Configure

Register the VideoSDK service in your root index.js file so the SDK is initialized before the app starts.

index.js
import { AppRegistry } from "react-native";
import App from "./App";
import { name as appName } from "./app.json";
import { register } from "@videosdk.live/react-native-sdk";

register();

AppRegistry.registerComponent(appName, () => App);

Create a constants.js file to store your token and meeting ID.

constants.js
export const token = "YOUR_VIDEOSDK_AUTH_TOKEN";
export const meetingId = "YOUR_MEETING_ID";
export const name = "User Name";

Step 4: Build UI and wire up MeetingProvider

App.js
import React from 'react';
import {
  SafeAreaView,
  TouchableOpacity,
  Text,
  View,
  FlatList,
} from 'react-native';
import {
  MeetingProvider,
  useMeeting,
} from '@videosdk.live/react-native-sdk';
import { meetingId, token, name } from './constants';

const Button = ({ onPress, buttonText, backgroundColor }) => {
  return (
    <TouchableOpacity
      onPress={onPress}
      style={{
        backgroundColor: backgroundColor,
        justifyContent: 'center',
        alignItems: 'center',
        padding: 12,
        borderRadius: 4,
      }}>
      <Text style={{ color: 'white', fontSize: 12 }}>{buttonText}</Text>
    </TouchableOpacity>
  );
};

function ControlsContainer({ join, leave, toggleMic }) {
  return (
    <View
      style={{
        padding: 24,
        flexDirection: 'row',
        justifyContent: 'space-between',
      }}>
      <Button
        onPress={() => {
          join();
        }}
        buttonText={'Join'}
        backgroundColor={'#1178F8'}
      />
      <Button
        onPress={() => {
          toggleMic();
        }}
        buttonText={'Toggle Mic'}
        backgroundColor={'#1178F8'}
      />
      <Button
        onPress={() => {
          leave();
        }}
        buttonText={'Leave'}
        backgroundColor={'#FF0000'}
      />
    </View>
  );
}

function ParticipantView({ participantDisplayName }) {
  return (
    <View
      style={{
        backgroundColor: 'grey',
        height: 300,
        justifyContent: 'center',
        alignItems: 'center',
        marginVertical: 8,
        marginHorizontal: 8,
      }}>
      <Text style={{ fontSize: 16 }}>Participant: {participantDisplayName}</Text>
    </View>
  );
}

function ParticipantList({ participants }) {
  return participants.length > 0 ? (
    <FlatList
      data={participants}
      renderItem={({ item }) => {
        return <ParticipantView participantDisplayName={item.displayName} />;
      }}
    />
  ) : (
    <View
      style={{
        flex: 1,
        backgroundColor: '#F6F6FF',
        justifyContent: 'center',
        alignItems: 'center',
      }}>
      <Text style={{ fontSize: 20 }}>Press Join button to enter meeting.</Text>
    </View>
  );
}

function MeetingView() {
  const { join, leave, toggleMic, participants, meetingId } = useMeeting({});

  const participantsList = [...participants.values()].map(participant => ({
    displayName: participant.displayName,
  }));

  return (
    <View style={{ flex: 1 }}>
      {meetingId ? (
        <Text style={{ fontSize: 18, padding: 12 }}>Meeting Id : {meetingId}</Text>
      ) : null}
      <ParticipantList participants={participantsList} />
      <ControlsContainer
        join={join}
        leave={leave}
        toggleMic={toggleMic}
      />
    </View>
  );
}

export default function App() {
  if (!meetingId || !token) {
    return (
      <SafeAreaView style={{ flex: 1, backgroundColor: '#F6F6FF' }}>
        <View style={{ flex: 1, justifyContent: 'center', alignItems: 'center' }}>
          <Text style={{ fontSize: 20, textAlign: 'center' }}>
            Please add a valid Meeting ID and Token in the `constants.js` file.
          </Text>
        </View>
      </SafeAreaView>
    );
  }

  return (
    <SafeAreaView style={{ flex: 1, backgroundColor: '#F6F6FF' }}>
      <MeetingProvider
        config={{
          meetingId,
          micEnabled: true,
          webcamEnabled: false,
          name,
        }}
        token={token}>
        <MeetingView />
      </MeetingProvider>
    </SafeAreaView>
  );
}

2. Building the Python Agent

Step 1: Configure the Agent

Create a .env file to store your API keys securely for the Python agent:

.env
# Google API Key for Gemini Live API
GOOGLE_API_KEY=your_google_api_key_here

# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token_here
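Before launching the agent, it can help to confirm both keys are actually visible to the process, since a missing or blank variable tends to surface later as an opaque auth error. A minimal sketch (the `missing_keys` helper is hypothetical, not part of the SDK):

```python
REQUIRED_KEYS = ("GOOGLE_API_KEY", "VIDEOSDK_AUTH_TOKEN")

def missing_keys(env) -> list:
    # Return the required keys that are absent or blank in the given mapping.
    return [k for k in REQUIRED_KEYS if not str(env.get(k) or "").strip()]

# Usage:
#   import os
#   if missing_keys(os.environ):
#       raise SystemExit(f"Missing environment variables: {missing_keys(os.environ)}")
```

Pass `dict(os.environ)` (or any mapping loaded from your .env file) to get back the list of keys still to be set.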

Step 2: Create the Python Agent

agent-react-native.py
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging

logging.getLogger().setLevel(logging.INFO)

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win $1,000,000.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Welcome to VideoSDK's AI Agent game show! I'm your host, and we're about to play for $1,000,000. Are you ready to play?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # When GOOGLE_API_KEY is set in .env, DON'T pass the api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Options: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )

    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)

def make_context() -> JobContext:
    room_options = RoomOptions(
        # Static meeting ID - same as used in the frontend (constants.js)
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )

    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

3. Run the Application

1) Start the React Native app

npm install

# Android
npm run android

# iOS (macOS only)
cd ios && pod install && cd ..
npm run ios

2) Start the Python Agent

pip install videosdk-agents
pip install "videosdk-plugins-google"

python agent-react-native.py

3) Connect and interact

  1. Join the meeting from the app and allow microphone permissions.
  2. When you join, the Python agent detects your participation and starts speaking.
  3. Talk to the agent in real time and play the number guessing game.

Troubleshooting

  • Ensure the same room_id is set in both the RN app (constants.js) and the agent's RoomOptions.
  • Verify microphone and camera permissions on the device/simulator.
  • Confirm your VideoSDK token is valid and Google API key is set.
  • If audio is silent, check the device output volume and whether the agent is running in playground mode (playground=True in RoomOptions).

Next Steps

Clone the repo for a quick implementation.

Got a question? Ask us on Discord.