AI Agent with React - Quick Start
VideoSDK empowers you to seamlessly integrate AI agents with real-time voice interaction into your React application within minutes.
In this quickstart, you'll explore how to create an AI agent that joins a meeting room and interacts with users through voice using the Google Gemini Live API.
Prerequisites
Before proceeding, ensure that your development environment meets the following requirements:
- VideoSDK Developer Account (if you don't have one, create it from the VideoSDK Dashboard)
- Node.js and Python 3.12+ installed on your device
- Google API Key with Gemini Live API access
You need a VideoSDK token and a Google API key with Gemini Live API access. Generate the token from the VideoSDK dashboard and the API key from Google AI Studio.
Project Structure
Your project structure should look like this.
root
├── node_modules
├── public
├── src
│ ├── config.js
│ ├── App.js
│ └── index.js
├── agent-react.py
└── .env
You will be working on the following files:
- App.js: Responsible for creating a basic UI for joining the meeting
- config.js: Responsible for storing the token and room ID
- index.js: The entry point of your React application
- agent-react.py: Python AI agent backend using the Google Gemini Live API
- .env: Environment variables for API keys
Part 1: React Frontend
Step 1: Getting Started with the Code!
Create new React App
Create a new React app using the command below.
$ npx create-react-app videosdk-ai-agent-react-app
Install VideoSDK
Install the VideoSDK using the npm command below. Make sure you are in your React app directory before running it.
$ npm install "@videosdk.live/react-sdk"
Step 2: Configure Environment and Credentials
Create a meeting room using the VideoSDK API:
curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"
Copy the roomId from the response and configure it in src/config.js:
export const TOKEN = "YOUR_VIDEOSDK_AUTH_TOKEN";
export const ROOM_ID = "YOUR_MEETING_ID"; // Create using VideoSDK API (curl -X POST https://api.videosdk.live/v2/rooms)
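If you'd rather script room creation than run curl by hand, the sketch below calls the same endpoint and prints the roomId to paste into src/config.js. It assumes the requests package (pip install requests); any HTTP client works the same way.

# create_room.py - create a VideoSDK room and print its roomId (illustrative sketch)
import requests  # assumption: pip install requests

VIDEOSDK_TOKEN = "YOUR_VIDEOSDK_AUTH_TOKEN"  # the same token used in src/config.js

response = requests.post(
    "https://api.videosdk.live/v2/rooms",
    headers={
        "Authorization": VIDEOSDK_TOKEN,
        "Content-Type": "application/json",
    },
)
response.raise_for_status()
print("roomId:", response.json()["roomId"])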
Step 3: Design the user interface (UI)
Create the main App component with audio-only interaction in src/App.js:
import React, { useEffect, useRef, useState } from "react";
import { MeetingProvider, MeetingConsumer, useMeeting, useParticipant } from "@videosdk.live/react-sdk";
import { TOKEN, ROOM_ID } from "./config";

// Renders the audio stream of a single participant (you or the AI agent).
function ParticipantAudio({ participantId }) {
  const { micStream, micOn, isLocal, displayName } = useParticipant(participantId);
  const audioRef = useRef(null);

  useEffect(() => {
    if (!audioRef.current) return;
    if (micOn && micStream) {
      const mediaStream = new MediaStream();
      mediaStream.addTrack(micStream.track);
      audioRef.current.srcObject = mediaStream;
      audioRef.current.play().catch(() => {});
    } else {
      audioRef.current.srcObject = null;
    }
  }, [micStream, micOn]);

  return (
    <div>
      <p>Participant: {displayName} | Mic: {micOn ? "ON" : "OFF"}</p>
      {/* Mute your own audio element so you don't hear an echo of yourself */}
      <audio ref={audioRef} autoPlay muted={isLocal} />
    </div>
  );
}

// Basic meeting controls: leave the room and toggle your microphone.
function Controls() {
  const { leave, toggleMic } = useMeeting();
  return (
    <div>
      <button onClick={() => leave()}>Leave</button>
      <button onClick={() => toggleMic()}>Toggle Mic</button>
    </div>
  );
}

function MeetingView({ meetingId, onMeetingLeave }) {
  const [joined, setJoined] = useState(null);
  const { join, participants } = useMeeting({
    onMeetingJoined: () => setJoined("JOINED"),
    onMeetingLeft: onMeetingLeave,
  });

  const joinMeeting = () => {
    setJoined("JOINING");
    join();
  };

  return (
    <div>
      <h3>Meeting Id: {meetingId}</h3>
      {joined === "JOINED" ? (
        <div>
          <Controls />
          {[...participants.keys()].map((pid) => (
            <ParticipantAudio key={pid} participantId={pid} />
          ))}
        </div>
      ) : joined === "JOINING" ? (
        <p>Joining the meeting...</p>
      ) : (
        <button onClick={joinMeeting}>Join</button>
      )}
    </div>
  );
}

export default function App() {
  const [meetingId] = useState(ROOM_ID);

  const onMeetingLeave = () => {
    // no-op; simple sample
  };

  return (
    <MeetingProvider
      config={{
        meetingId,
        micEnabled: true,
        webcamEnabled: false, // audio-only quickstart
        name: "Agent React User",
        multiStream: false,
      }}
      token={TOKEN}
    >
      <MeetingConsumer>
        {() => <MeetingView meetingId={meetingId} onMeetingLeave={onMeetingLeave} />}
      </MeetingConsumer>
    </MeetingProvider>
  );
}
Part 2: Python AI Agent
Step 1: Create AI Agent Backend
Create a .env file to store your API keys securely for the Python agent:
# Google API Key for Gemini Live API
GOOGLE_API_KEY=your_google_api_key_here
# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token_here
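Before wiring up the agent, it can help to confirm that both keys are readable from Python. Here is a minimal sketch, assuming the python-dotenv package (pip install python-dotenv); the variable names match the .env file above:

# check_env.py - sanity-check that the .env values are readable (illustrative sketch)
import os
from dotenv import load_dotenv  # assumption: pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("GOOGLE_API_KEY", "VIDEOSDK_AUTH_TOKEN"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")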
Create agent-react.py, the Python AI agent that joins the same meeting room and interacts with users through voice:
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging

logging.getLogger().setLevel(logging.INFO)


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win $1,000,000.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Welcome to the VideoSDK AI Agent game show! I'm your host, and we're about to play for $1,000,000. Are you ready to play?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    agent = MyVoiceAgent()

    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # When GOOGLE_API_KEY is set in .env, don't pass the api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"],
        ),
    )

    pipeline = RealTimePipeline(model=model)

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
    )

    # Print live transcripts for both the user and the agent
    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)


def make_context() -> JobContext:
    room_options = RoomOptions(
        # Static meeting ID - the same one used in the frontend (src/config.js)
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
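The agent must join the same meeting ID that the frontend uses (see Troubleshooting below). If you prefer not to hardcode it in two places, you could read it from an environment variable instead; here is a small sketch of an alternative make_context, assuming you add a MEETING_ID entry to .env and load it into the environment:

import os

def make_context() -> JobContext:
    # Assumption: MEETING_ID=<your room id> has been added to .env / the environment
    room_options = RoomOptions(
        room_id=os.getenv("MEETING_ID", "YOUR_MEETING_ID"),
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)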
Part 3: Run the Application
Step 1: Run the Frontend
Once you have completed all the steps mentioned above, start your React application:
# Install dependencies
npm install
# Start the development server
npm start
Open http://localhost:3000 in your web browser.
Step 2: Run the AI Agent
Open a new terminal and run the Python agent:
# Install Python dependencies
pip install "videosdk-plugins-google"
pip install videosdk-agents
# Run the AI agent
python agent-react.py
Step 3: Connect and Interact
- Join the meeting from the React app:
  - Click the "Join" button in your browser
  - Allow microphone permissions when prompted
- Agent connection:
  - Once you join, the Python backend will detect your participation
  - You should see "Participant joined" in the terminal
  - The AI agent will greet you and start the game
- Start playing:
  - The agent will guide you through a number-guessing game (1-100)
  - Use your microphone to interact with the AI host
  - The agent will provide hints and encouragement throughout the game
Troubleshooting
Common Issues:
- "Waiting for participant..." but no connection:
  - Ensure both the frontend and the backend are running
  - Check that the room ID matches in both src/config.js and agent-react.py
  - Verify your VideoSDK token is valid
- Audio not working:
  - Check browser permissions for microphone access
  - Ensure your Google API key has Gemini Live API access enabled
- Agent not responding:
  - Verify your Google API key is correctly set in the environment
  - Check that the Gemini Live API is enabled in your Google Cloud Console
- React build issues:
  - Ensure your Node.js version is compatible
  - Try clearing the npm cache: npm cache clean --force
  - Delete node_modules and reinstall: rm -rf node_modules && npm install
Next Steps
Clone the repo for a quick implementation.
Got a question? Ask us on Discord.