AI Agent with React - Quick Start

VideoSDK empowers you to seamlessly integrate AI agents with real-time voice interaction into your React application within minutes.

In this quickstart, you'll explore how to create an AI agent that joins a meeting room and interacts with users through voice using Google Gemini Live API.

Prerequisites

Before proceeding, ensure that your development environment meets the following requirements:

  • VideoSDK developer account (if you don't have one, sign up on the VideoSDK Dashboard)
  • Node.js and Python 3.12+ installed on your device
  • Google API Key with Gemini Live API access
important

You need a VideoSDK account to generate a token and a Google API key for the Gemini Live API. Visit the VideoSDK dashboard to generate a token, and Google AI Studio to get a Google API key.

Project Structure

Your project structure should look like this:

Project Structure
  root
├── node_modules
├── public
├── src
│ ├── config.js
│ ├── App.js
│ └── index.js
├── agent-react.py
└── .env

You will be working on the following files:

  • App.js: Creates the basic UI for joining the meeting
  • config.js: Stores the token and room ID
  • index.js: The entry point of your React application
  • agent-react.py: The Python AI agent backend using the Google Gemini Live API
  • .env: Environment variables for API keys

Part 1: React Frontend

Step 1: Getting Started with the Code!

Create new React App

Create a new React app using the command below.

$ npx create-react-app videosdk-ai-agent-react-app

Install VideoSDK

Install VideoSDK using the npm command below. Make sure you are in your React app directory before running it.

$ npm install "@videosdk.live/react-sdk"

Step 2: Configure Environment and Credentials

Create a meeting room using the VideoSDK API:

curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"

Copy the roomId from the response and configure it in src/config.js:

src/config.js
export const TOKEN = "YOUR_VIDEOSDK_AUTH_TOKEN";
export const ROOM_ID = "YOUR_MEETING_ID"; // Create using VideoSDK API (curl -X POST https://api.videosdk.live/v2/rooms)
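
If you'd rather create the room from a script than from curl, the same REST call works from Python. This is a minimal sketch mirroring the curl command above; it assumes a valid VideoSDK auth token, and that the response body contains a `roomId` field as shown in the VideoSDK docs.

```python
import json
import urllib.request

VIDEOSDK_ROOMS_URL = "https://api.videosdk.live/v2/rooms"

def create_room_request(token: str) -> urllib.request.Request:
    """Build the POST request that creates a VideoSDK room,
    with the same headers as the curl example above."""
    return urllib.request.Request(
        VIDEOSDK_ROOMS_URL,
        method="POST",
        headers={
            "Authorization": token,
            "Content-Type": "application/json",
        },
    )

def create_room(token: str) -> str:
    """Send the request and return the roomId from the JSON response.
    Requires a real token and network access."""
    with urllib.request.urlopen(create_room_request(token)) as resp:
        return json.loads(resp.read())["roomId"]

# Usage (needs a valid token):
# room_id = create_room("YOUR_VIDEOSDK_AUTH_TOKEN")
```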

Step 3: Design the user interface (UI)

Create the main App component with audio-only interaction in src/App.js:

src/App.js
import React, { useEffect, useRef, useState } from "react";
import {
  MeetingProvider,
  MeetingConsumer,
  useMeeting,
  useParticipant,
} from "@videosdk.live/react-sdk";
import { TOKEN, ROOM_ID } from "./config";

function ParticipantAudio({ participantId }) {
  const { micStream, micOn, isLocal, displayName } = useParticipant(participantId);
  const audioRef = useRef(null);

  useEffect(() => {
    if (!audioRef.current) return;
    if (micOn && micStream) {
      const mediaStream = new MediaStream();
      mediaStream.addTrack(micStream.track);
      audioRef.current.srcObject = mediaStream;
      audioRef.current.play().catch(() => {});
    } else {
      audioRef.current.srcObject = null;
    }
  }, [micStream, micOn]);

  return (
    <div>
      <p>
        Participant: {displayName} | Mic: {micOn ? "ON" : "OFF"}
      </p>
      <audio ref={audioRef} autoPlay muted={isLocal} />
    </div>
  );
}

function Controls() {
  const { leave, toggleMic } = useMeeting();
  return (
    <div>
      <button onClick={() => leave()}>Leave</button>
      <button onClick={() => toggleMic()}>Toggle Mic</button>
    </div>
  );
}

function MeetingView({ meetingId, onMeetingLeave }) {
  const [joined, setJoined] = useState(null);
  const { join, participants } = useMeeting({
    onMeetingJoined: () => setJoined("JOINED"),
    onMeetingLeft: onMeetingLeave,
  });

  const joinMeeting = () => {
    setJoined("JOINING");
    join();
  };

  return (
    <div>
      <h3>Meeting Id: {meetingId}</h3>
      {joined === "JOINED" ? (
        <div>
          <Controls />
          {[...participants.keys()].map((pid) => (
            <ParticipantAudio key={pid} participantId={pid} />
          ))}
        </div>
      ) : joined === "JOINING" ? (
        <p>Joining the meeting...</p>
      ) : (
        <button onClick={joinMeeting}>Join</button>
      )}
    </div>
  );
}

export default function App() {
  const [meetingId] = useState(ROOM_ID);

  const onMeetingLeave = () => {
    // no-op in this simple sample
  };

  return (
    <MeetingProvider
      config={{
        meetingId,
        micEnabled: true,
        webcamEnabled: false,
        name: "Agent React User",
        multiStream: false,
      }}
      token={TOKEN}
    >
      <MeetingConsumer>
        {() => (
          <MeetingView meetingId={meetingId} onMeetingLeave={onMeetingLeave} />
        )}
      </MeetingConsumer>
    </MeetingProvider>
  );
}

Part 2: Python AI Agent

Step 1: Create AI Agent Backend

Create a .env file to store your API keys securely for the Python agent:

.env
# Google API Key for Gemini Live API
GOOGLE_API_KEY=your_google_api_key_here

# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token_here
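
Whether the .env file is loaded automatically depends on your setup; exporting the variables in your shell always works. Before starting the agent, it helps to confirm both keys are actually present in the process environment. A minimal preflight sketch (the key names match the .env file above; everything else is illustrative):

```python
import os

# The two keys the agent expects, matching the .env file above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "VIDEOSDK_AUTH_TOKEN")

def missing_keys(environ=os.environ):
    """Return the names of any required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]

# Usage before starting the agent:
# missing = missing_keys()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```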

Create the Python AI agent that will join the same meeting room and interact with users through voice.

agent-react.py
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging

logging.getLogger().setLevel(logging.INFO)


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a high-energy game-show host guiding the caller "
                "to guess a secret number from 1 to 100 to win $1,000,000."
            ),
        )

    async def on_enter(self) -> None:
        await self.session.say(
            "Welcome to VideoSDK's AI Agent game show! I'm your host, and "
            "we're about to play for $1,000,000. Are you ready to play?"
        )

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # When GOOGLE_API_KEY is set in .env, DON'T pass the api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"],
        ),
    )

    pipeline = RealTimePipeline(model=model)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
    )

    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)


def make_context() -> JobContext:
    room_options = RoomOptions(
        # Static meeting ID - same as used in the frontend
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
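
The on_transcription handler above receives dicts with role and text fields and simply prints them. If you want to keep the full conversation rather than just log lines, a small accumulator works; this sketch takes the field names from the handler above, and the rest is illustrative:

```python
class TranscriptLog:
    """Collect transcription events (as passed to the on_transcription
    handler above) and render them as a readable transcript."""

    def __init__(self):
        self.events = []

    def add(self, data: dict) -> None:
        """Store one transcription event dict."""
        self.events.append(data)

    def render(self) -> str:
        """Render all stored events as '[role] text' lines."""
        return "\n".join(
            f"[{event.get('role', 'unknown')}] {event.get('text', '')}"
            for event in self.events
        )

# Usage: register log.add alongside (or instead of) on_transcription:
# log = TranscriptLog()
# pipeline.on("realtime_model_transcription", log.add)
```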

Part 3: Run the Application

Step 1: Run the Frontend

Once you have completed all the steps mentioned above, start your React application:

# Install dependencies
npm install

# Start the development server
npm start

Open http://localhost:3000 in your web browser.

Step 2: Run the AI Agent

Open a new terminal and run the Python agent:

# Install Python dependencies
pip install "videosdk-plugins-google"
pip install videosdk-agents

# Run the AI agent
python agent-react.py

Step 3: Connect and Interact

  1. Join the meeting from the React app:

    • Click the "Join" button in your browser
    • Allow microphone permissions when prompted
  2. Agent connection:

    • Once you join, the Python backend will detect your participation
    • You should see "Participant joined" in the terminal
    • The AI agent will greet you and start the game
  3. Start playing:

    • The agent will guide you through a number guessing game (1-100)
    • Use your microphone to interact with the AI host
    • The agent will provide hints and encouragement throughout the game

Troubleshooting

Common Issues:

  1. "Waiting for participant..." but no connection:

    • Ensure both frontend and backend are running
    • Check that the room ID matches in both src/config.js and agent-react.py
    • Verify your VideoSDK token is valid
  2. Audio not working:

    • Check browser permissions for microphone access
    • Ensure your Google API key has Gemini Live API access enabled
  3. Agent not responding:

    • Verify your Google API key is correctly set in the environment
    • Check that the Gemini Live API is enabled in your Google Cloud Console
  4. React build issues:

    • Ensure Node.js version is compatible
    • Try clearing npm cache: npm cache clean --force
    • Delete node_modules and reinstall: rm -rf node_modules && npm install

Next Steps

Clone the repo for a quick implementation.

Got a question? Ask us on Discord.