AI Agent with React - Quick Start

VideoSDK empowers you to seamlessly integrate AI agents with real-time voice interaction into your React application within minutes.

In this quickstart, you'll explore how to create an AI agent that joins a meeting room and interacts with users through voice using Google Gemini Live API.

Prerequisites

Before proceeding, ensure that your development environment meets the following requirements:

  • VideoSDK developer account (if you don't have one, sign up on the VideoSDK Dashboard)
  • Node.js and Python 3.12+ installed on your device
  • Google API Key with Gemini Live API access
important

You need a VideoSDK account to generate a token and a Google API key for the Gemini Live API. Visit the VideoSDK dashboard to generate a token, and Google AI Studio to get a Google API key.

Project Structure

Your project structure should look like this:

Project Structure
  root
├── node_modules
├── public
├── src
│ ├── config.js
│ ├── App.js
│ └── index.js
├── agent-react.py
└── .env

You will be working on the following files:

  • App.js: Creates the basic UI for joining the meeting
  • config.js: Stores the token and room ID
  • index.js: The entry point of your React application
  • agent-react.py: The Python AI agent backend using the Google Gemini Live API
  • .env: Environment variables for API keys

Part 1: React Frontend

Step 1: Getting Started with the Code!

Create new React App

Create a new React app using the command below.

$ npx create-react-app videosdk-ai-agent-react-app

Install VideoSDK

Install VideoSDK using the npm command below. Make sure you are in your React app directory before running it.

$ npm install "@videosdk.live/react-sdk"

Step 2: Configure Environment and Credentials

Create a meeting room using the VideoSDK API:

curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"

Copy the roomId from the response and configure it in src/config.js:

src/config.js
export const TOKEN = "YOUR_VIDEOSDK_AUTH_TOKEN";
export const ROOM_ID = "YOUR_MEETING_ID"; // Create using VideoSDK API (curl -X POST https://api.videosdk.live/v2/rooms)
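
If you'd rather create the room from a script than from curl, the same REST call works from Python. This is a minimal sketch mirroring the curl command above; it assumes a valid VideoSDK auth token, and that the response body contains a `roomId` field as shown in the VideoSDK docs.

```python
import json
import urllib.request

VIDEOSDK_ROOMS_URL = "https://api.videosdk.live/v2/rooms"

def create_room_request(token: str) -> urllib.request.Request:
    """Build the POST request that creates a VideoSDK room,
    with the same headers as the curl example above."""
    return urllib.request.Request(
        VIDEOSDK_ROOMS_URL,
        method="POST",
        headers={
            "Authorization": token,
            "Content-Type": "application/json",
        },
    )

def create_room(token: str) -> str:
    """Send the request and return the roomId from the JSON response.
    Requires a real token and network access."""
    with urllib.request.urlopen(create_room_request(token)) as resp:
        return json.loads(resp.read())["roomId"]

# Usage (needs a valid token):
# room_id = create_room("YOUR_VIDEOSDK_AUTH_TOKEN")
```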

Step 3: Design the user interface (UI)

Create the main App component with audio-only interaction in src/App.js:

src/App.js
import React, { useEffect, useRef, useState } from "react";
import {
  MeetingProvider,
  MeetingConsumer,
  useMeeting,
  useParticipant,
} from "@videosdk.live/react-sdk";
import { TOKEN, ROOM_ID } from "./config";

function ParticipantAudio({ participantId }) {
  const { micStream, micOn, isLocal, displayName } = useParticipant(participantId);
  const audioRef = useRef(null);

  useEffect(() => {
    if (!audioRef.current) return;
    if (micOn && micStream) {
      const mediaStream = new MediaStream();
      mediaStream.addTrack(micStream.track);
      audioRef.current.srcObject = mediaStream;
      audioRef.current.play().catch(() => {});
    } else {
      audioRef.current.srcObject = null;
    }
  }, [micStream, micOn]);

  return (
    <div>
      <p>
        Participant: {displayName} | Mic: {micOn ? "ON" : "OFF"}
      </p>
      <audio ref={audioRef} autoPlay muted={isLocal} />
    </div>
  );
}

function Controls() {
  const { leave, toggleMic } = useMeeting();
  return (
    <div>
      <button onClick={() => leave()}>Leave</button>
      <button onClick={() => toggleMic()}>Toggle Mic</button>
    </div>
  );
}

function MeetingView({ meetingId, onMeetingLeave }) {
  const [joined, setJoined] = useState(null);
  const { join, participants } = useMeeting({
    onMeetingJoined: () => setJoined("JOINED"),
    onMeetingLeft: onMeetingLeave,
  });

  const joinMeeting = () => {
    setJoined("JOINING");
    join();
  };

  return (
    <div>
      <h3>Meeting Id: {meetingId}</h3>
      {joined === "JOINED" ? (
        <div>
          <Controls />
          {[...participants.keys()].map((pid) => (
            <ParticipantAudio key={pid} participantId={pid} />
          ))}
        </div>
      ) : joined === "JOINING" ? (
        <p>Joining the meeting...</p>
      ) : (
        <button onClick={joinMeeting}>Join</button>
      )}
    </div>
  );
}

export default function App() {
  const [meetingId] = useState(ROOM_ID);

  const onMeetingLeave = () => {
    // no-op in this simple sample
  };

  return (
    <MeetingProvider
      config={{
        meetingId,
        micEnabled: true,
        webcamEnabled: false,
        name: "Agent React User",
        multiStream: false,
      }}
      token={TOKEN}
    >
      <MeetingConsumer>
        {() => (
          <MeetingView meetingId={meetingId} onMeetingLeave={onMeetingLeave} />
        )}
      </MeetingConsumer>
    </MeetingProvider>
  );
}

Part 2: Python AI Agent

Step 1: Create AI Agent Backend

Create a .env file to store your API keys securely for the Python agent:

.env
# Google API Key for Gemini Live API
GOOGLE_API_KEY=your_google_api_key_here

# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token_here
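
Whether the .env file is loaded automatically depends on your setup; exporting the variables in your shell always works. Before starting the agent, it helps to confirm both keys are actually present in the process environment. A minimal preflight sketch (the key names match the .env file above; everything else is illustrative):

```python
import os

# The two keys the agent expects, matching the .env file above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "VIDEOSDK_AUTH_TOKEN")

def missing_keys(environ=os.environ):
    """Return the names of any required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]

# Usage before starting the agent:
# missing = missing_keys()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```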

Create the Python AI agent that will join the same meeting room and interact with users through voice.

agent-react.py
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
import logging

logging.getLogger().setLevel(logging.INFO)


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a high-energy game-show host guiding the caller "
                "to guess a secret number from 1 to 100 to win $1,000,000."
            ),
        )

    async def on_enter(self) -> None:
        await self.session.say(
            "Welcome to VideoSDK's AI Agent game show! I'm your host, and "
            "we're about to play for $1,000,000. Are you ready to play?"
        )

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # When GOOGLE_API_KEY is set in .env, DON'T pass the api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"],
        ),
    )

    pipeline = RealTimePipeline(model=model)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
    )

    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)


def make_context() -> JobContext:
    room_options = RoomOptions(
        # Static meeting ID - same as used in the frontend
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
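
The on_transcription handler above receives dicts with role and text fields and simply prints them. If you want to keep the full conversation rather than just log lines, a small accumulator works; this sketch takes the field names from the handler above, and the rest is illustrative:

```python
class TranscriptLog:
    """Collect transcription events (as passed to the on_transcription
    handler above) and render them as a readable transcript."""

    def __init__(self):
        self.events = []

    def add(self, data: dict) -> None:
        """Store one transcription event dict."""
        self.events.append(data)

    def render(self) -> str:
        """Render all stored events as '[role] text' lines."""
        return "\n".join(
            f"[{event.get('role', 'unknown')}] {event.get('text', '')}"
            for event in self.events
        )

# Usage: register log.add alongside (or instead of) on_transcription:
# log = TranscriptLog()
# pipeline.on("realtime_model_transcription", log.add)
```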

Part 3: Run the Application

Step 1: Run the Frontend

Once you have completed all the steps mentioned above, start your React application:

# Install dependencies
npm install

# Start the development server
npm start

Open http://localhost:3000 in your web browser.

Step 2: Run the AI Agent

Open a new terminal and run the Python agent:

# Install Python dependencies
pip install "videosdk-plugins-google"
pip install videosdk-agents

# Run the AI agent
python agent-react.py

Step 3: Connect and Interact

  1. Join the meeting from the React app:

    • Click the "Join" button in your browser
    • Allow microphone permissions when prompted
  2. Agent connection:

    • Once you join, the Python backend will detect your participation
    • You should see "Participant joined" in the terminal
    • The AI agent will greet you and start the game
  3. Start playing:

    • The agent will guide you through a number guessing game (1-100)
    • Use your microphone to interact with the AI host
    • The agent will provide hints and encouragement throughout the game

Troubleshooting

Common Issues:

  1. "Waiting for participant..." but no connection:

    • Ensure both frontend and backend are running
    • Check that the room ID matches in both src/config.js and agent-react.py
    • Verify your VideoSDK token is valid
  2. Audio not working:

    • Check browser permissions for microphone access
    • Ensure your Google API key has Gemini Live API access enabled
  3. Agent not responding:

    • Verify your Google API key is correctly set in the environment
    • Check that the Gemini Live API is enabled in your Google Cloud Console
  4. React build issues:

    • Ensure Node.js version is compatible
    • Try clearing npm cache: npm cache clean --force
    • Delete node_modules and reinstall: rm -rf node_modules && npm install

Next Steps

Clone the repo for a quick implementation.

Got a question? Ask us on Discord.