AI Agent with Flutter - Quick Start
VideoSDK empowers you to seamlessly integrate AI agents with real-time voice interaction into your Flutter application within minutes.
In this quickstart, you'll learn how to create an AI agent that joins a Flutter meeting room and interacts with users through voice, using the Google Gemini Live API.
Prerequisites
Before proceeding, ensure that your development environment meets the following requirements:
- A VideoSDK developer account (if you don't have one, sign up via the VideoSDK Dashboard)
- Flutter and Python 3.12+ installed on your machine
- Google API Key with Gemini Live API access
You need a VideoSDK account to generate a token and a Google API key for the Gemini Live API. Visit the VideoSDK dashboard to generate the token, and Google AI Studio to create the API key.
Project Structure
Your project structure should look like this:
root
├── android
├── ios
├── lib
│   ├── api_call.dart
│   ├── join_screen.dart
│   ├── main.dart
│   ├── meeting_controls.dart
│   ├── meeting_screen.dart
│   └── participant_tile.dart
├── macos
├── web
├── windows
├── agent-flutter.py
└── .env
You will be working on the following files:
- join_screen.dart: Responsible for the user interface to join a meeting.
- meeting_screen.dart: Displays the meeting interface and handles meeting logic.
- api_call.dart: Handles API calls for creating meetings.
- agent-flutter.py: The Python AI agent backend using the Google Gemini Live API.
- .env: Stores API keys.
1. Flutter Frontend
Step 1: Getting Started
Follow these steps to create the environment necessary to add AI agent functionality to your app.
Create a New Flutter App
Create a new Flutter app using the following command:
$ flutter create videosdk_ai_agent_flutter_app
Install VideoSDK
Install the VideoSDK and http packages using the following commands. Make sure you are in your Flutter app directory before running them.
$ flutter pub add videosdk
$ flutter pub add http
Step 2: Configure Project
For Android
- Update /android/app/src/main/AndroidManifest.xml with the permissions required for the audio and video features:
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.CHANGE_NETWORK_STATE" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.INTERNET"/>
- If necessary, increase the minSdkVersion of defaultConfig in build.gradle to 23 (the default Flutter template currently sets it to 16), as shown in the sketch below.
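For reference, the relevant block in /android/app/build.gradle would look something like this (a sketch; newer Flutter templates read this value from flutter.minSdkVersion instead of hardcoding it):

android {
    defaultConfig {
        // VideoSDK requires Android API level 23 or higher
        minSdkVersion 23
    }
}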
For iOS
- Add the following entries, which allow your app to access the camera and microphone, to your /ios/Runner/Info.plist file:
<key>NSCameraUsageDescription</key>
<string>$(PRODUCT_NAME) Camera Usage!</string>
<key>NSMicrophoneUsageDescription</key>
<string>$(PRODUCT_NAME) Microphone Usage!</string>
- Uncomment the following line in /ios/Podfile to define a global platform for your project:
platform :ios, '12.0'
For macOS
- Add the following entries, which allow your app to access the camera and microphone, to your /macos/Runner/Info.plist file:
<key>NSCameraUsageDescription</key>
<string>$(PRODUCT_NAME) Camera Usage!</string>
<key>NSMicrophoneUsageDescription</key>
<string>$(PRODUCT_NAME) Microphone Usage!</string>
- Add the following entries, which allow your app to access the camera and microphone and open outgoing network connections, to your /macos/Runner/DebugProfile.entitlements file:
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.device.camera</key>
<true/>
<key>com.apple.security.device.microphone</key>
<true/>
- Add the following entries, which allow your app to access the camera and microphone and open outgoing network connections, to your /macos/Runner/Release.entitlements file:
<key>com.apple.security.network.server</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.device.camera</key>
<true/>
<key>com.apple.security.device.microphone</key>
<true/>
Step 3: Configure Environment and Credentials
Create a meeting room using the VideoSDK API:
curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"
Copy the roomId from the response and configure it in lib/join_screen.dart and lib/api_call.dart.
First, set up the auth token and the meeting-creation helper in lib/api_call.dart:
import 'dart:convert';

import 'package:http/http.dart' as http;

// Auth token we will use to generate a meeting and connect to it
const token = 'YOUR_VIDEOSDK_AUTH_TOKEN';

// API call to create a meeting
Future<String> createMeeting() async {
  final http.Response httpResponse = await http.post(
    Uri.parse('https://api.videosdk.live/v2/rooms'),
    headers: {'Authorization': token},
  );

  // Destructure the roomId from the response
  return json.decode(httpResponse.body)['roomId'];
}
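The helper above assumes the request always succeeds. A minimal hardening sketch (hypothetical error handling, not part of the original quickstart):

Future<String> createMeeting() async {
  final http.Response httpResponse = await http.post(
    Uri.parse('https://api.videosdk.live/v2/rooms'),
    headers: {'Authorization': token},
  );

  // Fail loudly instead of trying to decode an error body
  if (httpResponse.statusCode != 200) {
    throw Exception('Failed to create room: ${httpResponse.statusCode}');
  }
  return json.decode(httpResponse.body)['roomId'];
}

Next, create the join screen in lib/join_screen.dart: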
import 'package:flutter/material.dart';

import 'api_call.dart';
import 'meeting_screen.dart';

class JoinScreen extends StatelessWidget {
  final _meetingIdController = TextEditingController();

  JoinScreen({super.key});

  void onCreateButtonPressed(BuildContext context) async {
    // Call the API to create a meeting, then navigate to MeetingScreen with meetingId and token
    await createMeeting().then((meetingId) {
      if (!context.mounted) return;
      Navigator.of(context).push(
        MaterialPageRoute(
          builder: (context) => MeetingScreen(meetingId: meetingId, token: token),
        ),
      );
    });
  }

  void onJoinButtonPressed(BuildContext context) {
    // Check that the meeting ID is not null or invalid;
    // if it is valid, navigate to MeetingScreen with meetingId and token
    Navigator.of(context).push(
      MaterialPageRoute(
        builder: (context) => MeetingScreen(
          // Replace with the roomId you created above
          meetingId: "YOUR_MEETING_ID",
          token: token,
        ),
      ),
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: const Text('VideoSDK QuickStart')),
      body: Padding(
        padding: const EdgeInsets.all(12.0),
        child: Center(
          child: ElevatedButton(
            onPressed: () => onJoinButtonPressed(context),
            child: const Text('Join Meeting'),
          ),
        ),
      ),
    );
  }
}
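Note that _meetingIdController and onCreateButtonPressed are declared but never used in this minimal UI. If you'd rather let users create a room or paste a room ID instead of hardcoding it, here is a sketch of an extended build method (hypothetical, not part of the original quickstart):

@override
Widget build(BuildContext context) {
  return Scaffold(
    appBar: AppBar(title: const Text('VideoSDK QuickStart')),
    body: Padding(
      padding: const EdgeInsets.all(12.0),
      child: Column(
        mainAxisAlignment: MainAxisAlignment.center,
        children: [
          ElevatedButton(
            onPressed: () => onCreateButtonPressed(context),
            child: const Text('Create Meeting'),
          ),
          TextField(
            controller: _meetingIdController,
            decoration: const InputDecoration(hintText: 'Enter Meeting Id'),
          ),
          ElevatedButton(
            // Pass the entered ID instead of the hardcoded placeholder
            onPressed: () => Navigator.of(context).push(
              MaterialPageRoute(
                builder: (context) => MeetingScreen(
                  meetingId: _meetingIdController.text,
                  token: token,
                ),
              ),
            ),
            child: const Text('Join Meeting'),
          ),
        ],
      ),
    ),
  );
}

Keep in mind that the Python agent joins a fixed room_id, so whatever ID you enter must match the one configured in agent-flutter.py.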
Step 4: Design the User Interface (UI)
Create the main MeetingScreen component with audio-only interaction in lib/meeting_screen.dart:
import 'package:flutter/material.dart';
import 'package:videosdk/videosdk.dart';

import 'meeting_controls.dart';
import 'participant_tile.dart';

class MeetingScreen extends StatefulWidget {
  final String meetingId;
  final String token;

  const MeetingScreen({
    super.key,
    required this.meetingId,
    required this.token,
  });

  @override
  State<MeetingScreen> createState() => _MeetingScreenState();
}

class _MeetingScreenState extends State<MeetingScreen> {
  late Room _room;
  var micEnabled = true;
  var camEnabled = true;

  Map<String, Participant> participants = {};

  @override
  void initState() {
    // Create the room; the camera stays off for this audio-only quickstart
    _room = VideoSDK.createRoom(
      roomId: widget.meetingId,
      token: widget.token,
      displayName: "John Doe",
      micEnabled: micEnabled,
      camEnabled: false,
      defaultCameraIndex: 1, // index into MediaDevices used as the default camera
    );

    setMeetingEventListener();

    // Join the room
    _room.join();

    super.initState();
  }

  // Listen to meeting events
  void setMeetingEventListener() {
    _room.on(Events.roomJoined, () {
      setState(() {
        participants.putIfAbsent(
          _room.localParticipant.id,
          () => _room.localParticipant,
        );
      });
    });

    _room.on(Events.participantJoined, (Participant participant) {
      setState(
        () => participants.putIfAbsent(participant.id, () => participant),
      );
    });

    _room.on(Events.participantLeft, (String participantId) {
      if (participants.containsKey(participantId)) {
        setState(() => participants.remove(participantId));
      }
    });

    _room.on(Events.roomLeft, () {
      participants.clear();
      Navigator.popUntil(context, ModalRoute.withName('/'));
    });
  }

  // On back button press, leave the room
  Future<bool> _onWillPop() async {
    _room.leave();
    return true;
  }

  @override
  Widget build(BuildContext context) {
    return WillPopScope(
      onWillPop: () => _onWillPop(),
      child: Scaffold(
        appBar: AppBar(title: const Text('VideoSDK QuickStart')),
        body: Padding(
          padding: const EdgeInsets.all(8.0),
          child: Column(
            children: [
              Text(widget.meetingId),
              // Render all participants
              Expanded(
                child: Padding(
                  padding: const EdgeInsets.all(8.0),
                  child: GridView.builder(
                    gridDelegate: const SliverGridDelegateWithFixedCrossAxisCount(
                      crossAxisCount: 2,
                      crossAxisSpacing: 10,
                      mainAxisSpacing: 10,
                      mainAxisExtent: 300,
                    ),
                    itemBuilder: (context, index) {
                      return ParticipantTile(
                        key: Key(participants.values.elementAt(index).id),
                        participant: participants.values.elementAt(index),
                      );
                    },
                    itemCount: participants.length,
                  ),
                ),
              ),
              MeetingControls(
                onToggleMicButtonPressed: () {
                  // If the mic is currently on, mute it; otherwise unmute it
                  micEnabled ? _room.muteMic() : _room.unmuteMic();
                  micEnabled = !micEnabled;
                },
                onLeaveButtonPressed: () => _room.leave(),
              ),
            ],
          ),
        ),
      ),
    );
  }
}
Create lib/participant_tile.dart to display each participant:

import 'package:flutter/material.dart';
import 'package:videosdk/videosdk.dart';

class ParticipantTile extends StatefulWidget {
  final Participant participant;

  const ParticipantTile({super.key, required this.participant});

  @override
  State<ParticipantTile> createState() => _ParticipantTileState();
}

class _ParticipantTileState extends State<ParticipantTile> {
  late String participantName;

  @override
  void initState() {
    participantName = widget.participant.displayName;
    super.initState();
  }

  @override
  Widget build(BuildContext context) {
    return Padding(
      padding: const EdgeInsets.all(8.0),
      child: Container(
        color: Colors.grey.shade800,
        child: Center(
          child: Text(
            participantName,
            style: const TextStyle(color: Colors.white),
          ),
        ),
      ),
    );
  }
}
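The tile above only renders the display name. If you want visual feedback while the agent is streaming audio, you can subscribe to the participant's stream events. A rough sketch, assuming the videosdk package's Events.streamEnabled and Events.streamDisabled participant events (declare an isSpeaking field and tint the Container with it):

var isSpeaking = false;

@override
void initState() {
  participantName = widget.participant.displayName;
  // Flip a flag whenever the participant's audio stream starts or stops
  widget.participant.on(Events.streamEnabled, (Stream stream) {
    if (stream.kind == 'audio') setState(() => isSpeaking = true);
  });
  widget.participant.on(Events.streamDisabled, (Stream stream) {
    if (stream.kind == 'audio') setState(() => isSpeaking = false);
  });
  super.initState();
}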
Create lib/meeting_controls.dart for the mic and leave buttons:

import 'package:flutter/material.dart';

class MeetingControls extends StatelessWidget {
  final void Function() onToggleMicButtonPressed;
  final void Function() onLeaveButtonPressed;

  const MeetingControls({
    super.key,
    required this.onToggleMicButtonPressed,
    required this.onLeaveButtonPressed,
  });

  @override
  Widget build(BuildContext context) {
    return Row(
      mainAxisAlignment: MainAxisAlignment.spaceEvenly,
      children: [
        ElevatedButton(
          onPressed: onLeaveButtonPressed,
          child: const Text('Leave'),
        ),
        ElevatedButton(
          onPressed: onToggleMicButtonPressed,
          child: const Text('Toggle Mic'),
        ),
      ],
    );
  }
}
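The project structure also lists lib/main.dart, which this quickstart doesn't show. A minimal version that boots straight into the join screen (a sketch, assuming no extra routing or theming):

import 'package:flutter/material.dart';

import 'join_screen.dart';

void main() {
  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'VideoSDK QuickStart',
      home: JoinScreen(),
    );
  }
}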
2. Python AI Agent
Step 1: Create Python AI Agent
Create a .env file to store your API keys securely for the Python agent:
# Google API Key for Gemini Live API
GOOGLE_API_KEY=your_google_api_key_here
# VideoSDK Authentication Token
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token_here
Create the Python AI agent in agent-flutter.py; it will join the same meeting room and interact with users through voice:
import logging

from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

logging.getLogger().setLevel(logging.INFO)

# Credentials (GOOGLE_API_KEY, VIDEOSDK_AUTH_TOKEN) are picked up from the
# environment / the .env file created above.

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win $1,000,000.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Welcome to VideoSDK's AI Agent game show! I'm your host, and we're about to play for $1,000,000. Are you ready to play?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    agent = MyVoiceAgent()

    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # When GOOGLE_API_KEY is set in .env, DON'T pass the api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)

    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )

    # Print live transcripts of both the user and the agent
    def on_transcription(data: dict):
        role = data.get("role")
        text = data.get("text")
        print(f"[TRANSCRIPT][{role}]: {text}")

    pipeline.on("realtime_model_transcription", on_transcription)

    await context.run_until_shutdown(session=session, wait_for_participant=True)

def make_context() -> JobContext:
    room_options = RoomOptions(
        # Static meeting ID - same as used in the frontend
        room_id="YOUR_MEETING_ID",  # Replace with your actual room_id
        name="Gemini Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
3. Run the Application
Step 1: Run the Frontend
Once you have completed all the steps mentioned above, start your Flutter application:
flutter run
Step 2: Run the AI Agent
Open a new terminal and run the Python agent:
# Install Python dependencies
pip install "videosdk-plugins-google"
pip install videosdk-agents
# Run the AI agent
python agent-flutter.py
Step 3: Connect and Interact
- Join the meeting from the Flutter app:
  - Click the "Join Meeting" button.
  - Allow microphone permissions when prompted.
- Agent connection:
  - Once you join, the Python agent will detect your participation.
  - You should see "Participant joined" in the terminal.
  - The AI agent will greet you and start the game.
- Start playing:
  - The agent will guide you through a number-guessing game (1-100).
  - Use your microphone to interact with the AI host.
  - The agent will provide hints and encouragement throughout the game.
Troubleshooting
Common Issues:
- "Waiting for participant..." but no connection:
  - Ensure both the frontend and the agent are running.
  - Check that the room ID matches in both lib/join_screen.dart and agent-flutter.py.
  - Verify your VideoSDK token is valid.
- Audio not working:
  - Check browser or device permissions for microphone access.
  - Ensure your Google API key has Gemini Live API access enabled.
- Agent not responding:
  - Verify your Google API key is correctly set in the environment.
  - Check that the Gemini Live API is enabled in your Google Cloud Console.
- Flutter build issues:
  - Ensure your Flutter version is compatible.
  - Try cleaning the build with flutter clean.
  - Delete pubspec.lock and run flutter pub get (the full sequence is shown below).
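For the last set of fixes, the full command sequence looks like this (use del instead of rm on Windows):

flutter clean
rm pubspec.lock
flutter pub get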
Next Steps
Clone the repo for a quick implementation.
Got a question? Ask us on Discord.