Version: 0.0.x

IoT SDK + AI Agent Example

By pairing the IoT SDK with an AI Agent, your devices stop being passive endpoints and start acting like intelligent participants. Instead of simply sending sensor data or audio streams, they can listen, understand, and respond instantly — just like a human participant in a meeting.

This isn’t a traditional IoT integration. It’s a way to give your hardware a voice, a brain, and the ability to interact naturally with people and systems.

What You Can Build

Smart Home & Lifestyle
  • Adaptive Climate Control – Imagine your AC or heater not only sensing the temperature but predicting your comfort level. It speaks up before you even reach for the remote: “It feels a bit warm — shall I cool things down?”

  • Interactive Home Chef – Your countertop companion scans your fridge, suggests recipes you’ll actually love, then adjusts the oven and guides you step by step like a live sous-chef.

  • Wellness Coach – A smart wearable that keeps a real-time eye on your heart rate and talks to you the moment it spikes: “Your heart rate just jumped — let’s do a 30-second calming-breath exercise together.” It becomes your on-the-go guide for instant stress relief and better wellbeing.

Education & Entertainment
  • Child-Friendly Storyteller – A smart toy that doesn’t just read stories but acts them out, changing voices for characters, pausing to answer a child’s questions, and weaving their name into the plot.

  • Interactive Tutor – An IoT classroom buddy that listens to students’ questions in real time and explains concepts instantly, tailoring difficulty to each child’s understanding like a personal teacher.

These examples are just a starting point. Once your IoT device can join a meeting as a participant alongside an AI Agent, you unlock a wide range of two-way use cases.

Demo: IoT SDK in Conversation with AI Agent


Prerequisites

Before you get started, ensure you have the following:

  • Python: version 3.11 or later
  • Video SDK Developer Account (if you don’t have one, create it on the Video SDK Dashboard)

Step 1: Setup for AI Agent

On the AI Agent side, configure the agent to join the meeting as a participant so it can listen to live audio, process it, and generate responses in real time. Once initialized, the AI Agent becomes the processing core: it handles speech recognition, reasoning, and speech synthesis, enabling low-latency, two-way interaction between the device and the meeting.

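from videosdk.agents import JobContext, RoomOptions

def make_context() -> JobContext:
    # Describe the meeting (room) the agent should join
    room_options = RoomOptions(
        room_id="YOUR_MEETING_ID",  # must match the meetingID used on the IoT device
        name="VideoSDK Cascaded Agent",
        playground=True
    )

    return JobContext(room_options=room_options)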
caution

Use the same meeting ID for room_id in RoomOptions here as you will set for meetingID on the IoT device in Step 2.

See the full setup in the AI Agent Quick Start Guide →
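The speech recognition, reasoning, and synthesis loop described in this step can be pictured as three stages chained together for each conversational turn. The following is a minimal Python sketch of that idea only, not VideoSDK code: the CascadedPipeline class and the stub stages fake_stt, fake_llm, and fake_tts are hypothetical stand-ins for real speech-to-text, LLM, and text-to-speech providers, so the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: each stage is a plain function.
SpeechToText = Callable[[bytes], str]
Reasoner = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class CascadedPipeline:
    """Runs one conversational turn: audio in -> text -> reply text -> audio out."""
    stt: SpeechToText
    llm: Reasoner
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # speech recognition
        reply = self.llm(transcript)     # reasoning
        return self.tts(reply)           # speech synthesis


# Stub stages (hypothetical) so the sketch runs without any provider.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" bytes are already text


def fake_llm(text: str) -> str:
    if "warm" in text.lower():
        return "Shall I cool things down?"
    return "Sorry, could you repeat that?"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is audio


if __name__ == "__main__":
    pipeline = CascadedPipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
    print(pipeline.handle_turn(b"It feels a bit warm in here"))
```

Keeping the stages as separate callables means any one of them can be replaced by a real provider without touching handle_turn.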

Step 2: Setup for IoT SDK

Now it’s time to configure the IoT SDK on your ESP device. This step enables the device to capture and publish audio streams into the meeting, as well as receive and play responses from the AI Agent. With this setup, your device transitions from being a simple endpoint to an active communication node.

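#include <stdio.h>
// Also include the IoT SDK header as shown in the IoT SDK Quick Start Guide.
// The following runs inside your application entry point (e.g. app_main on ESP-IDF).

char *token = "YOUR_GENERATED_TOKEN"; // your generated auth token
init_config_t init_cfg = {
    .meetingID = "YOUR_MEETING_ID",   // must match room_id in the AI Agent's RoomOptions
    .token = token,
    .displayName = "ESP32-Device",    // name shown for the device in the meeting
    .audioCodec = AUDIO_CODEC_OPUS,   // encode audio with Opus
};

// Initialize the SDK with this configuration and print the result code
result_t init_result = init(&init_cfg);
printf("Result: %d\n", init_result);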

caution

Ensure that the meetingID configured on the IoT device is identical to the room_id defined in the AI Agent’s RoomOptions.

See the full setup in the IoT SDK Quick Start Guide →
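Because the device is configured with AUDIO_CODEC_OPUS, captured audio is encoded in fixed-duration frames. Opus allows frame durations of 2.5, 5, 10, 20, 40, or 60 ms, so the number of samples per frame is the sample rate multiplied by the frame duration. The Python sketch below works through that arithmetic for an assumed 16 kHz, mono, 16-bit PCM source with 20 ms frames; these values are illustrative assumptions, not settings taken from the IoT SDK, so check the IoT SDK Quick Start Guide for the actual capture configuration.

```python
# Assumptions (not taken from the IoT SDK): 16 kHz mono, 16-bit PCM, 20 ms frames.
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM
FRAME_MS = 20

# Frame durations (in milliseconds) that Opus accepts.
OPUS_FRAME_DURATIONS_MS = (2.5, 5, 10, 20, 40, 60)


def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Samples per channel in one frame: sample_rate * duration."""
    if frame_ms not in OPUS_FRAME_DURATIONS_MS:
        raise ValueError(f"{frame_ms} ms is not a valid Opus frame duration")
    return int(sample_rate_hz * frame_ms / 1000)


def split_into_frames(pcm: bytes, sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: float = FRAME_MS) -> list[bytes]:
    """Split raw PCM into whole frames, dropping any trailing partial frame.

    A real capture loop would keep the remainder and prepend it to the next buffer.
    """
    frame_bytes = samples_per_frame(sample_rate_hz, frame_ms) * CHANNELS * BYTES_PER_SAMPLE
    whole = len(pcm) // frame_bytes
    return [pcm[i * frame_bytes:(i + 1) * frame_bytes] for i in range(whole)]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE)  # 1 s of silence
    frames = split_into_frames(one_second)
    # prints: 320 50 640
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS), len(frames), len(frames[0]))
```

With these assumptions, one second of audio becomes 50 frames of 320 samples (640 bytes) each.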

Step 3: Fuse IoT + AI into One Experience

This is where the real magic happens.

  1. The AI Agent joins first, ready to process and respond.
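from videosdk.agents import WorkerJob

if __name__ == "__main__":
    # Start the AI Agent, which joins the meeting as a participant.
    # start_session comes from the AI Agent Quick Start; make_context is from Step 1.
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()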
  2. Your IoT device follows, sending audio streams and receiving real-time replies.
// Once the AI Agent has joined the meeting, start streaming audio from the IoT device
result_t result_publish = startPublishAudio("your-publisherId");      // send captured audio
result_t result_subscribe = startSubscribeAudio("your-subscriberId"); // receive the agent's audio
printf("Result Publish: %d\n", result_publish);
printf("Result Subscribe: %d\n", result_subscribe);
  3. Together, they create a seamless two-way interaction, as if your device just became a smart teammate inside the meeting.
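The steps above have the AI Agent join first so it is ready to process the device’s audio. The asyncio sketch below models only that ordering in plain Python: the device task waits on an event that the agent task sets after joining, with a timeout so the device does not wait forever. The names run_agent, run_device, and agent_joined are hypothetical; neither SDK is called here, and how your device actually learns that the agent has joined depends on your setup.

```python
import asyncio

# Hypothetical sketch: models the join ORDER from Step 3, not real SDK calls.


async def run_agent(agent_joined: asyncio.Event, log: list[str]) -> None:
    await asyncio.sleep(0.01)  # simulated time for the agent to start and join
    log.append("agent joined")
    agent_joined.set()         # signal that the device may start streaming


async def run_device(agent_joined: asyncio.Event, log: list[str],
                     timeout_s: float = 5.0) -> None:
    # Wait for the agent before publishing; raises TimeoutError after timeout_s.
    await asyncio.wait_for(agent_joined.wait(), timeout=timeout_s)
    log.append("device publishing audio")
    log.append("device subscribed to agent audio")


async def main() -> list[str]:
    agent_joined = asyncio.Event()
    log: list[str] = []
    # The device task is started first on purpose: it still waits for the agent.
    await asyncio.gather(run_device(agent_joined, log), run_agent(agent_joined, log))
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```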

Now it’s not just IoT anymore — it’s IoT with a brain, a voice, and instant intelligence.

Got a question? Ask us on Discord.