
Preemptive Response

Preemptive Response is a feature that allows the Speech-to-Text (STT) engine to produce partial, low-latency text output while the user is still speaking. This is crucial for building highly responsive conversational AI agents.

By enabling preemptive response, your agent can begin processing the user's intent and formulating a response before the full utterance is completed, significantly reducing the perceived latency.

How It Works

(Diagram: preemptive response flow)

  • User audio is streamed to the STT, which generates partial transcripts.
  • These partial transcripts are immediately sent to the LLM to enable preemptive (early) responses (see the sketch after this list).
  • The LLM output is then passed to the TTS to generate the spoken response.
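The sketch below is a minimal, self-contained illustration of this flow using plain asyncio — it is not the VideoSDK or Deepgram API. Each new partial transcript preempts the previous in-flight LLM call, so by the time the final transcript arrives, a response is already underway:

import asyncio

# Conceptual sketch only (plain asyncio, not the VideoSDK API): each partial
# transcript kicks off an LLM call; newer partials cancel older in-flight calls,
# and the call started from the final transcript produces the spoken reply.

async def fake_stt(audio_chunks):
    """Yield (is_final, transcript) pairs as simulated audio arrives."""
    transcript = ""
    for chunk in audio_chunks:
        transcript += chunk
        yield False, transcript        # low-latency partial transcript
        await asyncio.sleep(0.05)      # next chunk of audio arriving
    yield True, transcript             # final transcript at end of turn

async def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM request."""
    await asyncio.sleep(0.1)
    return f"Response to: {prompt!r}"

async def run_turn(audio_chunks):
    pending = None                     # in-flight preemptive LLM task
    async for is_final, text in fake_stt(audio_chunks):
        if pending is not None:
            pending.cancel()           # a newer transcript supersedes it
        pending = asyncio.create_task(fake_llm(text))
        if is_final:
            reply = await pending      # keep the response for the final text
            print(reply)               # a real pipeline would hand this to TTS

asyncio.run(run_turn(["what is ", "the weather ", "in Paris?"]))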

Prerequisites

Ensure you have the required packages installed:

requirements.txt
videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]==0.0.47
python-dotenv==1.1.1
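The plugins read provider credentials from environment variables at runtime, and python-dotenv (included above) lets you keep them in a .env file. A minimal sketch of loading them is shown below; the variable names are assumptions, so check each plugin's documentation for the exact names it expects:

# Minimal sketch: load provider credentials from a .env file with python-dotenv.
# The variable names here are assumptions; verify the exact names each plugin
# and the VideoSDK room connection expect in their documentation.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

for key in ("VIDEOSDK_AUTH_TOKEN", "DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY"):
    if not os.getenv(key):
        print(f"Warning: {key} is not set")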
Tip: Currently, preemptive response generation is limited to Deepgram's STT implementation and is available only in the Flux model.

Enabling Preemptive Generation

To enable this feature, set the enable_preemptive_generation flag to True when initializing your STT plugin (e.g., DeepgramSTTV2).

from videosdk.plugins.deepgram import DeepgramSTTV2

stt = DeepgramSTTV2(
    enable_preemptive_generation=True
)
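Because preemptive generation is currently limited to Flux (see the tip above), you will typically set the Flux model on the same plugin instance, as the full example below does:

from videosdk.plugins.deepgram import DeepgramSTTV2

stt = DeepgramSTTV2(
    model="flux-general-en",            # Flux model (required for preemptive generation)
    enable_preemptive_generation=True,  # emit low-latency partial transcripts
)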

Full Working Example

The following example demonstrates how to build a voice agent with preemptive generation enabled. This setup uses Deepgram for STT, OpenAI for the LLM, and ElevenLabs for TTS.

import asyncio
import os
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTTV2
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-download the Turn Detector model to avoid delays during startup
pre_download_model()

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant that can answer questions and help with tasks.")

    async def on_enter(self):
        await self.session.say("Hello! How can I help you today?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # 1. Create the agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # 2. Define the pipeline with Preemptive Generation enabled
    pipeline = CascadingPipeline(
        stt=DeepgramSTTV2(
            model="flux-general-en",
            enable_preemptive_generation=True  # Enable low-latency partials
        ),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    # 3. Initialize the session
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running
        await asyncio.Event().wait()
    finally:
        # Clean up resources
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
