Deepgram STT

The Deepgram STT provider enables your agent to use Deepgram's advanced speech-to-text models for high-accuracy, real-time audio transcription.

Installation

Install the Deepgram plugin package for VideoSDK Agents:

pip install "videosdk-plugins-deepgram"

Importing

from videosdk.plugins.deepgram import DeepgramSTT

Example Usage

from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.agents import CascadingPipeline

# Initialize the Deepgram STT model
stt = DeepgramSTT(
    # Pass api_key only if DEEPGRAM_API_KEY is NOT set in .env;
    # when it is set, remove the next line and let the SDK read it.
    api_key="your-deepgram-api-key",
    model="nova-2",
    language="en-US",
    interim_results=True,
    punctuate=True,
    smart_format=True,
)

# Add the STT instance to the cascading pipeline
pipeline = CascadingPipeline(stt=stt)

Note

When you store credentials in a .env file, don't also pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
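
One way to follow this rule in code is to build the constructor arguments conditionally, so api_key is passed only when the environment does not already provide it. The sketch below is illustrative: deepgram_stt_kwargs and make_stt are hypothetical helper names, not part of the SDK, and it assumes the .env values have already been loaded into the process environment under the name DEEPGRAM_API_KEY.

```python
import os
from typing import Mapping, Optional


def deepgram_stt_kwargs(
    env: Optional[Mapping[str, str]] = None,
    explicit_key: Optional[str] = None,
) -> dict:
    """Build keyword arguments for DeepgramSTT without duplicating credentials.

    Hypothetical helper: it only decides whether api_key should be passed.
    """
    env = os.environ if env is None else env
    kwargs = {"model": "nova-2", "language": "en-US"}

    if env.get("DEEPGRAM_API_KEY"):
        # The key is in the environment: omit api_key and let the SDK read it.
        return kwargs

    if explicit_key:
        # No environment variable: fall back to an explicitly supplied key.
        kwargs["api_key"] = explicit_key
        return kwargs

    raise RuntimeError("Set DEEPGRAM_API_KEY or supply an explicit API key")


def make_stt(explicit_key: Optional[str] = None):
    """Create the STT instance; requires videosdk-plugins-deepgram."""
    # Imported lazily so the helper above works without the plugin installed.
    from videosdk.plugins.deepgram import DeepgramSTT

    return DeepgramSTT(**deepgram_stt_kwargs(explicit_key=explicit_key))
```

Calling make_stt() with DEEPGRAM_API_KEY set yields an instance configured without any credential in source code; passing a key explicitly is only the fallback path.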

Configuration Options

  • api_key: (str) Your Deepgram API key (can also be set via the DEEPGRAM_API_KEY environment variable)
  • model: (str) The Deepgram model to use (e.g., "nova-2", "nova-3", "whisper-large")
  • language: (str) Language code for transcription (default: "en-US")
  • interim_results: (bool) Enable real-time partial transcription results (default: True)
  • punctuate: (bool) Add punctuation to transcription (default: True)
  • smart_format: (bool) Apply intelligent formatting to output (default: True)
  • sample_rate: (int) Audio sample rate in Hz (default: 48000)
  • endpointing: (int) Silence detection threshold in milliseconds (default: 50)
  • filler_words: (bool) Include filler words like "uh", "um" in transcription (default: True)
  • base_url: (str) WebSocket endpoint URL (default: "wss://api.deepgram.com/v1/listen")
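
To see how the options above combine, the following sketch collects the documented defaults in a dictionary and merges per-call overrides on top, rejecting names that are not in the list. DOCUMENTED_DEFAULTS and stt_options are hypothetical names introduced for illustration; the default values are copied from the list above and are not read from the SDK itself. The model and api_key options are accepted but given no default here because the list does not state one.

```python
# Defaults as documented in the option list above (not read from the SDK).
DOCUMENTED_DEFAULTS = {
    "language": "en-US",
    "interim_results": True,
    "punctuate": True,
    "smart_format": True,
    "sample_rate": 48000,   # Hz
    "endpointing": 50,      # milliseconds of silence
    "filler_words": True,
    "base_url": "wss://api.deepgram.com/v1/listen",
}

# Options that are documented but have no stated default.
ALLOWED_OPTIONS = set(DOCUMENTED_DEFAULTS) | {"api_key", "model"}


def stt_options(**overrides) -> dict:
    """Return the documented defaults with the given overrides applied."""
    unknown = set(overrides) - ALLOWED_OPTIONS
    if unknown:
        raise TypeError(f"Unknown DeepgramSTT option(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


# Example: drop filler words and wait longer before ending an utterance.
options = stt_options(model="nova-2", filler_words=False, endpointing=300)
```

The resulting dictionary could then be unpacked into the constructor, for example DeepgramSTT(**options), assuming the installed plugin version accepts all of the listed parameters.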