Version: 1.0.x

AssemblyAI STT

The AssemblyAI STT provider enables your agent to use AssemblyAI's real-time WebSocket API for fast and accurate speech-to-text conversion.

Installation

Install the AssemblyAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-assemblyai"

Authentication

The AssemblyAI plugin requires an AssemblyAI API key.

Set ASSEMBLYAI_API_KEY in your .env file.

Importing

from videosdk.agents.plugins import AssemblyAISTT

Example Usage

from videosdk.agents.plugins import AssemblyAISTT
from videosdk.agents import Pipeline

# Initialize the AssemblyAI STT model
stt = AssemblyAISTT(
    api_key="your-assemblyai-api-key",
    language_detection=True,
)

# Add stt to pipeline
pipeline = Pipeline(stt=stt)

Note

When using a .env file for credentials, don't pass them as arguments to model instances. The SDK automatically reads environment variables, so omit api_key and other credential parameters from your code.
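The fallback from an explicit argument to the environment variable can be pictured with a small sketch. This is only an illustration of the documented behavior, not the plugin's actual code: `resolve_api_key` and its `env` parameter are hypothetical names, and the sketch assumes the contents of your .env file have already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ASSEMBLYAI_API_KEY"


def resolve_api_key(api_key: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = api_key or env.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key passed and {ENV_VAR} is not set")
    return key
```

Calling `resolve_api_key()` with no arguments at startup is one way to fail early with a clear message when the variable is missing.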

Configuration Options

  • api_key: (str, optional) Your AssemblyAI API key. Uses the ASSEMBLYAI_API_KEY environment variable if not provided. Defaults to None.
  • input_sample_rate: (int) The input sample rate in Hz. Defaults to 48000.
  • target_sample_rate: (int) The target sample rate in Hz that audio is resampled to before sending. Defaults to 16000.
  • format_turns: (bool) Whether to return formatted turn transcripts, with punctuation and casing applied. Defaults to True.
  • keyterms_prompt: (list[str], optional) A list of key terms to boost during recognition. Defaults to None.
  • end_of_turn_confidence_threshold: (float) The confidence level, from 0.0 to 1.0, required to treat a turn as finished. Defaults to 0.5.
  • min_end_of_turn_silence_when_confident: (int) The minimum silence, in ms, required before ending a turn when the end-of-turn confidence meets the threshold. Defaults to 800.
  • max_turn_silence: (int) The maximum silence, in ms, allowed within a turn before it is ended regardless of confidence. Defaults to 2000.
  • speech_model: (Literal["universal-streaming-english", "universal-streaming-multilingual"]) The speech recognition model to use. Defaults to "universal-streaming-english".
  • language_detection: (bool) Whether to enable automatic language detection. Defaults to True.
  • region: (str) The region to use for the STT service (e.g., "US", "EU"). Defaults to "US".
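The three end-of-turn options are easier to reason about together. The sketch below is one reading of how they interact, based on the parameter descriptions above. The real decision is made by AssemblyAI's streaming service, so `TurnSettings` and `should_end_turn` are hypothetical names used only for illustration, and the exact comparison rules are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSettings:
    # Defaults mirror the values listed in Configuration Options.
    end_of_turn_confidence_threshold: float = 0.5
    min_end_of_turn_silence_when_confident: int = 800  # ms
    max_turn_silence: int = 2000  # ms


def should_end_turn(confidence: float, silence_ms: int,
                    settings: TurnSettings = TurnSettings()) -> bool:
    """Hypothetical helper: would this pause be treated as the end of a turn?"""
    # Assumed rule 1: a long enough silence ends the turn on its own.
    if silence_ms >= settings.max_turn_silence:
        return True
    # Assumed rule 2: a confident end-of-turn prediction needs only a short pause.
    confident = confidence >= settings.end_of_turn_confidence_threshold
    return confident and silence_ms >= settings.min_end_of_turn_silence_when_confident
```

Under these assumptions, lowering min_end_of_turn_silence_when_confident makes the agent respond sooner after a confident pause, while raising max_turn_silence gives hesitant speakers more time before the turn is cut off.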

Additional Resources

The following resources provide more information about using AssemblyAI with the VideoSDK Agents SDK.

  • AssemblyAI Docs: AssemblyAI's official real-time streaming transcription documentation.
  • SDK Reference: https://docs.videosdk.live/agent-sdk-reference/plugins-assemblyai/
  • GitHub Source: https://github.com/videosdk-live/agents/blob/main/videosdk-plugins/videosdk-plugins-assemblyai/videosdk/plugins/assemblyai/stt.py
