Version: 1.0.x

xAI STT

The xAI STT provider enables your agent to use xAI's speech-to-text models for converting audio input to text.

Installation

Install the xAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-xai"

Authentication

The xAI plugin requires an xAI API key.

Set XAI_API_KEY in your .env file.
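A missing key is easier to diagnose if it is checked before the agent starts. The sketch below assumes XAI_API_KEY has already been loaded into the process environment (for example by your shell or a .env loader); `require_xai_api_key` is a hypothetical helper for illustration and is not part of the plugin.

```python
import os
from typing import Mapping, Optional


def require_xai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the xAI API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of videosdk-plugins-xai.
    """
    # Fall back to the real process environment when no mapping is supplied.
    source = os.environ if env is None else env
    key = source.get("XAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "XAI_API_KEY is not set; add it to your .env file or export it "
            "in the shell before starting the agent."
        )
    return key
```

Calling `require_xai_api_key()` once at startup turns a missing credential into an explicit error message rather than a failure later in the pipeline.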

Importing

from videosdk.plugins.xai import XAISTT

Example Usage

from videosdk.plugins.xai import XAISTT
from videosdk.agents import Pipeline

# Initialize the xAI STT model
stt = XAISTT(
    # Omit api_key when XAI_API_KEY is set in .env; the SDK reads it from the environment
    api_key="your-xai-api-key",
    language="en",
    sample_rate=48000,
)

# Add stt to pipeline
pipeline = Pipeline(stt=stt)

Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.

Configuration Options

  • api_key: Your xAI API key (required; can also be supplied via the XAI_API_KEY environment variable)
  • sample_rate: Audio sample rate in Hz. xAI accepts 8000/16000/22050/24000/44100/48000. Default: 48000
  • encoding: Raw audio encoding. One of "pcm" (signed 16-bit LE), "mulaw", "alaw". Default: "pcm"
  • interim_results: Emit partial transcripts as they arrive. Default: True
  • endpointing: Silence duration (ms) before xAI fires utterance-final. Range 0–5000. Default: 50
  • language: BCP-47 language code (e.g. "en", "fr"). Default: "en"
  • diarize: When true, each word in the response includes a speaker field. Default: False
  • multichannel: When true, transcribes each input channel independently. Default: False
  • channels: Number of input channels (only relevant with multichannel=True). Default: 1
  • base_url: WebSocket endpoint URL. Default: "wss://api.x.ai/v1/stt"
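Several of the options above have a fixed set of accepted values, so they can be checked locally before a connection is opened. The following is a minimal sketch of such a check using only the values listed above; `build_xai_stt_options` and the two constants are hypothetical names introduced here, and the plugin may perform its own validation.

```python
from typing import Any, Dict

# Allowed values copied from the option list above.
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
ALLOWED_ENCODINGS = {"pcm", "mulaw", "alaw"}


def build_xai_stt_options(
    sample_rate: int = 48000,
    encoding: str = "pcm",
    endpointing: int = 50,
    channels: int = 1,
    **extra: Any,
) -> Dict[str, Any]:
    """Validate a few options locally and return them as keyword arguments.

    Hypothetical helper for illustration; not part of videosdk-plugins-xai.
    """
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if not 0 <= endpointing <= 5000:
        raise ValueError(f"endpointing must be within 0-5000 ms, got {endpointing}")
    if channels < 1:
        raise ValueError(f"channels must be at least 1, got {channels}")
    # Pass any other options (language, diarize, ...) through unchanged.
    return {
        "sample_rate": sample_rate,
        "encoding": encoding,
        "endpointing": endpointing,
        "channels": channels,
        **extra,
    }
```

The returned dictionary could then be unpacked into the constructor, for example `XAISTT(**options)`, assuming these options are accepted as keyword arguments in the same way as `language` and `sample_rate` in the example usage.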

Additional Resources

The following resources provide more information about using xAI (Grok) with the VideoSDK Agents SDK.
