xAI STT
The xAI STT provider enables your agent to use xAI's speech-to-text models for converting audio input to text.
Installation
Install the xAI plugin for VideoSDK Agents:
pip install "videosdk-plugins-xai"
Authentication
The xAI plugin requires an xAI API key.
Set XAI_API_KEY in your .env file.
Importing
from videosdk.plugins.xai import XAISTT
Example Usage
from videosdk.plugins.xai import XAISTT
from videosdk.agents import Pipeline

# Initialize the xAI STT model
stt = XAISTT(
    # Omit api_key when XAI_API_KEY is set in your .env file
    api_key="your-xai-api-key",
    language="en",
    sample_rate=48000,
)

# Add the STT model to the pipeline
pipeline = Pipeline(stt=stt)
Note: When you use a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
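The note above means the key can come from two places: an explicit argument or the environment. The sketch below illustrates that lookup using only the standard library. `resolve_api_key` is a hypothetical helper, not part of videosdk-plugins-xai. The precedence it uses (an explicit argument wins over XAI_API_KEY) is an assumption. The helper also does not parse .env files itself; it assumes something else, such as the SDK or a loader like python-dotenv, has already copied the variables into the process environment.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None, env_var: str = "XAI_API_KEY") -> str:
    """Return the xAI API key, preferring an explicit argument over the environment.

    Hypothetical helper: the explicit-argument-first precedence is an
    assumption for illustration, not documented XAISTT behavior.
    """
    # An explicitly passed, non-empty key is used as-is.
    if api_key:
        return api_key
    # Otherwise fall back to the environment variable (e.g. loaded from .env).
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(
            f"{env_var} is not set; add it to your .env file or pass api_key explicitly"
        )
    return value
```

Failing early with a clear message when neither source is set tends to be easier to debug than an authentication error from the WebSocket connection later on.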
Configuration Options
- api_key: Your xAI API key. Required, but can also be supplied through the XAI_API_KEY environment variable.
- sample_rate: Audio sample rate in Hz. xAI accepts 8000, 16000, 22050, 24000, 44100, or 48000. Default: 48000
- encoding: Raw audio encoding. One of "pcm" (signed 16-bit little-endian), "mulaw", or "alaw". Default: "pcm"
- interim_results: Emit partial transcripts as they arrive. Default: True
- endpointing: Silence duration in milliseconds before xAI marks the utterance as final. Range: 0–5000. Default: 50
- language: BCP-47 language code (e.g. "en", "fr"). Default: "en"
- diarize: When True, each word in the response includes a speaker field. Default: False
- multichannel: When True, each input channel is transcribed independently. Default: False
- channels: Number of input channels (only relevant with multichannel=True). Default: 1
- base_url: WebSocket endpoint URL. Default: "wss://api.x.ai/v1/stt"
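To make the documented constraints concrete, here is a minimal, stdlib-only sketch. It mirrors the options in a hypothetical `STTOptions` dataclass, which is not a class exported by videosdk-plugins-xai, and checks them against the accepted values and ranges listed above. It also computes how many bytes one audio frame occupies for a given sample rate, encoding, and channel count. A few choices belong to this sketch alone and are not documented plugin behavior: the default 20 ms frame length, the assumption that multichannel audio is counted as total bytes across channels, and rejecting channels > 1 without multichannel=True.

```python
from dataclasses import dataclass

# Accepted values taken from the configuration options above.
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
# "pcm" is signed 16-bit LE (2 bytes/sample); mu-law and A-law are 8-bit (1 byte/sample).
BYTES_PER_SAMPLE = {"pcm": 2, "mulaw": 1, "alaw": 1}


@dataclass
class STTOptions:
    """Hypothetical mirror of the XAISTT options, for validation only."""

    sample_rate: int = 48000
    encoding: str = "pcm"
    interim_results: bool = True
    endpointing: int = 50  # milliseconds
    language: str = "en"
    diarize: bool = False
    multichannel: bool = False
    channels: int = 1

    def __post_init__(self) -> None:
        if self.sample_rate not in ALLOWED_SAMPLE_RATES:
            raise ValueError(f"unsupported sample_rate: {self.sample_rate}")
        if self.encoding not in BYTES_PER_SAMPLE:
            raise ValueError(f"unsupported encoding: {self.encoding!r}")
        if not 0 <= self.endpointing <= 5000:
            raise ValueError("endpointing must be between 0 and 5000 ms")
        if self.channels < 1:
            raise ValueError("channels must be at least 1")
        # Conservative choice of this sketch: channels only matters with multichannel.
        if self.channels > 1 and not self.multichannel:
            raise ValueError("channels > 1 requires multichannel=True")

    def frame_bytes(self, frame_ms: int = 20) -> int:
        """Total bytes in one frame of frame_ms milliseconds across all channels."""
        samples_per_channel = self.sample_rate * frame_ms // 1000
        return samples_per_channel * BYTES_PER_SAMPLE[self.encoding] * self.channels
```

For example, 20 ms of 48 kHz mono "pcm" audio is 960 samples × 2 bytes = 1920 bytes, while the same duration at 16 kHz "mulaw" is 320 samples × 1 byte = 320 bytes.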
Additional Resources
The following resources provide more information about using xAI (Grok) with the VideoSDK Agents SDK.
- xAI docs: xAI STT API documentation.

