Version: 1.0.x

xAI (Grok) TTS

The xAI (Grok) TTS provider enables your agent to use xAI's text-to-speech models for generating voice output.

Installation

Install the xAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-xai"

Importing

from videosdk.plugins.xai import XAITTS

Authentication

The xAI plugin requires an xAI API key.

Set XAI_API_KEY in your .env file.
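
As a quick sanity check before constructing the TTS instance, the sketch below verifies that the key is visible to the process. It uses only the standard library. The helper name `require_env` is hypothetical and not part of the VideoSDK SDK, and the sketch assumes the variables from your .env file have already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if missing or blank.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    # Fall back to the real process environment when no mapping is supplied.
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example with an explicit mapping, so the check does not depend on your shell.
key = require_env("XAI_API_KEY", {"XAI_API_KEY": "your-xai-api-key"})
```

In an application you would typically call `require_env("XAI_API_KEY")` with no mapping, so that a missing key fails early with a clear message.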

Example Usage

from videosdk.plugins.xai import XAITTS
from videosdk.agents import Pipeline

# Initialize the xAI TTS model
tts = XAITTS(
    # Pass api_key only if XAI_API_KEY is NOT set in .env; otherwise omit this line
    api_key="your-xai-api-key",
    voice="eve",
    language="en",
    codec="pcm",
    sample_rate=24000,
    optimize_streaming_latency=0,
    text_normalization=False,
)

# Add tts to pipeline
pipeline = Pipeline(tts=tts)

Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key and other credential parameters from your code.

Configuration Options

  • api_key: (str) Your xAI API key. Can also be set via the XAI_API_KEY environment variable.
  • voice: (str) The voice to use. One of "eve", "ara", "rex", "sal", or "leo". Case-insensitive (default: "eve").
  • language: (str) BCP-47 language code (e.g. "en", "hi", "pt-BR") or "auto" for automatic language detection (default: "en").
  • codec: (str) Output audio codec. Allowed values: "pcm" (signed 16-bit LE) or "mulaw" (default: "pcm").
  • sample_rate: (int) Output sample rate in Hz. One of 8000, 16000, 22050, 24000, 44100, or 48000 (default: 24000).
  • optimize_streaming_latency: (int) 0 for best quality (default) or 1 for lower time-to-first-audio with a minor quality tradeoff.
  • text_normalization: (bool) When True, xAI normalizes written-form text (numbers, abbreviations, symbols) into spoken-form before synthesis (default: False).
  • base_url: (str) WebSocket endpoint URL for the xAI TTS API (default: "wss://api.x.ai/v1/tts").
  • max_connection_age_sec: (float) Maximum WebSocket connection age in seconds before reconnecting (default: 300.0).
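
The allowed values listed above can be checked before they reach the provider. The following is a minimal sketch, assuming you want to fail fast on a typo. `validate_tts_options` is a hypothetical helper, not part of the plugin, and the plugin may well perform its own validation.

```python
from typing import Any, Dict

# Allowed values copied from the option list above.
VOICES = {"eve", "ara", "rex", "sal", "leo"}
CODECS = {"pcm", "mulaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def validate_tts_options(
    voice: str = "eve",
    codec: str = "pcm",
    sample_rate: int = 24000,
    optimize_streaming_latency: int = 0,
) -> Dict[str, Any]:
    """Check options against the documented values and return them as kwargs.

    Hypothetical helper for illustration; not part of the xAI plugin.
    """
    # Voices are documented as case-insensitive, so compare in lower case.
    if voice.lower() not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if codec not in CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if optimize_streaming_latency not in (0, 1):
        raise ValueError("optimize_streaming_latency must be 0 or 1")
    return {
        "voice": voice,
        "codec": codec,
        "sample_rate": sample_rate,
        "optimize_streaming_latency": optimize_streaming_latency,
    }


# A lower-latency combination built only from documented values.
low_latency_kwargs = validate_tts_options(
    codec="mulaw", sample_rate=8000, optimize_streaming_latency=1
)
```

The returned dictionary can then be unpacked into the constructor, for example `XAITTS(**low_latency_kwargs)`.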

Additional Resources

The following resources provide more information about using xAI (Grok) with the VideoSDK Agents SDK.
