Version: 1.0.x

Inworld AI TTS

The Inworld AI TTS provider enables your agent to use Inworld AI's high-quality text-to-speech models for generating natural-sounding voice output.

Installation

Install the Inworld AI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-inworldai"
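To confirm that the package is installed in the active Python environment, a small standard-library check like the one below can help. It assumes the distribution name matches the pip install target above. `installed_version` is a hypothetical helper and is not part of the plugin.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "videosdk-plugins-inworldai") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper; the default name assumes it matches the pip install target.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "videosdk-plugins-inworldai is not installed")
```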

Importing

from videosdk.agents.plugins import InworldAITTS

Authentication

The Inworld plugin requires an Inworld API key.

Set INWORLD_API_KEY in your .env file.
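The SDK reads credentials from environment variables. Your process may not already load .env into the environment. If so, python-dotenv's `load_dotenv()` is the common choice. A minimal standard-library sketch of the same idea follows. `parse_env_lines` and `load_env_file` are hypothetical helpers. They handle only simple KEY=VALUE lines, with no `export` prefix, escape sequences, or multi-line values.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and lines without '='."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        value = value.strip()
        # Drop one layer of matching surrounding quotes, e.g. "abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Copy parsed values into os.environ; existing variables win unless override=True."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_lines(env_path.read_text(encoding="utf-8").splitlines())
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values


if __name__ == "__main__":
    load_env_file()
    print("INWORLD_API_KEY set:", bool(os.environ.get("INWORLD_API_KEY")))
```

A .env file for this plugin would then contain a single line such as INWORLD_API_KEY=your-api-key.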

Example Usage

from videosdk.agents.plugins import InworldAITTS
from videosdk.agents import Pipeline

# Initialize the Inworld AI TTS model
tts = InworldAITTS(
    api_key="your-api-key",  # omit when INWORLD_API_KEY is set in the environment
    voice_id="Hades",
    model_id="inworld-tts-1"
)

# Add tts to pipeline
pipeline = Pipeline(tts=tts)

Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key and other credential parameters from your code.
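One way to follow this note while still supporting setups without an environment variable is to build the constructor arguments conditionally. In the minimal sketch below, `build_tts_kwargs` is a hypothetical helper. The voice and model defaults are copied from the example above. The helper includes api_key only when INWORLD_API_KEY is absent.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_tts_kwargs(
    voice_id: str = "Hades",
    model_id: str = "inworld-tts-1",
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for InworldAITTS.

    api_key is added only when INWORLD_API_KEY is not set, so credentials
    stay out of code whenever the environment variable is available.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {"voice_id": voice_id, "model_id": model_id}
    if env.get("INWORLD_API_KEY"):
        return kwargs  # leave the key for the SDK to read from the environment
    if api_key is None:
        raise ValueError("Set INWORLD_API_KEY or pass api_key explicitly")
    kwargs["api_key"] = api_key
    return kwargs
```

Usage would then be tts = InworldAITTS(**build_tts_kwargs()). This assumes the constructor accepts these keyword names, as in the example above.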

Configuration Options

  • model_id: (str) Inworld TTS model identifier (default: "inworld-tts-1.5-max").
  • voice_id: (str) Voice identifier to use (default: "Sarah").
  • temperature: (float) Sampling temperature for variation in prosody, 0.0–2.0 (default: 0.8).
  • sample_rate: (int) Output sample rate in Hz (default: 24000).
  • enable_streaming: (bool) When True, uses bidirectional WebSocket streaming; when False, uses a streaming HTTP POST request (default: True).
  • auto_mode: (bool) WebSocket only. When True, the server controls buffer flushing for minimal latency (default: True).
  • max_buffer_delay_ms: (int) WebSocket only. Server-side max wait time before flushing accumulated text. None = unbounded (default: None).
  • buffer_char_threshold: (int) WebSocket only. Server-side character count that auto-triggers flushing. Cannot exceed 1000 (default: None).
  • apply_text_normalization: (str) "ON", "OFF", or None (server decides). When on, normalizes text such as Dr. Smith → Doctor Smith (default: None).
  • speaking_rate: (float) Speed multiplier in the range [0.5, 1.5]. None uses the voice's natural rate (default: None).
  • max_connection_age_sec: (float) Refresh the WebSocket after this many seconds to avoid hitting idle/session limits (default: 300.0).
  • api_key: (str) Inworld API key. Can also be set via the INWORLD_API_KEY environment variable.
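The sketch below checks option values against the documented ranges before you construct the TTS model. `InworldTTSSettings` is a hypothetical dataclass that mirrors the options and defaults listed above. It is not the plugin's own type. The sketch also assumes the documented ranges are inclusive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InworldTTSSettings:
    """Hypothetical mirror of the documented InworldAITTS options and defaults."""

    model_id: str = "inworld-tts-1.5-max"
    voice_id: str = "Sarah"
    temperature: float = 0.8
    sample_rate: int = 24000
    enable_streaming: bool = True
    auto_mode: bool = True
    max_buffer_delay_ms: Optional[int] = None
    buffer_char_threshold: Optional[int] = None
    apply_text_normalization: Optional[str] = None
    speaking_rate: Optional[float] = None
    max_connection_age_sec: float = 300.0

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the values look valid."""
        problems: List[str] = []
        if not 0.0 <= self.temperature <= 2.0:
            problems.append("temperature must be between 0.0 and 2.0")
        if self.speaking_rate is not None and not 0.5 <= self.speaking_rate <= 1.5:
            problems.append("speaking_rate must be between 0.5 and 1.5")
        if self.buffer_char_threshold is not None and self.buffer_char_threshold > 1000:
            problems.append("buffer_char_threshold cannot exceed 1000")
        if self.apply_text_normalization not in (None, "ON", "OFF"):
            problems.append('apply_text_normalization must be "ON", "OFF", or None')
        if self.sample_rate <= 0 or self.max_connection_age_sec <= 0:
            problems.append("sample_rate and max_connection_age_sec must be positive")
        # These knobs are documented as WebSocket-only, so they do nothing over HTTP.
        websocket_only_set = (
            self.max_buffer_delay_ms is not None or self.buffer_char_threshold is not None
        )
        if not self.enable_streaming and websocket_only_set:
            problems.append("max_buffer_delay_ms / buffer_char_threshold need enable_streaming=True")
        return problems


settings = InworldTTSSettings(voice_id="Hades", speaking_rate=1.2)
print(settings.validate())  # prints []
```

If the values pass, dataclasses.asdict(settings) produces a dict that could be unpacked into InworldAITTS. This assumes the constructor accepts exactly the keyword names documented above.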

Additional Resources

The following resources provide more information about using Inworld with the VideoSDK Agents SDK.
