Version: 1.0.x

Azure OpenAI TTS

The Azure OpenAI TTS provider enables your agent to use Azure OpenAI's text-to-speech models for converting text responses to natural-sounding audio output.

Installation

Install the Azure OpenAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-openai"

Importing

from videosdk.agents.plugins import OpenAITTS

Authentication

The Azure OpenAI plugin requires an Azure OpenAI API key.

Set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION in your .env file.
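
Before constructing the TTS instance, it can help to confirm that the three variables above are actually visible to the process. The following is a minimal sketch, not part of the SDK: the helper name missing_azure_env is hypothetical, and it assumes the .env file has already been loaded into the process environment by whatever loader your application uses.

```python
import os

# Variable names taken from the Authentication section above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_azure_env(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    This helper is hypothetical and is not provided by the VideoSDK package.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_azure_env()
    if missing:
        print("Missing Azure OpenAI settings: " + ", ".join(missing))
    else:
        print("All Azure OpenAI settings are present.")
```

Running a check like this at startup tends to surface a misnamed or empty variable earlier than a failed synthesis request would.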

Example Usage

from videosdk.agents.plugins import OpenAITTS
from videosdk.agents import Pipeline

# Initialize the Azure OpenAI TTS model
tts = OpenAITTS.azure(
    azure_deployment="gpt-4o-mini-tts",
    speed=1.0,
    response_format="pcm",
)

# Add the TTS instance to the cascade pipeline
pipeline = Pipeline(tts=tts)
Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.

Configuration Options

  • model: (str) The model to use for the TTS plugin (default: "gpt-4o-mini-tts")
  • azure_deployment: The Azure OpenAI deployment ID to use (defaults to the model name, e.g., "gpt-4o-mini-tts"). Can also be set via the AZURE_OPENAI_DEPLOYMENT environment variable.
  • api_key: Your Azure OpenAI API key. Can also be set via the AZURE_OPENAI_API_KEY environment variable.
  • azure_endpoint: Your Azure OpenAI Deployment Endpoint URL. Can also be set via the AZURE_OPENAI_ENDPOINT environment variable.
  • api_version: Your Azure OpenAI API version. Can also be set via the OPENAI_API_VERSION environment variable.
  • azure_ad_token: (str) Azure AD token for token-based authentication. Can also be set via the AZURE_OPENAI_AD_TOKEN environment variable.
  • organization: (str) OpenAI organization ID. Can also be set via the OPENAI_ORG_ID environment variable.
  • project: (str) OpenAI project ID. Can also be set via the OPENAI_PROJECT_ID environment variable.
  • voice: (str) Voice to use for audio output (e.g., "alloy", "echo", "fable", "onyx", "nova", "shimmer")
  • speed: (float) Speed of the generated audio (default: 1.0)
  • instructions: (Optional[str]) Natural-language style control. Only honored by gpt-4o-mini-tts (default: None).
  • language: (Optional[str]) ISO language hint (e.g. "hi", "mr", "fr") (default: None).
  • base_url: (Optional[str]) Custom base URL for the API (default: None).
  • response_format: (str) The response format to use for the TTS plugin (default: "pcm").
  • chunked_synthesis: (bool) When True, dispatch one request per FlushMarker boundary. When False, accumulate the entire stream into a single request (default: False).
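
The difference between the two chunked_synthesis modes can be illustrated with a small standalone model. This is only a sketch of the behavior described in the option above, not the plugin's implementation: the FlushMarker class here is a hypothetical stand-in for the SDK's marker, and plan_requests is an illustrative name that does not exist in the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlushMarker:
    """Hypothetical stand-in for the SDK's flush boundary marker."""


def plan_requests(stream, chunked_synthesis=False):
    """Group text fragments into the strings that would be sent for synthesis.

    With chunked_synthesis=True, each FlushMarker closes the current group,
    giving one request per boundary. With False, markers are ignored and the
    whole stream becomes a single request.
    """
    requests = []
    buffer = []
    for item in stream:
        if isinstance(item, FlushMarker):
            if chunked_synthesis and buffer:
                requests.append("".join(buffer))
                buffer = []
        else:
            buffer.append(item)
    if buffer:  # flush whatever text remains at the end of the stream
        requests.append("".join(buffer))
    return requests


if __name__ == "__main__":
    stream = ["Hello, ", "world. ", FlushMarker(), "How are ", "you?", FlushMarker()]
    print(plan_requests(stream, chunked_synthesis=True))
    print(plan_requests(stream, chunked_synthesis=False))
```

In this model, chunked mode yields two shorter requests where the default yields one long request, which is the trade-off the option describes: more requests with earlier audio versus a single request over the full text.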

Additional Resources

The following resources provide more information about using OpenAI with the VideoSDK Agents SDK.
