Azure OpenAI TTS
The Azure OpenAI TTS provider enables your agent to use Azure OpenAI's text-to-speech models for converting text responses to natural-sounding audio output.
Installation
Install the Azure OpenAI-enabled VideoSDK Agents package:
pip install "videosdk-plugins-openai"
Importing
from videosdk.agents.plugins import OpenAITTS
Authentication
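The Azure OpenAI plugin requires an Azure OpenAI API key. Token-based authentication is also available through the azure_ad_token option listed under Configuration Options.
Set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION in your .env file.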
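Before starting the agent, it can help to confirm that the three variables named above are actually set. The following is a minimal sketch using only the Python standard library. The helper name missing_azure_env is hypothetical and not part of the VideoSDK SDK, and the sketch assumes your .env file has already been loaded into the process environment (for example by your shell or a dotenv loader).

```python
import os

# Variable names taken from the Authentication section of this page.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_azure_env(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; it is not provided by the SDK.
    Pass a mapping to check something other than the live os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]
```

Calling missing_azure_env() with no arguments checks the current process environment; an empty list means all three variables are present.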
Example Usage
from videosdk.agents.plugins import OpenAITTS
from videosdk.agents import Pipeline
# Initialize the Azure OpenAI TTS model
tts = OpenAITTS.azure(
    azure_deployment="gpt-4o-mini-tts",
    speed=1.0,
    response_format="pcm",
)
# Add tts to cascade
pipeline = Pipeline(tts=tts)
note
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
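One way to follow this note in practice is to collect the non-credential options in one place and reject credential parameters early. This is a sketch only: the helper tts_options and the CREDENTIAL_PARAMS set are hypothetical and not part of the SDK, and the parameter names are taken from this page.

```python
# Parameters this page says should come from the environment instead of code.
# The set and the helper below are illustrative, not SDK features.
CREDENTIAL_PARAMS = {"api_key", "azure_ad_token", "videosdk_auth"}


def tts_options(**options):
    """Return a copy of the options, refusing any credential parameters."""
    leaked = sorted(CREDENTIAL_PARAMS.intersection(options))
    if leaked:
        raise ValueError("Set these through .env instead: " + ", ".join(leaked))
    return dict(options)


# Same keyword arguments as the Example Usage section.
options = tts_options(
    azure_deployment="gpt-4o-mini-tts",
    speed=1.0,
    response_format="pcm",
)
```

The resulting dictionary could then be unpacked into the constructor shown in Example Usage, as in OpenAITTS.azure(**options).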
Configuration Options
model: (str) The model to use for the TTS plugin (default: "gpt-4o-mini-tts").
azure_deployment: The OpenAI deployment ID to use (defaults to the model name, e.g., "gpt-4o-mini-tts"). Can also be set via the AZURE_OPENAI_DEPLOYMENT environment variable.
api_key: Your Azure OpenAI API key. Can also be set via the AZURE_OPENAI_API_KEY environment variable.
azure_endpoint: Your Azure OpenAI deployment endpoint URL. Can also be set via the AZURE_OPENAI_ENDPOINT environment variable.
api_version: Your Azure OpenAI API version. Can also be set via the OPENAI_API_VERSION environment variable.
azure_ad_token: (str) Azure AD token for token-based authentication. Can also be set via the AZURE_OPENAI_AD_TOKEN environment variable.
organization: (str) OpenAI organization ID. Can also be set via the OPENAI_ORG_ID environment variable.
project: (str) OpenAI project ID. Can also be set via the OPENAI_PROJECT_ID environment variable.
voice: (str) Voice to use for audio output (e.g., "alloy", "echo", "fable", "onyx", "nova", "shimmer").
speed: (float) Speed of the generated audio (default: 1.0).
instructions: (Optional[str]) Natural-language style control. Only honored by gpt-4o-mini-tts (default: None).
language: (Optional[str]) ISO language hint (e.g., "hi", "mr", "fr") (default: None).
base_url: (Optional[str]) Custom base URL for the API (default: None).
response_format: (str) The response format to use for the TTS plugin (default: "pcm").
chunked_synthesis: (bool) When True, dispatch one request per FlushMarker boundary. When False, accumulate the entire stream into a single request (default: False).
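The difference between the two chunked_synthesis modes can be illustrated with a small standalone sketch. This is not the plugin's implementation: the FlushMarker class below is a hypothetical stand-in for the SDK's marker, plan_requests is an invented helper, and details such as how text pieces are joined are assumptions made for illustration.

```python
class FlushMarker:
    """Hypothetical stand-in for the SDK's FlushMarker; marks a boundary."""


def plan_requests(stream, chunked_synthesis=False):
    """Group streamed text pieces into the text payload of each request.

    With chunked_synthesis=True, every FlushMarker closes the current payload.
    With chunked_synthesis=False, markers are ignored and all text ends up
    in a single payload.
    """
    requests, buffer = [], []
    for item in stream:
        if isinstance(item, FlushMarker):
            if chunked_synthesis and buffer:
                requests.append("".join(buffer))
                buffer = []
        else:
            buffer.append(item)
    if buffer:  # whatever remains forms the final (or only) payload
        requests.append("".join(buffer))
    return requests


stream = ["Hello, ", "world. ", FlushMarker(), "How are ", "you?", FlushMarker()]
chunked = plan_requests(stream, chunked_synthesis=True)
single = plan_requests(stream, chunked_synthesis=False)
```

In this sketch, chunked holds two payloads (one per marker boundary) while single holds one payload with the whole text. Under these assumptions, the chunked mode lets audio for the first segment begin before later text arrives, at the cost of more requests.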
Additional Resources
The following resources provide more information about using OpenAI with the VideoSDK Agents SDK.

