CambAI TTS

The CambAI TTS provider lets your agent generate natural-sounding speech with CambAI's low-latency text-to-speech models, and it exposes options for customizing the voice, the inference behaviour, and the audio output.

Installation

Install the CambAI plugin for VideoSDK Agents:

pip install "videosdk-plugins-cambai"

Importing

from videosdk.plugins.cambai import CambAITTS, InferenceOptions, VoiceSettings, OutputConfiguration

Authentication

The CambAI plugin requires a CambAI API key.

Set CAMBAI_API_KEY in your .env file.
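
A missing key is easier to diagnose at startup than on the first synthesis request. As a minimal sketch, a check like the following can confirm that the variable is visible to the process. The helper name require_env is hypothetical and not part of the plugin, and the sketch assumes the .env file has already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the CambAI plugin.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


if __name__ == "__main__":
    # Fail fast with a readable message if the key is missing.
    try:
        require_env("CAMBAI_API_KEY")
        print("CAMBAI_API_KEY found")
    except RuntimeError as exc:
        print(exc)
```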

Example Usage

from videosdk.plugins.cambai import CambAITTS, InferenceOptions, VoiceSettings, OutputConfiguration
from videosdk.agents import CascadingPipeline

inference_options = InferenceOptions(
    stability=0.5,
    temperature=0.7,
    inference_steps=60,
    speaker_similarity=0.8,
    localize_speaker_weight=0.5,
    acoustic_quality_boost=True,
)

# Configure voice settings
voice_settings = VoiceSettings(
    enhance_reference_audio_quality=False,
    maintain_source_accent=False,
)

output_configuration = OutputConfiguration(
    format="wav",
    sample_rate=24000,  # Audio sample rate
    duration=None,
)

# Initialize CambAI TTS with optional audio output settings
tts = CambAITTS(
    speech_model="mars-pro",
    voice_id=147320,
    language="en-us",
    user_instructions=None,  # Optional for mars-instruct
    enhance_named_entities_pronunciation=True,
    voice_settings=voice_settings,
    inference_options=inference_options,
    output_configuration=output_configuration,
)

# Add TTS to a cascading pipeline
pipeline = CascadingPipeline(tts=tts)

Note

When you store credentials in a .env file, do not also pass them as arguments to model instances. The SDK reads the environment variables automatically, so omit api_key and other credential parameters from your code.
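
The environment variable only helps if something actually loads the .env file into the process. Whether your runtime does this for you is not stated here. If it does not, the sketch below shows the idea with a small standard-library parser for simple KEY=VALUE lines. The function load_env_lines is a hypothetical stand-in for illustration; a dedicated library such as python-dotenv handles quoting and edge cases more thoroughly.

```python
import os
from typing import Dict, MutableMapping, Optional


def load_env_lines(
    text: str, env: Optional[MutableMapping[str, str]] = None
) -> Dict[str, str]:
    """Copy simple KEY=VALUE lines into env without overwriting existing keys.

    Hypothetical helper for illustration; not part of the VideoSDK packages.
    Returns only the keys that were newly added.
    """
    target = os.environ if env is None else env
    loaded: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in target:
            target[key] = value
            loaded[key] = value
    return loaded
```

Values already present in the environment win over the file, so a key injected by a deployment platform is not silently replaced by a local .env entry.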

Configuration Options

  • api_key: (str) Your CambAI API key. Can also be set via the CAMBAI_API_KEY environment variable.
  • speech_model: (str) The CambAI TTS model to use (e.g., "mars-pro", "mars-flash", "mars-instruct"). Defaults to "mars-pro".
  • voice_id: (int) Numeric voice profile ID from CambAI's voice library. Defaults to 147320.
  • language: (str) BCP-47 locale string (e.g., "en-us"). Defaults to "en-us".
  • user_instructions: (str) Style and tone guidance for the generated speech. Only supported when speech_model is set to "mars-instruct".
  • enhance_named_entities_pronunciation: (bool) Improve pronunciation of brand names and proper nouns (default: False).
  • voice_settings: (VoiceSettings) Voice behaviour preferences:
    • enhance_reference_audio_quality: (bool) Enhance the quality of reference audio (default: False)
    • maintain_source_accent: (bool) Preserve the original speaker's accent (default: False)
  • inference_options: (InferenceOptions) Model sampling controls:
    • stability: (float) Voice stability control (optional)
    • temperature: (float) Sampling temperature (optional)
    • inference_steps: (int) Number of inference steps (optional)
    • speaker_similarity: (float) Speaker similarity control (optional)
    • localize_speaker_weight: (float) Speaker localization weight (optional)
    • acoustic_quality_boost: (bool) Enable acoustic quality enhancement (optional)
  • output_configuration: (OutputConfiguration) Audio output format and pacing options:
    • format: (str) Output audio format. Currently "wav" is supported (default: "wav")
    • sample_rate: (int) Audio sample rate in Hz (default: 24000)
    • duration: (float) Target speech duration in seconds. When set, the model attempts to pace the audio to match the requested duration. Omit or set to None for natural pacing (optional)
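
Because user_instructions is only supported with "mars-instruct", it can help to enforce that rule before constructing the provider. The sketch below assembles keyword arguments using only the parameter names and defaults listed above. build_tts_kwargs is a hypothetical helper, and how the plugin itself reacts to user_instructions on other models is not specified here, so raising an error is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def build_tts_kwargs(
    speech_model: str = "mars-pro",
    voice_id: int = 147320,
    language: str = "en-us",
    user_instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble CambAITTS keyword arguments (hypothetical helper)."""
    # Assumption: reject the combination early rather than let it pass through.
    if user_instructions is not None and speech_model != "mars-instruct":
        raise ValueError(
            'user_instructions is only supported with speech_model="mars-instruct"'
        )
    kwargs: Dict[str, Any] = {
        "speech_model": speech_model,
        "voice_id": voice_id,
        "language": language,
    }
    # Only include the optional field when it was actually provided.
    if user_instructions is not None:
        kwargs["user_instructions"] = user_instructions
    return kwargs
```

The resulting dictionary can be unpacked into the constructor, for example CambAITTS(**build_tts_kwargs(speech_model="mars-instruct", user_instructions="Speak calmly")).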

Additional Resources

The following resources provide more information about using CambAI with VideoSDK Agents.