Inference Pricing

VideoSDK provides access to state-of-the-art AI models for building intelligent voice and video applications. Our inference API supports multiple providers, including Google, Sarvam AI, and Deepgram.

Contact sales for preferred rates on high-volume usage.

Speech-to-Text (STT)

Transcribe audio to text with high accuracy and low latency.

Model	Provider	Price (per minute)	Reference
Google STT Chirp 2	Google	$0.01600	Pricing↗
Saarika v2.5	Sarvam	$0.00560	Pricing↗
Nova 2	Deepgram	$0.00580	Pricing↗

Large Language Models (LLM)

Build conversational AI agents with the latest language models from Google.

Model	Provider	Input (per 1M tokens)	Output (per 1M tokens)	Reference
Gemini 2.5 Flash	Google	$0.30	$2.50	Pricing↗
Gemini 2.5 Flash Lite	Google	$0.10	$0.40	Pricing↗
Gemini 2.0 Flash	Google	$0.10	$0.40	Pricing↗
Gemini 2.0 Flash Lite	Google	$0.075	$0.30	Pricing↗

Text-to-Speech (TTS)

Convert text to natural-sounding speech with multiple quality tiers.

Model	Provider	Price (per 1M chars)	Reference
Google TTS Chirp 3	Google	$30.00	Pricing↗
Bulbul v2	Sarvam	$17.00	Pricing↗
Aura 2	Deepgram	$30.00	Pricing↗

Speech-to-Speech Models

Enable real-time voice conversations with native audio processing.

Model	Provider	Input (per 1M tokens)	Output (per 1M tokens)	Reference
Gemini Live 2.5 Flash Native Audio	Google	$3.00	$12.00	Pricing↗

Billing & Usage

How Inference Pricing Works

Pay-as-you-go (On-demand): Inference services (STT, LLM, TTS, STS) are available only to pay-as-you-go (PAYG) on-demand users. Upgrade to PAYG to use inference services.
Concurrency Allotment: A model is allotted to your agent based on the concurrency available for that model on our servers at the time the agent runs.
No minimum commitment: Start small and scale as needed.
Usage-based billing: Charges are calculated from actual API usage.
Transparent Usage Charges: Track your usage in the VideoSDK Dashboard.

Usage Calculation

LLM & Speech-to-Speech Models

Input tokens: Charged per 1 million input tokens processed
Output tokens: Charged per 1 million output tokens generated
Token counts are calculated based on the model's tokenizer
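The token-based rule can be sketched as a small estimator. This is a minimal illustration, not an official SDK function: the prices are copied from the LLM and Speech-to-Speech tables on this page, while the dictionary keys and the name `token_cost` are hypothetical labels chosen for the example.

```python
from decimal import Decimal

# USD per 1 million tokens as (input, output), copied from the tables on this page.
# The keys are display names used for illustration, not API model identifiers.
TOKEN_PRICES = {
    "Gemini 2.5 Flash": (Decimal("0.30"), Decimal("2.50")),
    "Gemini 2.5 Flash Lite": (Decimal("0.10"), Decimal("0.40")),
    "Gemini 2.0 Flash": (Decimal("0.10"), Decimal("0.40")),
    "Gemini 2.0 Flash Lite": (Decimal("0.075"), Decimal("0.30")),
    "Gemini Live 2.5 Flash Native Audio": (Decimal("3.00"), Decimal("12.00")),
}

ONE_MILLION = Decimal(1_000_000)


def token_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD charge for a model from raw token counts."""
    input_price, output_price = TOKEN_PRICES[model]
    # Input and output tokens are priced separately, then scaled to per-token rates.
    total = input_tokens * input_price + output_tokens * output_price
    return total / ONE_MILLION


if __name__ == "__main__":
    print(token_cost("Gemini 2.0 Flash", 5_000_000, 2_000_000))
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding noise. The token counts themselves would come from the model's tokenizer, which this sketch does not attempt to reproduce.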

Text-to-Speech Models

Characters: Charged per 1 million characters converted to speech
All characters in the input text are counted, including spaces and punctuation
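A rough character-based estimate might look like the following. The prices come from the TTS table on this page; the function name `tts_cost` is hypothetical, and using Python's `len()` (which counts Unicode code points) as the billable character count is an assumption, since the exact counting method for non-ASCII text is not specified here.

```python
from decimal import Decimal

# USD per 1 million characters, copied from the TTS table on this page.
# The keys are display names used for illustration, not API model identifiers.
TTS_PRICES = {
    "Google TTS Chirp 3": Decimal("30.00"),
    "Bulbul v2": Decimal("17.00"),
    "Aura 2": Decimal("30.00"),
}


def tts_cost(model: str, text: str) -> Decimal:
    """Estimate the USD charge for synthesizing `text` with `model`."""
    # Every character counts, including spaces and punctuation.
    # Assumption: one Unicode code point equals one billable character.
    billable_chars = len(text)
    return billable_chars * TTS_PRICES[model] / Decimal(1_000_000)


if __name__ == "__main__":
    print(tts_cost("Bulbul v2", "Hello, world!"))
```

Note that the 13 characters in "Hello, world!" include the comma, the space, and the exclamation mark.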

Speech-to-Text Models

Audio duration: Charged per minute of audio transcribed
Partial minutes are rounded up to the nearest minute
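The round-up rule for partial minutes can be illustrated with `math.ceil`. This is a sketch under assumptions: the prices are taken from the STT table on this page, the function name `stt_cost` is hypothetical, and whether rounding is applied per request, per session, or per billing period is not stated here, so the example simply rounds a single duration.

```python
import math
from decimal import Decimal

# USD per minute, copied from the STT table on this page.
# The keys are display names used for illustration, not API model identifiers.
STT_PRICES = {
    "Google STT Chirp 2": Decimal("0.01600"),
    "Saarika v2.5": Decimal("0.00560"),
    "Nova 2": Decimal("0.0058"),
}


def stt_cost(model: str, audio_seconds: float) -> Decimal:
    """Estimate the USD charge for transcribing one audio duration."""
    # Partial minutes are rounded up: 90 seconds is billed as 2 minutes.
    billable_minutes = math.ceil(audio_seconds / 60)
    return billable_minutes * STT_PRICES[model]


if __name__ == "__main__":
    print(stt_cost("Google STT Chirp 2", 90))
```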

Example Calculations

LLM Usage Example

If you process 5 million input tokens and generate 2 million output tokens using Gemini 2.0 Flash:

Input cost: 5 × $0.10 = $0.50
Output cost: 2 × $0.40 = $0.80
Total cost: $1.30

TTS Usage Example

If you convert 10 million characters to speech using Google TTS Chirp 3:

Cost: 10 × $30.00 = $300.00
Total cost: $300.00

STT Usage Example

If you transcribe 120 minutes of audio using Google STT Chirp 2:

Cost: 120 × $0.01600 = $1.92
Total cost: $1.92

Frequently Asked Questions

1. Are there any free tiers for inference APIs?

Currently, inference APIs are billed on a pay-as-you-go basis without a free tier. However, we offer competitive pricing and volume discounts for high-usage customers.

2. Can I use multiple models in my application?

Yes, you can use any combination of models from our supported providers. Each model will be billed separately based on its usage.
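As a rough illustration of separate per-model billing, the sketch below totals one line item per model for a voice agent that combines STT, an LLM, and TTS. The unit prices are copied from the tables on this page; the usage quantities, the `LineItem` class, and `invoice_total` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    model: str
    units: Decimal       # quantity in the unit the price is quoted per
    unit_price: Decimal  # USD per unit, from the tables on this page

    @property
    def cost(self) -> Decimal:
        return self.units * self.unit_price


def invoice_total(items: list[LineItem]) -> Decimal:
    """Sum the independently computed charge for each model."""
    return sum((item.cost for item in items), Decimal(0))


# Hypothetical usage for one session; the quantities are made up.
items = [
    LineItem("Saarika v2.5", Decimal("10"), Decimal("0.00560")),               # 10 minutes
    LineItem("Gemini 2.5 Flash Lite (input)", Decimal("0.02"), Decimal("0.10")),   # 20,000 tokens
    LineItem("Gemini 2.5 Flash Lite (output)", Decimal("0.005"), Decimal("0.40")),  # 5,000 tokens
    LineItem("Bulbul v2", Decimal("0.008"), Decimal("17.00")),                 # 8,000 characters
]

if __name__ == "__main__":
    for item in items:
        print(f"{item.model}: ${item.cost}")
    print(f"Total: ${invoice_total(items)}")
```

Token and character quantities are expressed in millions so that each line multiplies directly by the listed price.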

3. How do I monitor my inference usage?

You can track your inference API usage through the VideoSDK Dashboard.

4. What happens if I exhaust my balance?

You will receive usage alerts, and you can enable auto-recharge in the VideoSDK Dashboard to prevent service interruptions.

5. Are there volume discounts available?

Yes! If you expect high volume usage, please contact our sales team.

6. Which providers are supported?

We currently support Google (Gemini, Google Cloud TTS/STT), Sarvam AI, and Deepgram. We're continuously adding new providers and models based on customer demand.

7. How are partial units billed?

Tokens: Charged per actual token count
Characters: Charged per actual character count
Audio minutes: Charged per minute of audio, with partial minutes rounded up to the next full minute

8. Can I switch between models?

Yes, you can switch between different models at any time, and you'll be charged for each model's actual usage.
