Version: 0.0.x

Custom Audio Track | Python

In VideoSDK, you can ingest a custom audio track from a file into your meeting using a CustomAudioTrack.

This example demonstrates how to create a custom audio track from an MP4 file and integrate it into a VideoSDK meeting.

Setup

Before running the example, ensure you have set up your environment variables for VideoSDK authentication:

VIDEOSDK_TOKEN = "<VIDEOSDK_TOKEN>"
MEETING_ID = "<MEETING_ID>"
NAME = "<NAME>"
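If you keep these values in real environment variables rather than hard-coding them, they can be read at startup. The sketch below assumes the environment variable names match the constants above; the `load_settings` helper is hypothetical and not part of the VideoSDK API.

```python
import os


def load_settings(env=None):
    """Read the three settings from the environment (hypothetical helper)."""
    # Allow a plain dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    names = ("VIDEOSDK_TOKEN", "MEETING_ID", "NAME")

    # Fail early with a clear message if anything is unset or empty.
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {n: env[n] for n in names}
```

Calling `load_settings()` with no arguments reads from `os.environ`; the returned dictionary can then be used to fill in `VIDEOSDK_TOKEN`, `MEETING_ID`, and `NAME`.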

Custom Audio Track Class

The AudioFileTrack class extends CustomAudioTrack to handle audio stream ingestion from an MP4 file:

import time
import fractions
import asyncio
from av import AudioFrame, open, AudioResampler
from vsaiortc.mediastreams import MediaStreamError
from videosdk import CustomAudioTrack, MeetingConfig, VideoSDK, Participant, MeetingEventHandler, ParticipantEventHandler

VIDEOSDK_TOKEN = "VIDEOSDK_TOKEN"
MEETING_ID = "MEETING_ID"
NAME = "NAME"
loop = asyncio.get_event_loop()

class AudioFileTrack(CustomAudioTrack):
    def __init__(self, file_path: str):
        super().__init__()
        self.file_path = file_path
        self.container = open(file_path)
        self.stream = self.container.streams.audio[0]
        self.decoder = self.container.decode(self.stream)
        self._start = None
        self._timestamp = 0
        self.resampler = AudioResampler(format='s16', layout='mono', rate=90000)
        self.frame_buffer = []

    async def recv(self) -> AudioFrame:
        """
        Receive the next :class:`~av.audio.frame.AudioFrame` from the MP4 file.
        """
        # Implemented in the next step

`AudioFileTrack.recv()`

Purpose:

The recv method of the AudioFileTrack class retrieves the next audio frame from the source (a file, a URL, a buffer, or any other input).

Key Components:

Frame Buffering: Frames are read from the file, resampled to the desired audio format ('s16' format, mono channel, specified sample rate), and stored in self.frame_buffer.
Timestamp Synchronization: Each frame is stamped with a running sample count so that audio is delivered at the correct timestamps.
Asynchronous Waiting: asyncio.sleep() delays frame delivery by the calculated wait time (wait), so frames are released at real-time pace.
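The wait time described above is simple arithmetic: the moment a frame is due equals the start time plus the audio duration sent so far. A minimal sketch of that calculation, pulled out into a standalone function for illustration (the name `seconds_until_due` and the `now` parameter are assumptions, not part of the SDK):

```python
import time


def seconds_until_due(start, samples_sent, sample_rate, now=None):
    """Return how long to sleep before delivering the next frame.

    start: wall-clock time (seconds) when the first frame was requested.
    samples_sent: running total of samples, including the frame about to go out.
    sample_rate: samples per second of the resampled audio.
    now: current wall-clock time; defaults to time.time().
    """
    now = time.time() if now is None else now
    due = start + samples_sent / sample_rate
    # A negative value means we are already late, so do not wait at all.
    return max(0.0, due - now)
```

For example, with `start=100.0`, 45000 samples sent at a rate of 90000, and `now=100.25`, the frame is due at 100.5, so the function returns 0.25. If `now` were already past 100.5, it would return 0.0 and the frame would go out immediately.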

Code Snippet:

    async def recv(self) -> AudioFrame:
        """
        Receive the next :class:`~av.audio.frame.AudioFrame` from the MP4 file.
        """
        if self.readyState != "live":
            raise MediaStreamError

        if self._start is None:
            self._start = time.time()

        # Frame Buffering
        if not self.frame_buffer:
            try:
                frame = next(self.decoder)
                frames = self.resampler.resample(frame)
                self.frame_buffer.extend(frames)
            except StopIteration:
                raise MediaStreamError("End of stream")

        frame = self.frame_buffer.pop(0)

        # Calculate the wait time to synchronize playback
        sample_rate = 90000
        self._timestamp += frame.samples
        wait = self._start + (self._timestamp / sample_rate) - time.time()
        if wait > 0:
            await asyncio.sleep(wait)

        frame.pts = self._timestamp
        frame.time_base = fractions.Fraction(1, sample_rate)

        return frame
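The frame-buffering step follows a general "refill when empty" pattern: one decoded item may expand into zero, one, or several output frames, so the buffer is topped up only when it runs dry. The sketch below shows that pattern in isolation with plain Python values instead of PyAV frames. `RefillingBuffer`, `EndOfStream`, and the `expand` callback are illustrative stand-ins (for the track, `MediaStreamError`, and `self.resampler.resample` respectively), not SDK classes. It loops until the buffer is non-empty, which would also cover the case where an expansion step returns an empty list.

```python
from collections import deque


class EndOfStream(Exception):
    """Stand-in for MediaStreamError in this sketch."""


class RefillingBuffer:
    def __init__(self, source, expand):
        self._source = iter(source)   # plays the role of the decoder
        self._expand = expand         # plays the role of the resampler
        self._buffer = deque()        # O(1) pops from the left

    def next_item(self):
        # Keep pulling from the source until at least one item is buffered.
        while not self._buffer:
            try:
                raw = next(self._source)
            except StopIteration:
                raise EndOfStream("End of stream") from None
            self._buffer.extend(self._expand(raw))
        return self._buffer.popleft()
```

A `deque` is used here because `list.pop(0)` shifts every remaining element; for the short buffers in this guide either choice works.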

Event Handlers

Define custom event handlers to handle meeting and participant events:

class MyMeetingEventHandler(MeetingEventHandler):
    def on_meeting_joined(self, data):
        print("Meeting joined")

    def on_meeting_left(self, data):
        print("Meeting left")


class MyParticipantEventHandler(ParticipantEventHandler):
    def __init__(self, participant_id: str):
        super().__init__()
        self.participant_id = participant_id

    def on_stream_enabled(self, stream):
        print(f"Participant {self.participant_id} stream enabled: {stream.kind}")

    def on_stream_disabled(self, stream):
        print(f"Participant {self.participant_id} stream disabled: {stream.kind}")

You can learn more about the MeetingEventHandler and ParticipantEventHandler classes in an upcoming guide.

Main Function

Configure and run the main function to initialize the meeting with the custom audio track:

def main():
    # Initialize custom audio track
    audio_track = AudioFileTrack("./example-video.mp4")

    # Configure meeting
    meeting_config = MeetingConfig(
        meeting_id=MEETING_ID,
        name=NAME,
        mic_enabled=True,
        webcam_enabled=False,
        token=VIDEOSDK_TOKEN,
        custom_microphone_audio_track=audio_track
    )

    # Initialize VideoSDK meeting
    meeting = VideoSDK.init_meeting(**meeting_config)

    # Add event listeners
    meeting.add_event_listener(MyMeetingEventHandler())
    meeting.local_participant.add_event_listener(MyParticipantEventHandler(participant_id=meeting.local_participant.id))

    # Join the meeting
    print("Joining the meeting...")
    meeting.join()

    print("Meeting setup complete")

if __name__ == '__main__':
    main()
    loop.run_forever()
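The example keeps the process alive with `loop.run_forever()`. Newer Python releases deprecate calling `asyncio.get_event_loop()` when no loop has been set, so one option is to create the loop explicitly and close it on exit. This is a sketch under that assumption, using only standard `asyncio` calls; `run_until_stopped` is a hypothetical helper, and whether the SDK needs any additional cleanup before the loop closes is not covered here.

```python
import asyncio


def run_until_stopped(loop):
    """Run the loop until loop.stop() is called or Ctrl+C is pressed, then close it."""
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        # Ctrl+C: fall through to cleanup instead of printing a traceback.
        pass
    finally:
        loop.close()


# Create the loop explicitly and register it as the current loop
# before any code that looks it up via asyncio.get_event_loop().
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
```

With this in place, the final lines of the script would call `main()` followed by `run_until_stopped(loop)` instead of `loop.run_forever()`.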

Tip: Stuck anywhere? Check out this example code on GitHub.

Conclusion

This setup demonstrates how to integrate a custom audio track from a file into a VideoSDK meeting, with event handling for meeting and participant interactions.

API Reference

Refer to the following API documentation for methods and events used in this guide:
