This represents the latency metrics for the current turn.
Optional providers?: object
This represents the providers used (e.g., TTS, STT, LLMs). This is available only in the first turn.
This represents the user and agent speech for the current turn.
Optional systemInstructions?: object
This represents the system instructions configured for the agent. This is available only in the first turn.
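Because providers and systemInstructions arrive only with the first turn, a consumer that needs them later has to keep its own copy. The following is a minimal sketch of that idea, not the SDK's own API: the `TurnPayload` interface, the `metrics` and `speech` field names, and the `TurnContext` class are hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of the per-turn payload. Only `providers` and
// `systemInstructions` are named in the text above; `metrics` and `speech`
// are assumed field names.
interface TurnPayload {
  metrics: Record<string, number>;
  speech: { user: string; agent: string };
  providers?: Record<string, unknown>;
  systemInstructions?: Record<string, unknown>;
}

// Remembers the first-turn-only fields so that later turns can still read them.
class TurnContext {
  private providers: Record<string, unknown> | undefined;
  private systemInstructions: Record<string, unknown> | undefined;
  private turnCount = 0;

  record(turn: TurnPayload): void {
    this.turnCount += 1;
    // Overwrite the cached values only when the payload actually carries them.
    if (turn.providers !== undefined) {
      this.providers = turn.providers;
    }
    if (turn.systemInstructions !== undefined) {
      this.systemInstructions = turn.systemInstructions;
    }
  }

  getProviders(): Record<string, unknown> | undefined {
    return this.providers;
  }

  getSystemInstructions(): Record<string, unknown> | undefined {
    return this.systemInstructions;
  }

  getTurnCount(): number {
    return this.turnCount;
  }
}
```

Calling `record` from the per-turn event handler keeps the cached values available for the whole session, even though later payloads omit them.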
Triggered when the state of the agentParticipant changes.
This represents the current state of the agentParticipant.
Possible values: AgentState
Triggered when a transcription is received from a participant or agent.
This represents the participant whose speech is transcribed. It can be either an AgentParticipant or a regular Participant, depending on who is speaking.
This represents the transcription segment.
This represents the transcribed text.
This represents the timestamp of the transcription.
Optional type?: string
This represents the type of the transcription segment (e.g., "final", "interim").
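A common way to consume these segments is to show provisional text as a live preview and commit a line only when a final segment arrives. The sketch below assumes a segment object with `text`, `timestamp`, and optional `type` fields. Only `type` is named in the text above, so the other field names, the `TranscriptLog` class, and the plain string speaker id are hypothetical.

```typescript
// Assumed segment shape: `text` and `timestamp` are illustrative field names.
interface TranscriptionSegment {
  text: string;
  timestamp: number;
  type?: string;
}

// Collects committed lines and keeps the latest provisional text separately.
class TranscriptLog {
  private committed: string[] = [];
  private pending = "";

  // `speakerId` is a hypothetical identifier for the participant or agent.
  handle(speakerId: string, segment: TranscriptionSegment): void {
    if (segment.type === "final") {
      this.committed.push(`${speakerId}: ${segment.text}`);
      this.pending = "";
    } else {
      // Anything other than "final" (including a missing type) is treated
      // as provisional and replaces the previous preview.
      this.pending = segment.text;
    }
  }

  lines(): string[] {
    return [...this.committed];
  }

  preview(): string {
    return this.pending;
  }
}
```

Treating every non-final segment as provisional keeps the handler independent of the exact label the service uses for partial results.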
Triggered when the media status of a participant changes (for example, when audio or video is enabled or disabled).
Type of stream whose status changed.
The updated status of the stream.
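As a rough illustration of how the two values above might be used, the sketch below records the most recent status for each stream type. The `kind` and `newStatus` property names, the `MediaStatusChange` interface, and the status strings used in the usage example are assumptions, not confirmed parts of the SDK.

```typescript
// Hypothetical payload: `kind` is the stream type, `newStatus` its updated status.
interface MediaStatusChange {
  kind: string;
  newStatus: string;
}

// Keeps the last reported status per stream type for one participant.
class MediaStatusTracker {
  private latest = new Map<string, string>();

  apply(change: MediaStatusChange): void {
    this.latest.set(change.kind, change.newStatus);
  }

  // Returns undefined when no change has been reported for this stream type.
  statusOf(kind: string): string | undefined {
    return this.latest.get(kind);
  }
}

// Usage with placeholder values:
const tracker = new MediaStatusTracker();
tracker.apply({ kind: "audio", newStatus: "enabled" });
tracker.apply({ kind: "audio", newStatus: "disabled" });
```

Storing the raw status string avoids assuming which values the SDK reports; a UI layer can map them to icons or labels as needed.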
Triggered when a participant’s audio, video, or screen-share Stream is disabled.
Triggered when a participant’s audio, video, or screen-share Stream is enabled.
Triggered after each conversational turn with metrics and speech data.