Overview

OpenAIRealtimeLLMService provides real-time, multimodal conversation capabilities using OpenAI’s Realtime API. It supports low-latency speech-to-speech interactions with integrated LLM processing, function calling, and conversation management.

Installation

To use OpenAI Realtime services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI Realtime services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an OpenAI API key from your account dashboard
  3. Model Access: Ensure access to GPT-4o Realtime models
  4. Usage Limits: Configure appropriate usage limits and billing

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication

Key Features

  • Real-time Speech-to-Speech: Direct audio processing with minimal latency
  • Advanced Turn Detection: Multiple voice activity detection options including semantic detection
  • Function Calling: Seamless support for external functions and APIs
  • Voice Options: Multiple voice personalities and speaking styles
  • Conversation Management: Intelligent context handling and conversation flow control

Configuration

OpenAIRealtimeLLMService

api_key
str
required
OpenAI API key for authentication.
model
str
default:"gpt-realtime-1.5"
OpenAI Realtime model name. This is a connection-level parameter set via the WebSocket URL and cannot be changed during the session. Deprecated: pass via settings=OpenAIRealtimeLLMSettings(model="...") instead.
base_url
str
default:"wss://api.openai.com/v1/realtime"
WebSocket base URL for the Realtime API. Override for custom or proxied deployments.
session_properties
SessionProperties
default:"None"
Configuration properties for the realtime session. These are session-level settings that can be updated during the session (except for voice and model). See SessionProperties below. Deprecated: use settings=OpenAIRealtimeLLMSettings(session_properties=...) instead.
settings
OpenAIRealtimeLLMSettings
default:"None"
Runtime-updatable settings for this service. Preferred method for configuring the service. See OpenAIRealtimeLLMSettings below.
start_audio_paused
bool
default:"False"
Whether to start with audio input paused. Useful when you want to control when audio processing begins.
start_video_paused
bool
default:"False"
Whether to start with video input paused.
video_frame_detail
str
default:"auto"
Detail level for video processing. Can be "auto", "low", or "high". "auto" lets the model decide, "low" is faster and uses fewer tokens, "high" provides more detail.
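As a hedged sketch of the media-control constructor options above (parameter names taken from this reference; the exact constructor may vary between pipecat versions), you might start with audio input muted and request low-detail video processing:

```python
import os

from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService

# Sketch only: audio input stays paused until explicitly resumed,
# and video frames are processed at low detail (faster, fewer tokens).
llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    start_audio_paused=True,
    video_frame_detail="low",
)
```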

OpenAIRealtimeLLMSettings

Runtime-updatable settings for OpenAI Realtime. All fields from SessionProperties can be passed here, and will be automatically routed to the session configuration.
model
str
default:"None"
Model to use. Syncs bidirectionally with session_properties.model.
system_instruction
str
default:"None"
System instructions for the assistant. Syncs bidirectionally with session_properties.instructions.
session_properties
SessionProperties
default:"None"
OpenAI Realtime session properties (modalities, audio config, tools, etc.). model and instructions fields are synced with the top-level model and system_instruction fields. Top-level values take precedence when both are set.
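A short sketch of the precedence rule described above: when both the top-level field and the corresponding session_properties field are set, the top-level value is the one applied (the instruction strings here are illustrative):

```python
from pipecat.services.openai.realtime.events import SessionProperties
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMSettings

settings = OpenAIRealtimeLLMSettings(
    # Top-level value: this is the instruction the session will use.
    system_instruction="You are a travel agent.",
    session_properties=SessionProperties(
        # Overridden by the top-level system_instruction above.
        instructions="You are a chef.",
    ),
)
```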

SessionProperties

Session-level configuration passed via the session_properties constructor argument. These settings can be updated during the session using LLMUpdateSettingsFrame.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output_modalities | List[Literal["text", "audio"]] | None | Modalities the model can respond with. The API supports single-modality responses: either ["text"] or ["audio"]. |
| instructions | str | None | System instructions for the assistant. |
| audio | AudioConfiguration | None | Configuration for input and output audio (format, transcription, turn detection, voice, speed). |
| tools | List[Dict] | None | Available function tools for the assistant. |
| tool_choice | Literal["auto", "none", "required"] | None | Tool usage strategy. |
| max_output_tokens | int \| Literal["inf"] | None | Maximum tokens in the response, or "inf" for unlimited. |
| tracing | Literal["auto"] \| Dict | None | Configuration options for tracing. |
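Because tools is a plain List[Dict], a function schema can be built as ordinary Python data before passing it to SessionProperties. A minimal sketch (the get_weather tool and its schema are hypothetical; verify the exact tool-schema shape against OpenAI's Realtime API reference):

```python
# Hypothetical function tool. The Realtime API uses flat function entries:
# "type", "name", "description", and a JSON-Schema "parameters" object.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

tools = [get_weather_tool]
# Then pass it along: SessionProperties(tools=tools, tool_choice="auto")
```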

AudioConfiguration

The audio field in SessionProperties accepts an AudioConfiguration with input and output sub-configurations.

AudioInput (audio.input):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | AudioFormat | None | Input audio format (PCMAudioFormat, PCMUAudioFormat, or PCMAAudioFormat). |
| transcription | InputAudioTranscription | None | Transcription settings: model (e.g. "gpt-4o-transcribe"), language, and prompt. |
| noise_reduction | InputAudioNoiseReduction | None | Noise reduction type: "near_field" or "far_field". |
| turn_detection | TurnDetection \| SemanticTurnDetection \| bool | None | Turn detection config, or False to disable server-side turn detection. |

AudioOutput (audio.output):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | AudioFormat | None | Output audio format. |
| voice | str | None | Voice the model uses to respond (e.g. "alloy", "echo", "shimmer"). |
| speed | float | None | Speed of the model’s spoken response. |

TurnDetection

Server-side VAD configuration via TurnDetection:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type | Literal["server_vad"] | "server_vad" | Detection type. |
| threshold | float | 0.5 | Voice activity detection threshold (0.0-1.0). |
| prefix_padding_ms | int | 300 | Padding before speech starts, in milliseconds. |
| silence_duration_ms | int | 500 | Silence duration to detect the end of speech, in milliseconds. |

Alternatively, use SemanticTurnDetection for semantic-based detection:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type | Literal["semantic_vad"] | "semantic_vad" | Detection type. |
| eagerness | Literal["low", "medium", "high", "auto"] | None | Turn detection eagerness level. |
| create_response | bool | None | Whether to automatically create responses on turn detection. |
| interrupt_response | bool | None | Whether to interrupt ongoing responses on turn detection. |
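A sketch tuning traditional server-side VAD with the TurnDetection fields above (the values are illustrative, not recommendations):

```python
from pipecat.services.openai.realtime.events import (
    AudioConfiguration,
    AudioInput,
    SessionProperties,
    TurnDetection,
)

# Sketch: a more conservative VAD that requires stronger voice activity
# and waits longer in silence before ending the user's turn.
session_properties = SessionProperties(
    audio=AudioConfiguration(
        input=AudioInput(
            turn_detection=TurnDetection(
                threshold=0.6,            # higher bar for detecting speech
                prefix_padding_ms=300,    # keep 300 ms of audio before speech
                silence_duration_ms=800,  # 800 ms of silence ends the turn
            ),
        ),
    ),
)
```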

Usage

Basic Setup

import os
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
)

With Session Configuration

import os

from pipecat.services.openai.realtime.llm import (
    OpenAIRealtimeLLMService,
    OpenAIRealtimeLLMSettings,
)
from pipecat.services.openai.realtime.events import (
    SessionProperties,
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    InputAudioTranscription,
    SemanticTurnDetection,
)

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIRealtimeLLMSettings(
        system_instruction="You are a helpful assistant.",
        session_properties=SessionProperties(
            audio=AudioConfiguration(
                input=AudioInput(
                    transcription=InputAudioTranscription(model="gpt-4o-transcribe"),
                    turn_detection=SemanticTurnDetection(eagerness="medium"),
                ),
                output=AudioOutput(
                    voice="alloy",
                    speed=1.0,
                ),
            ),
            max_output_tokens=4096,
        ),
    ),
)

With Disabled Turn Detection (Manual Control)

import os

from pipecat.services.openai.realtime.llm import (
    OpenAIRealtimeLLMService,
    OpenAIRealtimeLLMSettings,
)
from pipecat.services.openai.realtime.events import (
    SessionProperties,
    AudioConfiguration,
    AudioInput,
)

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIRealtimeLLMSettings(
        session_properties=SessionProperties(
            audio=AudioConfiguration(
                input=AudioInput(
                    turn_detection=False,
                ),
            ),
        ),
    ),
)

Updating Settings at Runtime

Settings can be updated during a session using LLMUpdateSettingsFrame. You can pass SessionProperties fields directly in the settings dict, and they will be automatically routed to session_properties:
from pipecat.frames.frames import LLMUpdateSettingsFrame

# Update session properties - keys are automatically routed to session_properties
await task.queue_frame(
    LLMUpdateSettingsFrame(
        settings={
            "instructions": "Now speak in Spanish.",
            "max_output_tokens": 2048,
        }
    )
)
Alternatively, you can use LLMUpdateSettingsFrame with a delta parameter for type-safe updates:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMSettings
from pipecat.services.openai.realtime.events import SessionProperties

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=OpenAIRealtimeLLMSettings(
            system_instruction="Now speak in Spanish.",
            session_properties=SessionProperties(
                max_output_tokens=2048,
            ),
        )
    )
)

Notes

  • New settings pattern: Use settings=OpenAIRealtimeLLMSettings(...) for configuration. The legacy session_properties and model constructor parameters are deprecated but still supported.
  • Bidirectional sync: model and system_instruction at the top level are synced with session_properties.model and session_properties.instructions. Top-level values take precedence when both are set.
  • Automatic routing: When updating settings with LLMUpdateSettingsFrame, SessionProperties fields (like instructions, output_modalities, etc.) are automatically routed to session_properties.
  • Model is connection-level: The model parameter is set via the WebSocket URL at connection time and cannot be changed during a session.
  • Output modalities are single-mode: The API supports either ["text"] or ["audio"] output, not both simultaneously.
  • Turn detection options: Use TurnDetection for traditional VAD, SemanticTurnDetection for AI-based turn detection, or False to disable server-side detection and manage turns manually.
  • Audio output format: The service outputs 24kHz PCM audio by default.
  • Video support: Video frames can be sent to the model for multimodal input. Control the detail level with video_frame_detail and pause/resume with set_video_input_paused().
  • Transcription frames: User speech transcription frames are always emitted upstream when input audio transcription is configured.

Event Handlers

| Event | Description |
| --- | --- |
| on_conversation_item_created | Called when a new conversation item is created in the session. |
| on_conversation_item_updated | Called when a conversation item is updated or completed. |

@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"New conversation item: {item_id}")

@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    print(f"Conversation item updated: {item_id}")