Overview

OpenAIRealtimeLLMService provides real-time, multimodal conversation capabilities using OpenAI’s Realtime API. It supports low-latency speech-to-speech interactions with integrated LLM processing, function calling, and conversation management.

Installation

To use OpenAI Realtime services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI Realtime services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an OpenAI API key from your account dashboard
  3. Model Access: Ensure access to GPT-4o Realtime models
  4. Usage Limits: Configure appropriate usage limits and billing

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication

Key Features

  • Real-time Speech-to-Speech: Direct audio processing with minimal latency
  • Advanced Turn Detection: Multiple voice activity detection options including semantic detection
  • Function Calling: Seamless support for external functions and APIs
  • Voice Options: Multiple voice personalities and speaking styles
  • Conversation Management: Intelligent context handling and conversation flow control

Configuration

OpenAIRealtimeLLMService

api_key
str
required
OpenAI API key for authentication.
model
str
default:"gpt-realtime-1.5"
OpenAI Realtime model name. This is a connection-level parameter set via the WebSocket URL and cannot be changed during the session. Deprecated: pass via settings=OpenAIRealtimeLLMSettings(model="...") instead.
base_url
str
default:"wss://api.openai.com/v1/realtime"
WebSocket base URL for the Realtime API. Override for custom or proxied deployments.
session_properties
SessionProperties
default:"None"
Configuration properties for the realtime session. These are session-level settings that can be updated during the session (except for voice and model). See SessionProperties below. Deprecated: use settings=OpenAIRealtimeLLMSettings(session_properties=...) instead.
settings
OpenAIRealtimeLLMSettings
default:"None"
Runtime-updatable settings for this service. Preferred method for configuring the service. See OpenAIRealtimeLLMSettings below.
start_audio_paused
bool
default:"False"
Whether to start with audio input paused. Useful when you want to control when audio processing begins.
start_video_paused
bool
default:"False"
Whether to start with video input paused.
video_frame_detail
str
default:"auto"
Detail level for video processing. Can be "auto", "low", or "high". "auto" lets the model decide, "low" is faster and uses fewer tokens, "high" provides more detail.
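As a hedged sketch of the media-control constructor options above (parameter names taken from this reference; the exact constructor may vary between pipecat versions), you might start with audio input muted and request low-detail video processing:

```python
import os

from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService

# Sketch only: audio input stays paused until explicitly resumed,
# and video frames are processed at low detail (faster, fewer tokens).
llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    start_audio_paused=True,
    video_frame_detail="low",
)
```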

OpenAIRealtimeLLMSettings

Runtime-updatable settings for OpenAI Realtime. All fields from SessionProperties can be passed here, and will be automatically routed to the session configuration.
model
str
default:"None"
Model to use. Syncs bidirectionally with session_properties.model.
system_instruction
str
default:"None"
System instructions for the assistant. Syncs bidirectionally with session_properties.instructions.
session_properties
SessionProperties
default:"None"
OpenAI Realtime session properties (modalities, audio config, tools, etc.). model and instructions fields are synced with the top-level model and system_instruction fields. Top-level values take precedence when both are set.
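A short sketch of the precedence rule described above: when both the top-level field and the corresponding session_properties field are set, the top-level value is the one applied (the instruction strings here are illustrative):

```python
from pipecat.services.openai.realtime.events import SessionProperties
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMSettings

settings = OpenAIRealtimeLLMSettings(
    # Top-level value: this is the instruction the session will use.
    system_instruction="You are a travel agent.",
    session_properties=SessionProperties(
        # Overridden by the top-level system_instruction above.
        instructions="You are a chef.",
    ),
)
```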

SessionProperties

Session-level configuration passed via the session_properties constructor argument. These settings can be updated during the session using LLMUpdateSettingsFrame.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output_modalities | List[Literal["text", "audio"]] | None | Modalities the model can respond with. The API supports single-modality responses: either ["text"] or ["audio"]. |
| instructions | str | None | System instructions for the assistant. |
| audio | AudioConfiguration | None | Configuration for input and output audio (format, transcription, turn detection, voice, speed). |
| tools | List[Dict] | None | Available function tools for the assistant. |
| tool_choice | Literal["auto", "none", "required"] | None | Tool usage strategy. |
| max_output_tokens | int \| Literal["inf"] | None | Maximum tokens in the response, or "inf" for unlimited. |
| tracing | Literal["auto"] \| Dict | None | Configuration options for tracing. |
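Because tools is a plain List[Dict], a function schema can be built as ordinary Python data before passing it to SessionProperties. A minimal sketch (the get_weather tool and its schema are hypothetical; verify the exact tool-schema shape against OpenAI's Realtime API reference):

```python
# Hypothetical function tool. The Realtime API uses flat function entries:
# "type", "name", "description", and a JSON-Schema "parameters" object.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

tools = [get_weather_tool]
# Then pass it along: SessionProperties(tools=tools, tool_choice="auto")
```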

AudioConfiguration

The audio field in SessionProperties accepts an AudioConfiguration with input and output sub-configurations.

AudioInput (audio.input):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | AudioFormat | None | Input audio format (PCMAudioFormat, PCMUAudioFormat, or PCMAAudioFormat). |
| transcription | InputAudioTranscription | None | Transcription settings: model (e.g. "gpt-4o-transcribe"), language, and prompt. |
| noise_reduction | InputAudioNoiseReduction | None | Noise reduction type: "near_field" or "far_field". |
| turn_detection | TurnDetection \| SemanticTurnDetection \| bool | None | Turn detection config, or False to disable server-side turn detection. |

AudioOutput (audio.output):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | AudioFormat | None | Output audio format. |
| voice | str | None | Voice the model uses to respond (e.g. "alloy", "echo", "shimmer"). |
| speed | float | None | Speed of the model’s spoken response. |

TurnDetection

Server-side VAD configuration via TurnDetection:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type | Literal["server_vad"] | "server_vad" | Detection type. |
| threshold | float | 0.5 | Voice activity detection threshold (0.0-1.0). |
| prefix_padding_ms | int | 300 | Padding before speech starts, in milliseconds. |
| silence_duration_ms | int | 500 | Silence duration to detect the end of speech, in milliseconds. |

Alternatively, use SemanticTurnDetection for semantic-based detection:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type | Literal["semantic_vad"] | "semantic_vad" | Detection type. |
| eagerness | Literal["low", "medium", "high", "auto"] | None | Turn detection eagerness level. |
| create_response | bool | None | Whether to automatically create responses on turn detection. |
| interrupt_response | bool | None | Whether to interrupt ongoing responses on turn detection. |
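A sketch tuning traditional server-side VAD with the TurnDetection fields above (the values are illustrative, not recommendations):

```python
from pipecat.services.openai.realtime.events import (
    AudioConfiguration,
    AudioInput,
    SessionProperties,
    TurnDetection,
)

# Sketch: a more conservative VAD that requires stronger voice activity
# and waits longer in silence before ending the user's turn.
session_properties = SessionProperties(
    audio=AudioConfiguration(
        input=AudioInput(
            turn_detection=TurnDetection(
                threshold=0.6,            # higher bar for detecting speech
                prefix_padding_ms=300,    # keep 300 ms of audio before speech
                silence_duration_ms=800,  # 800 ms of silence ends the turn
            ),
        ),
    ),
)
```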

Usage

Basic Setup

import os
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
)

With Session Configuration

import os

from pipecat.services.openai.realtime.llm import (
    OpenAIRealtimeLLMService,
    OpenAIRealtimeLLMSettings,
)
from pipecat.services.openai.realtime.events import (
    SessionProperties,
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    InputAudioTranscription,
    SemanticTurnDetection,
)

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIRealtimeLLMSettings(
        system_instruction="You are a helpful assistant.",
        session_properties=SessionProperties(
            audio=AudioConfiguration(
                input=AudioInput(
                    transcription=InputAudioTranscription(model="gpt-4o-transcribe"),
                    turn_detection=SemanticTurnDetection(eagerness="medium"),
                ),
                output=AudioOutput(
                    voice="alloy",
                    speed=1.0,
                ),
            ),
            max_output_tokens=4096,
        ),
    ),
)

With Disabled Turn Detection (Manual Control)

import os

from pipecat.services.openai.realtime.llm import (
    OpenAIRealtimeLLMService,
    OpenAIRealtimeLLMSettings,
)
from pipecat.services.openai.realtime.events import (
    SessionProperties,
    AudioConfiguration,
    AudioInput,
)

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIRealtimeLLMSettings(
        session_properties=SessionProperties(
            audio=AudioConfiguration(
                input=AudioInput(
                    turn_detection=False,
                ),
            ),
        ),
    ),
)

Updating Settings at Runtime

Settings can be updated during a session using LLMUpdateSettingsFrame. You can pass SessionProperties fields directly in the settings dict, and they will be automatically routed to session_properties:
from pipecat.frames.frames import LLMUpdateSettingsFrame

# Update session properties - keys are automatically routed to session_properties
await task.queue_frame(
    LLMUpdateSettingsFrame(
        settings={
            "instructions": "Now speak in Spanish.",
            "max_output_tokens": 2048,
        }
    )
)
Alternatively, you can use LLMUpdateSettingsFrame with a delta parameter for type-safe updates:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMSettings
from pipecat.services.openai.realtime.events import SessionProperties

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=OpenAIRealtimeLLMSettings(
            system_instruction="Now speak in Spanish.",
            session_properties=SessionProperties(
                max_output_tokens=2048,
            ),
        )
    )
)

Notes

  • New settings pattern: Use settings=OpenAIRealtimeLLMSettings(...) for configuration. The legacy session_properties and model constructor parameters are deprecated but still supported.
  • Bidirectional sync: model and system_instruction at the top level are synced with session_properties.model and session_properties.instructions. Top-level values take precedence when both are set.
  • Automatic routing: When updating settings with LLMUpdateSettingsFrame, SessionProperties fields (like instructions, output_modalities, etc.) are automatically routed to session_properties.
  • Model is connection-level: The model parameter is set via the WebSocket URL at connection time and cannot be changed during a session.
  • Output modalities are single-mode: The API supports either ["text"] or ["audio"] output, not both simultaneously.
  • Turn detection options: Use TurnDetection for traditional VAD, SemanticTurnDetection for AI-based turn detection, or False to disable server-side detection and manage turns manually.
  • Audio output format: The service outputs 24kHz PCM audio by default.
  • Video support: Video frames can be sent to the model for multimodal input. Control the detail level with video_frame_detail and pause/resume with set_video_input_paused().
  • Transcription frames: User speech transcription frames are always emitted upstream when input audio transcription is configured.

Event Handlers

| Event | Description |
| --- | --- |
| on_conversation_item_created | Called when a new conversation item is created in the session. |
| on_conversation_item_updated | Called when a conversation item is updated or completed. |

@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"New conversation item: {item_id}")

@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    print(f"Conversation item updated: {item_id}")