
Gemini Live [Experimental]

Experimental Feature

This feature is experimental and may change in future versions. Use with caution in production environments.

The Gemini Live API lets developers build natural conversations over a two-way WebSocket connection with Gemini models. The Live API processes data streams in real time, and users can interrupt the model's responses with new input, much like in a real conversation. Key features include:

  • Multimodal Streaming: The API supports streaming of text, audio, and video data.
  • Bidirectional Interaction: The user and the model can send and receive data at the same time.
  • Interruptibility: Users can interrupt the model mid-response, and it adapts to the new input.
  • Tool Use and Function Calling: The API can use external tools to perform actions and get context while maintaining a real-time connection.
  • Session Management: Supports managing long conversations through sessions, providing context and continuity.
  • Secure Authentication: Uses tokens for secure client-side authentication.

Usage

import asyncio

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
from strands.experimental.bidi.models import BidiGeminiLiveModel
from strands.experimental.bidi.tools import stop_conversation

from strands_tools import calculator


async def main() -> None:
    # Configure the Gemini Live model with a native-audio model ID, the "Kore"
    # voice, and the Google AI API key used to authenticate the connection.
    model = BidiGeminiLiveModel(
        model_id="gemini-2.5-flash-native-audio-preview-09-2025",
        provider_config={
            "audio": {
                "voice": "Kore",
            },
        },
        client_config={"api_key": "<GOOGLE_AI_API_KEY>"},
    )
    # The stop_conversation tool lets the user verbally stop agent execution.
    agent = BidiAgent(model=model, tools=[calculator, stop_conversation])

    # Wire up audio input/output and text output, then run the live session.
    audio_io = BidiAudioIO()
    text_io = BidiTextIO()
    await agent.run(inputs=[audio_io.input()], outputs=[audio_io.output(), text_io.output()])


if __name__ == "__main__":
    asyncio.run(main())
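
Running this script starts a live voice session: BidiAudioIO streams audio in and out (typically via the system microphone and speakers), while BidiTextIO prints the model's text output to the console. Asking the agent out loud to stop the conversation invokes the stop_conversation tool and ends the run.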

Configuration

Client Configs

For details on the supported client configs, see here.
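
The only client config key used in this guide is api_key; it is passed as a plain dict when constructing the model, and any other keys documented in the linked reference go in the same dict. A minimal sketch, reusing the values from the Usage example:

model = BidiGeminiLiveModel(
    model_id="gemini-2.5-flash-native-audio-preview-09-2025",
    # client_config holds client-level settings such as the API key.
    client_config={"api_key": "<GOOGLE_AI_API_KEY>"},
)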

Provider Configs

Parameter   Description                                                           Example               Options
audio       AudioConfig instance.                                                 {"voice": "Kore"}     reference
inference   Dict of inference fields specified in the Gemini LiveConnectConfig.   {"temperature": 0.7}  reference

For the list of supported voices and languages, see here.
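
Both provider config keys can be combined in a single model definition. The sketch below reuses the model ID from the Usage example together with the example values from the table above:

model = BidiGeminiLiveModel(
    model_id="gemini-2.5-flash-native-audio-preview-09-2025",
    provider_config={
        "audio": {"voice": "Kore"},          # AudioConfig fields
        "inference": {"temperature": 0.7},   # LiveConnectConfig inference fields
    },
    client_config={"api_key": "<GOOGLE_AI_API_KEY>"},
)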

Session Management

Currently, BidiGeminiLiveModel does not produce a message history, so it has limited compatibility with the Strands session manager. However, the provider does use Gemini's Session Resumption as part of its connection restart workflow, which allows a Gemini Live connection to persist for up to 24 hours. After that limit, a new BidiGeminiLiveModel instance must be created to continue the conversation.
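
For long-running applications, one option is to wrap model and agent construction in a small helper and call it again once the 24-hour limit is reached. This is an illustrative sketch only; the helper name and restart policy below are not part of the API:

def new_agent() -> BidiAgent:
    # Build a fresh model and agent; required after the 24-hour Gemini Live
    # session limit, since the old connection can no longer be resumed.
    model = BidiGeminiLiveModel(
        model_id="gemini-2.5-flash-native-audio-preview-09-2025",
        client_config={"api_key": "<GOOGLE_AI_API_KEY>"},
    )
    return BidiAgent(model=model, tools=[calculator, stop_conversation])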
