
LiteLLM

LiteLLM is a unified interface for various LLM providers that allows you to interact with models from Amazon, Anthropic, OpenAI, and many others through a single API. The Strands Agents SDK implements a LiteLLM provider, allowing you to run agents against any model LiteLLM supports.

Installation

LiteLLM is configured as an optional dependency in Strands Agents. To install, run:

pip install 'strands-agents[litellm]' strands-agents-tools

Usage

After installing litellm, you can import and initialize Strands Agents' LiteLLM provider as follows:

from strands import Agent
from strands.models.litellm import LiteLLMModel
from strands_tools import calculator

model = LiteLLMModel(
    client_args={
        "api_key": "<KEY>",
    },
    # **model_config
    model_id="anthropic/claude-3-7-sonnet-20250219",
    params={
        "max_tokens": 1000,
        "temperature": 0.7,
    }
)

agent = Agent(model=model, tools=[calculator])
response = agent("What is 2+2?")
print(response)

Using LiteLLM Proxy

To use a LiteLLM Proxy Server, you have two options:

Option 1: Use the use_litellm_proxy parameter

from strands import Agent
from strands.models.litellm import LiteLLMModel

model = LiteLLMModel(
    client_args={
        "api_key": "<PROXY_KEY>",
        "api_base": "<PROXY_URL>",
        "use_litellm_proxy": True
    },
    model_id="amazon.nova-lite-v1:0"
)

agent = Agent(model=model)
response = agent("Tell me a story")

Option 2: Use the litellm_proxy/ prefix in the model ID

model = LiteLLMModel(
    client_args={
        "api_key": "<PROXY_KEY>",
        "api_base": "<PROXY_URL>"
    },
    model_id="litellm_proxy/amazon.nova-lite-v1:0"
)

Configuration

Client Configuration

The client_args configure the underlying LiteLLM completion API. For a complete list of available arguments, please refer to the LiteLLM docs.
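
For example, connection-level settings such as api_base or a request timeout can be passed alongside the API key. A minimal sketch, assuming the standard LiteLLM completion arguments (the endpoint and timeout values below are illustrative placeholders):

from strands.models.litellm import LiteLLMModel

# Sketch: client_args are forwarded to the underlying LiteLLM completion API.
# The api_base and timeout values are illustrative; see the LiteLLM docs for
# the full list of supported arguments.
model = LiteLLMModel(
    client_args={
        "api_key": "<KEY>",
        "api_base": "https://my-gateway.example.com",
        "timeout": 30,
    },
    model_id="anthropic/claude-3-7-sonnet-20250219",
)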

Model Configuration

The model_config configures the underlying model selected for inference. The supported configurations are:

Parameter   Description                 Example                                      Options
model_id    ID of a model to use        anthropic/claude-3-7-sonnet-20250219        reference
params      Model specific parameters   {"max_tokens": 1000, "temperature": 0.7}    reference
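
As a sketch of how these two configuration fields fit together, the model ID and params can be set at construction time; the update_config and get_config calls below assume the common Strands model interface and are shown only as an illustration:

from strands.models.litellm import LiteLLMModel

model = LiteLLMModel(
    model_id="anthropic/claude-3-7-sonnet-20250219",
    params={"max_tokens": 1000, "temperature": 0.7},
)

# Assumption: Strands model providers expose update_config/get_config, so
# params can also be adjusted after the model has been created.
model.update_config(params={"max_tokens": 2000, "temperature": 0.3})
print(model.get_config())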

Troubleshooting

Module Not Found

If you encounter the error ModuleNotFoundError: No module named 'litellm', this means you haven't installed the litellm dependency in your environment. To fix, run pip install 'strands-agents[litellm]'.

Advanced Features

Caching

LiteLLM supports provider-agnostic caching through SystemContentBlock arrays, allowing you to define cache points that work across all supported model providers. This enables you to reuse parts of previous requests, which can significantly reduce token usage and latency.

System Prompt Caching

Use SystemContentBlock arrays to define cache points in your system prompts:

from strands import Agent
from strands.models.litellm import LiteLLMModel
from strands.types.content import SystemContentBlock

# Define system content with cache points
system_content = [
    SystemContentBlock(
        text="You are a helpful assistant that provides concise answers. "
             "This is a long system prompt with detailed instructions..."
             "..." * 1000  # needs to be at least 1,024 tokens
    ),
    SystemContentBlock(cachePoint={"type": "default"})
]

# Create an agent with SystemContentBlock array
model = LiteLLMModel(
    model_id="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
)

agent = Agent(model=model, system_prompt=system_content)

# First request will cache the system prompt
response1 = agent("Tell me about Python")
# Cache metrics like cacheWriteInputTokens will be present in response1.metrics.accumulated_usage

# Second request will reuse the cached system prompt
response2 = agent("Tell me about JavaScript")
# Cache metrics like cacheReadInputTokens will be present in response2.metrics.accumulated_usage

Note: Caching availability and behavior depends on the underlying model provider accessed through LiteLLM. Some providers may have minimum token requirements or other limitations for cache creation.
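
To verify that a cache point is actually being written and reused, you can inspect the usage metrics mentioned in the comments above. A minimal sketch, assuming accumulated_usage behaves like a dictionary of token counts:

# Sketch: read the cache metrics referenced in the example above. Providers
# that do not support caching may simply omit these keys, hence .get().
usage = response2.metrics.accumulated_usage
print("Cache write tokens:", usage.get("cacheWriteInputTokens", 0))
print("Cache read tokens:", usage.get("cacheReadInputTokens", 0))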

Structured Output

LiteLLM supports structured output by proxying requests to underlying model providers that support tool calling. The availability of structured output depends on the specific model and provider you're using through LiteLLM.

from pydantic import BaseModel, Field
from strands import Agent
from strands.models.litellm import LiteLLMModel

class BookAnalysis(BaseModel):
    """Analyze a book's key information."""
    title: str = Field(description="The book's title")
    author: str = Field(description="The book's author")
    genre: str = Field(description="Primary genre or category")
    summary: str = Field(description="Brief summary of the book")
    rating: int = Field(description="Rating from 1-10", ge=1, le=10)

model = LiteLLMModel(
    model_id="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
)

agent = Agent(model=model)

result = agent.structured_output(
    BookAnalysis,
    """
    Analyze this book: "The Hitchhiker's Guide to the Galaxy" by Douglas Adams.
    It's a science fiction comedy about Arthur Dent's adventures through space
    after Earth is destroyed. It's widely considered a classic of humorous sci-fi.
    """
)

print(f"Title: {result.title}")
print(f"Author: {result.author}")
print(f"Genre: {result.genre}")
print(f"Rating: {result.rating}")

References