Llama API¶
Llama API is a Meta-hosted API service for integrating Llama models into your applications quickly and efficiently. It provides access to Llama models through a simple, developer-friendly API interface, with inference provided by Meta, so you can focus on building AI-powered solutions without managing your own inference infrastructure.
Installation¶
Llama API support is packaged as an optional dependency of Strands Agents. To install it, run:
pip install 'strands-agents[llamaapi]'
Usage¶
After installing llamaapi, you can import and initialize the Strands Agents Llama API provider as follows:
from strands import Agent
from strands.models.llamaapi import LlamaAPIModel
from strands_tools import calculator

model = LlamaAPIModel(
    client_args={
        "api_key": "<KEY>",
    },
    # **model_config: any remaining keyword arguments configure the model
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
)

agent = Agent(model=model, tools=[calculator])
response = agent("What is 2+2")
print(response)
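When run, the agent can invoke the calculator tool to evaluate the expression, and the printed response includes the final answer.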
Configuration¶
Client Configuration¶
The client_args configure the underlying LlamaAPI client. For a complete list of available arguments, please refer to the LlamaAPI docs.
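As a sketch, any keys placed in client_args are passed through to the LlamaAPI client when it is constructed. The timeout key below is an assumption used for illustration only; consult the LlamaAPI docs for the options your client version actually supports.

model = LlamaAPIModel(
    client_args={
        "api_key": "<KEY>",
        # Hypothetical option shown for illustration; check the
        # LlamaAPI docs for the supported client arguments.
        "timeout": 30,
    },
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
)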
Model Configuration¶
The model_config configures the underlying model selected for inference. The supported configurations are listed below; a usage sketch follows the table.
| Parameter | Description | Example | Options |
|---|---|---|---|
| model_id | ID of a model to use | Llama-4-Maverick-17B-128E-Instruct-FP8 | reference |
| repetition_penalty | Controls the likelihood of generating repetitive responses (minimum: 1, maximum: 2, default: 1). | 1 | reference |
| temperature | Controls randomness of the response by setting a temperature. | 0.7 | reference |
| top_p | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | reference |
| max_completion_tokens | The maximum number of tokens to generate. | 4096 | reference |
| top_k | Only sample from the top K options for each subsequent token. | 10 | reference |
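For example, following the **model_config pattern from the Usage section, sampling parameters can be passed alongside the model ID (the values here are illustrative):

model = LlamaAPIModel(
    client_args={"api_key": "<KEY>"},
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
    # Illustrative sampling settings drawn from the table above.
    temperature=0.7,
    top_p=0.9,
    max_completion_tokens=4096,
)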
Troubleshooting¶
Module Not Found¶
If you encounter the error ModuleNotFoundError: No module named 'llamaapi', this means you haven't installed the llamaapi dependency in your environment. To fix it, run pip install 'strands-agents[llamaapi]'.
Advanced Features¶
Structured Output¶
Llama API models support structured output through their tool calling capabilities. When you use Agent.structured_output(), the Strands SDK converts your Pydantic models to tool specifications that Llama models can understand.
from pydantic import BaseModel, Field
from strands import Agent
from strands.models.llamaapi import LlamaAPIModel

class BookAnalysis(BaseModel):
    """Analyze a book's key information."""
    title: str = Field(description="The book's title")
    author: str = Field(description="The book's author")
    genre: str = Field(description="Primary genre or category")
    summary: str = Field(description="Brief summary of the book")
    rating: int = Field(description="Rating from 1-10", ge=1, le=10)

model = LlamaAPIModel(
    client_args={"api_key": "<KEY>"},
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
)

agent = Agent(model=model)

result = agent.structured_output(
    BookAnalysis,
    """
    Analyze this book: "The Hitchhiker's Guide to the Galaxy" by Douglas Adams.
    It's a science fiction comedy about Arthur Dent's adventures through space
    after Earth is destroyed. It's widely considered a classic of humorous sci-fi.
    """
)

print(f"Title: {result.title}")
print(f"Author: {result.author}")
print(f"Genre: {result.genre}")
print(f"Rating: {result.rating}")