Deploying Strands Agents SDK Agents to AWS Fargate
AWS Fargate is a serverless compute engine for containers that works with Amazon ECS and EKS. It allows you to run containers without having to manage servers or clusters. This makes it an excellent choice for deploying Strands Agents SDK agents as containerized applications with high availability and scalability.
If you're not familiar with the AWS CDK, check out the official documentation.
This guide discusses Fargate integration at a high level. For a complete example project deploying to Fargate, check out the deploy_to_fargate sample project on GitHub.
Creating Your Agent in Python
The core of your Fargate deployment is a containerized FastAPI application that hosts your Strands Agents SDK agent. This Python application initializes your agent and processes incoming HTTP requests.
The FastAPI application follows these steps:
- Define endpoints for agent interactions
- Create a Strands Agents SDK agent with the specified system prompt and tools
- Process incoming requests through the agent
- Return the response back to the client
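Here's an example of a weather forecasting agent application (app.py):
from fastapi import FastAPI, HTTPException
from fastapi.responses import PlainTextResponse, StreamingResponse
from pydantic import BaseModel
from strands import Agent, tool
from strands_tools import http_request

app = FastAPI(title="Weather API")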
# Define a weather-focused system prompt
WEATHER_SYSTEM_PROMPT = """You are a weather assistant with HTTP capabilities. You can:
1. Make HTTP requests to the National Weather Service API
2. Process and display weather forecast data
3. Provide weather information for locations in the United States
When retrieving weather information:
1. First get the coordinates or grid information using https://api.weather.gov/points/{latitude},{longitude} or https://api.weather.gov/points/{zipcode}
2. Then use the returned forecast URL to get the actual forecast
When displaying responses:
- Format weather data in a human-readable way
- Highlight important information like temperature, precipitation, and alerts
- Handle errors appropriately
- Don't ask follow-up questions
Always explain the weather conditions clearly and provide context for the forecast.
At the point where tools are done being invoked and a summary can be presented to the user, invoke the ready_to_summarize tool and then continue with the summary.
"""
class PromptRequest(BaseModel):
    prompt: str

@app.post('/weather')
async def get_weather(request: PromptRequest):
    """Endpoint to get weather information."""
    prompt = request.prompt
    if not prompt:
        raise HTTPException(status_code=400, detail="No prompt provided")
    try:
        weather_agent = Agent(
            system_prompt=WEATHER_SYSTEM_PROMPT,
            tools=[http_request],
        )
        response = weather_agent(prompt)
        content = str(response)
        return PlainTextResponse(content=content)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
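The system prompt above asks the agent to perform a two-step lookup against the National Weather Service API. As a rough illustration of what the agent is expected to do through the http_request tool, the sketch below builds the first URL and pulls the forecast URL out of a points response. The helper names `points_url` and `extract_forecast_url` are hypothetical, and the sample payload is a trimmed, illustrative shape rather than a captured response.

```python
def points_url(latitude: float, longitude: float) -> str:
    """Step 1: URL that resolves coordinates to grid information."""
    # Four decimal places is an assumption made here to keep the URL short.
    return f"https://api.weather.gov/points/{latitude:.4f},{longitude:.4f}"


def extract_forecast_url(points_payload: dict) -> str:
    """Step 2: read the forecast URL returned by the points lookup."""
    try:
        return points_payload["properties"]["forecast"]
    except (KeyError, TypeError) as exc:
        raise ValueError("points response has no properties.forecast URL") from exc


# Illustrative payload only; a real response carries many more fields.
SAMPLE_POINTS_PAYLOAD = {
    "properties": {
        "forecast": "https://api.weather.gov/gridpoints/SEW/124,67/forecast"
    }
}

if __name__ == "__main__":
    print(points_url(47.6062, -122.3321))
    print(extract_forecast_url(SAMPLE_POINTS_PAYLOAD))
```

In the deployed application the agent performs these requests itself; the sketch only makes the expected sequence explicit.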
Streaming responses
Streaming can significantly improve the user experience by delivering output to the client as it is generated, rather than only after the agent has finished. This is especially valuable for longer responses.
Python web servers commonly implement streaming with iterators or generators, and the Strands Agents SDK supports response streaming via the stream_async(prompt) API:
async def run_weather_agent_and_stream_response(prompt: str):
    # Flipped by the ready_to_summarize tool once the agent has finished gathering data
    is_summarizing = False

    @tool
    def ready_to_summarize():
        """Signal that tool use is complete and the summary is about to be generated."""
        nonlocal is_summarizing
        is_summarizing = True
        return "Ok - continue providing the summary!"

    weather_agent = Agent(
        system_prompt=WEATHER_SYSTEM_PROMPT,
        tools=[http_request, ready_to_summarize],
        callback_handler=None
    )

    async for item in weather_agent.stream_async(prompt):
        # Drop everything produced before the summary phase begins
        if not is_summarizing:
            continue
        if "data" in item:
            yield item['data']

@app.post('/weather-streaming')
async def get_weather_streaming(request: PromptRequest):
    prompt = request.prompt
    # Validate outside the try block so the 400 is not converted into a 500 below
    if not prompt:
        raise HTTPException(status_code=400, detail="No prompt provided")
    try:
        return StreamingResponse(
            run_weather_agent_and_stream_response(prompt),
            media_type="text/plain"
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
The implementation above employs a custom tool to mark the boundary between information gathering and summary generation phases. This approach ensures that only the final, user-facing content is streamed to the client, maintaining consistency with the non-streaming endpoint while providing the benefits of incremental response delivery.
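To make the gating behavior concrete, here is a minimal, self-contained sketch that replaces the agent with a hypothetical `fake_agent_events` generator. The event shapes (`{"data": ...}` for text chunks and a `{"tool": ...}` marker) are simplified assumptions for illustration, not the SDK's actual event schema.

```python
import asyncio
from typing import AsyncIterator, Callable


async def fake_agent_events(on_tool: Callable[[], None]) -> AsyncIterator[dict]:
    """Hypothetical stand-in for agent.stream_async(); event shapes are simplified."""
    yield {"data": "Let me look up the forecast. "}  # text emitted before the summary phase
    yield {"tool": "http_request"}                   # non-text event
    on_tool()                                        # simulates ready_to_summarize being invoked
    yield {"data": "Sunny, "}
    yield {"data": "high of 72F."}


async def stream_summary_only() -> AsyncIterator[str]:
    is_summarizing = False

    def ready_to_summarize() -> None:
        nonlocal is_summarizing
        is_summarizing = True

    async for item in fake_agent_events(ready_to_summarize):
        if not is_summarizing:
            continue  # still gathering information: suppress output
        if "data" in item:
            yield item["data"]


async def collect() -> str:
    """Join every streamed chunk into a single string."""
    return "".join([chunk async for chunk in stream_summary_only()])


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Only the chunks produced after the flag is set reach the caller; the earlier text and the tool event are discarded.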
Containerization
To deploy your agent to Fargate, you need to containerize it using Podman or Docker. The Dockerfile defines how your application is packaged and run. Below is an example Dockerfile that installs the required dependencies, copies in the application, and configures the FastAPI server to run via Uvicorn (Dockerfile):
FROM public.ecr.aws/docker/library/python:3.12-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY app/ .
# Create a non-root user to run the application
RUN useradd -m appuser
USER appuser
# Expose the port the app runs on
EXPOSE 8000
# Command to run the application with Uvicorn
# - port: 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Infrastructure
To deploy the containerized agent to Fargate using the TypeScript CDK, you need to define the infrastructure stack (agent-fargate-stack.ts). Much of the configuration follows standard Fargate deployment patterns, but the following code snippet highlights the key components specific to deploying Strands Agents SDK agents:
// ... vpc, cluster, logGroup, executionRole, and taskRole omitted for brevity ...
// Add permissions for the task to invoke Bedrock APIs
taskRole.addToPolicy(
  new iam.PolicyStatement({
    actions: ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
    resources: ["*"],
  }),
);

// Create a task definition
const taskDefinition = new ecs.FargateTaskDefinition(this, "AgentTaskDefinition", {
  memoryLimitMiB: 512,
  cpu: 256,
  executionRole,
  taskRole,
  runtimePlatform: {
    cpuArchitecture: ecs.CpuArchitecture.ARM64,
    operatingSystemFamily: ecs.OperatingSystemFamily.LINUX,
  },
});

// This will use the Dockerfile in the docker directory
const dockerAsset = new ecrAssets.DockerImageAsset(this, "AgentImage", {
  directory: path.join(__dirname, "../docker"),
  file: "./Dockerfile",
  platform: ecrAssets.Platform.LINUX_ARM64,
});

// Add container to the task definition
taskDefinition.addContainer("AgentContainer", {
  image: ecs.ContainerImage.fromDockerImageAsset(dockerAsset),
  logging: ecs.LogDrivers.awsLogs({
    streamPrefix: "agent-service",
    logGroup,
  }),
  environment: {
    // Add any environment variables needed by your application
    LOG_LEVEL: "INFO",
  },
  portMappings: [
    {
      containerPort: 8000, // The port your application listens on
      protocol: ecs.Protocol.TCP,
    },
  ],
});

// Create a Fargate service
const service = new ecs.FargateService(this, "AgentService", {
  cluster,
  taskDefinition,
  desiredCount: 2, // Run 2 instances for high availability
  assignPublicIp: false, // Use private subnets with NAT gateway
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  circuitBreaker: {
    rollback: true,
  },
  securityGroups: [
    new ec2.SecurityGroup(this, "AgentServiceSG", {
      vpc,
      description: "Security group for Agent Fargate Service",
      allowAllOutbound: true,
    }),
  ],
  minHealthyPercent: 100,
  maxHealthyPercent: 200,
  healthCheckGracePeriod: Duration.seconds(60),
});
// ... load balancer omitted for brevity ...
The full example (agent-fargate-stack.ts):
- Creates a VPC with public and private subnets
- Sets up an ECS cluster
- Defines a task role with permissions to invoke Bedrock APIs
- Creates a Fargate task definition
- Builds a Docker image from your Dockerfile
- Configures a Fargate service with multiple instances for high availability
- Sets up an Application Load Balancer with health checks
- Outputs the load balancer DNS name for accessing your service
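The container definition above sets `LOG_LEVEL: "INFO"` as an environment variable, but the application has to read it for it to have any effect. One possible way to do that in app.py is sketched below. The `configure_logging` helper and the log format are assumptions for illustration, not part of the sample project.

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Configure root logging from LOG_LEVEL, falling back to INFO on unknown values."""
    env = os.environ if env is None else env
    level_name = str(env.get("LOG_LEVEL", "INFO")).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unknown or malformed value
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )
    return level


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("agent-service").info("logging configured")
```

By default `logging.basicConfig` writes to stderr, which the awsLogs driver configured on the container forwards to the log group.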
Deploying Your Agent & Testing
Assuming the Python and Node dependencies are already installed, bootstrap and deploy the CDK stack. The deployment also builds the container image from the Dockerfile:
# Bootstrap your AWS environment (if not already done)
npx cdk bootstrap
# Ensure Docker or Podman is running (Podman shown here)
podman machine start
# Deploy the stack (CDK_DOCKER tells the CDK to build the image with Podman instead of Docker)
CDK_DOCKER=podman npx cdk deploy
Once deployed, you can test your agent using the Application Load Balancer URL:
# Get the service URL from the CDK output
SERVICE_URL=$(aws cloudformation describe-stacks --stack-name AgentFargateStack --query "Stacks[0].Outputs[?ExportName=='AgentServiceEndpoint'].OutputValue" --output text)
# Call the weather service
curl -X POST \
http://$SERVICE_URL/weather \
-H 'Content-Type: application/json' \
-d '{"prompt": "What is the weather in Seattle?"}'
# Call the streaming endpoint
curl -X POST \
http://$SERVICE_URL/weather-streaming \
-H 'Content-Type: application/json' \
-d '{"prompt": "What is the weather in New York in Celsius?"}'
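The same calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands above. The helper names `build_prompt_request` and `iter_text_chunks` are hypothetical, and the live calls only run when a `SERVICE_URL` environment variable is set.

```python
import codecs
import io
import json
import os
import urllib.request
from typing import Iterator


def build_prompt_request(service_url: str, path: str, prompt: str) -> urllib.request.Request:
    """Build the same JSON POST request that the curl commands send."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{service_url}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_text_chunks(response: io.BufferedIOBase, chunk_size: int = 1024) -> Iterator[str]:
    """Yield decoded text as bytes arrive, tolerating multi-byte characters split across reads."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = response.read1(chunk_size)  # returns whatever is available, up to chunk_size
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    service_url = os.environ.get("SERVICE_URL")
    if service_url:
        request = build_prompt_request(
            service_url, "/weather-streaming", "What is the weather in New York in Celsius?"
        )
        with urllib.request.urlopen(request) as response:
            for text in iter_text_chunks(response):
                print(text, end="", flush=True)
```

Pointing `build_prompt_request` at `/weather` instead returns the complete response in one body.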
Summary
The above steps covered:
- Creating a FastAPI application that hosts your Strands Agents SDK agent
- Containerizing your application with Podman or Docker
- Creating the CDK infrastructure to deploy to Fargate
- Deploying the agent and infrastructure to an AWS account
- Manually testing the deployed service
Possible follow-up tasks would be to:
- Set up auto-scaling based on CPU/memory usage or request count
- Implement API authentication for secure access
- Add custom domain name and HTTPS support
- Set up monitoring and alerting
- Implement CI/CD pipeline for automated deployments
Complete Example
For the complete example code, including all files and configurations, see the deploy_to_fargate sample project on GitHub.