Deploying Strands Agents SDK Agents to Amazon EC2¶
Amazon EC2 (Elastic Compute Cloud) provides resizable compute capacity in the cloud, making it a flexible option for deploying Strands Agents SDK agents. This deployment approach gives you full control over the underlying infrastructure while maintaining the ability to scale as needed.
If you're not familiar with the AWS CDK, check out the official documentation.
This guide discusses EC2 integration at a high level - for a complete example project deploying to EC2, check out the deploy_to_ec2
sample project on GitHub.
Creating Your Agent in Python¶
The core of your EC2 deployment is a FastAPI application that hosts your Strands Agents SDK agent. This Python application initializes your agent and processes incoming HTTP requests.
The FastAPI application follows these steps:
- Define endpoints for agent interactions
- Create a Strands Agents SDK agent with the specified system prompt and tools
- Process incoming requests through the agent
- Return the response back to the client
Here's an example of a weather forecasting agent application (app.py
):
app = FastAPI(title="Weather API")
# Define a weather-focused system prompt
WEATHER_SYSTEM_PROMPT = """You are a weather assistant with HTTP capabilities. You can:
1. Make HTTP requests to the National Weather Service API
2. Process and display weather forecast data
3. Provide weather information for locations in the United States
When retrieving weather information:
1. First get the coordinates or grid information using https://api.weather.gov/points/{latitude},{longitude} or https://api.weather.gov/points/{zipcode}
2. Then use the returned forecast URL to get the actual forecast
When displaying responses:
- Format weather data in a human-readable way
- Highlight important information like temperature, precipitation, and alerts
- Handle errors appropriately
- Don't ask follow-up questions
Always explain the weather conditions clearly and provide context for the forecast.
At the point where tools are done being invoked and a summary can be presented to the user, invoke the ready_to_summarize
tool and then continue with the summary.
"""
@app.route('/weather', methods=['POST'])
def get_weather():
"""Endpoint to get weather information."""
data = request.json
prompt = data.get('prompt')
if not prompt:
return jsonify({"error": "No prompt provided"}), 400
try:
weather_agent = Agent(
system_prompt=WEATHER_SYSTEM_PROMPT,
tools=[http_request],
)
response = weather_agent(prompt)
content = str(response)
return content, {"Content-Type": "plain/text"}
except Exception as e:
return jsonify({"error": str(e)}), 500
Streaming responses¶
Streaming responses can significantly improve the user experience by providing real-time responses back to the customer. This is especially valuable for longer responses.
The EC2 deployment implements streaming through a custom approach that adapts the agent's output to an iterator that can be consumed by FastAPI. Here's how it's implemented:
def run_weather_agent_and_stream_response(prompt: str):
is_summarizing = False
@tool
def ready_to_summarize():
nonlocal is_summarizing
is_summarizing = True
return "Ok - continue providing the summary!"
def thread_run(callback_handler):
weather_agent = Agent(
system_prompt=WEATHER_SYSTEM_PROMPT,
tools=[http_request, ready_to_summarize],
callback_handler=callback_handler
)
weather_agent(prompt)
iterator = adapt_to_iterator(thread_run)
for item in iterator:
if not is_summarizing:
continue
if "data" in item:
yield item['data']
@app.route('/weather-streaming', methods=['POST'])
def get_weather_streaming():
try:
data = request.json
prompt = data.get('prompt')
if not prompt:
return jsonify({"error": "No prompt provided"}), 400
return run_weather_agent_and_stream_response(prompt), {"Content-Type": "plain/text"}
except Exception as e:
return jsonify({"error": str(e)}), 500
The implementation above employs a custom tool to mark the boundary between information gathering and summary generation phases. This approach ensures that only the final, user-facing content is streamed to the client, maintaining consistency with the non-streaming endpoint while providing the benefits of incremental response delivery.
Infrastructure¶
To deploy the agent to EC2 using the TypeScript CDK, you need to define the infrastructure stack (agent-ec2-stack.ts). The following code snippet highlights the key components specific to deploying Strands Agents SDK agents to EC2:
// ... instance role & security-group omitted for brevity ...
// Upload the application code to S3
const appAsset = new Asset(this, "AgentAppAsset", {
path: path.join(__dirname, "../app"),
});
// Upload dependencies to S3
// This could also be replaced by a pip install if all dependencies are public
const dependenciesAsset = new Asset(this, "AgentDependenciesAsset", {
path: path.join(__dirname, "../packaging/_dependencies"),
});
instanceRole.addToPolicy(
new iam.PolicyStatement({
actions: ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
resources: ["*"],
}),
);
// Create an EC2 instance in a public subnet with a public IP
const instance = new ec2.Instance(this, "AgentInstance", {
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PUBLIC }, // Use public subnet
instanceType: ec2.InstanceType.of(ec2.InstanceClass.T4G, ec2.InstanceSize.MEDIUM), // ARM-based instance
machineImage: ec2.MachineImage.latestAmazonLinux2023({
cpuType: ec2.AmazonLinuxCpuType.ARM_64,
}),
securityGroup: instanceSG,
role: instanceRole,
associatePublicIpAddress: true, // Assign a public IP address
});
For EC2 deployment, the application code and dependencies are packaged separately and uploaded to S3 as assets, During instance initialization, both packages are downloaded and extracted to the appropriate locations and then configured to run as linux service:
// Create user data script to set up the application
const userData = ec2.UserData.forLinux();
userData.addCommands(
"#!/bin/bash",
"set -o verbose",
"yum update -y",
"yum install -y python3.12 python3.12-pip git unzip ec2-instance-connect",
// Create app directory
"mkdir -p /opt/agent-app",
// Download application files from S3
`aws s3 cp ${appAsset.s3ObjectUrl} /tmp/app.zip`,
`aws s3 cp ${dependenciesAsset.s3ObjectUrl} /tmp/dependencies.zip`,
// Extract application files
"unzip /tmp/app.zip -d /opt/agent-app",
"unzip /tmp/dependencies.zip -d /opt/agent-app/_dependencies",
// Create a systemd service file
"cat > /etc/systemd/system/agent-app.service << 'EOL'",
"[Unit]",
"Description=Weather Agent Application",
"After=network.target",
"",
"[Service]",
"User=ec2-user",
"WorkingDirectory=/opt/agent-app",
"ExecStart=/usr/bin/python3.12 -m uvicorn app:app --host=0.0.0.0 --port=8000 --workers=2",
"Restart=always",
"Environment=PYTHONPATH=/opt/agent-app:/opt/agent-app/_dependencies",
"Environment=LOG_LEVEL=INFO",
"",
"[Install]",
"WantedBy=multi-user.target",
"EOL",
// Enable and start the service
"systemctl enable agent-app.service",
"systemctl start agent-app.service",
);
The full example (agent-ec2-stack.ts):
- Creates a VPC with public subnets
- Sets up an EC2 instance with the appropriate IAM role
- Defines permissions to invoke Bedrock APIs
- Uploads application code and dependencies to S3
- Creates a user data script to:
- Install Python and other dependencies
- Download and extract the application code and dependencies
- Set up the application as a systemd service
- Outputs the instance ID, public IP, and service endpoint for easy access
Deploying Your Agent & Testing¶
To deploy your agent to EC2:
# Bootstrap your AWS environment (if not already done)
npx cdk bootstrap
# Package Python dependencies for the target architecture
pip install -r requirements.txt --platform manylinux2014_aarch64 --target ./packaging/_dependencies --only-binary=:all:
# Deploy the stack
npx cdk deploy
Once deployed, you can test your agent using the public IP address and port:
# Get the service URL from the CDK output
SERVICE_URL=$(aws cloudformation describe-stacks --stack-name AgentEC2Stack --region us-east-1 --query "Stacks[0].Outputs[?ExportName=='Ec2ServiceEndpoint'].OutputValue" --output text)
# Call the weather service
curl -X POST \
http://$SERVICE_URL/weather \
-H 'Content-Type: application/json' \
-d '{"prompt": "What is the weather in Seattle?"}'
# Call the streaming endpoint
curl -X POST \
http://$SERVICE_URL/weather-streaming \
-H 'Content-Type: application/json' \
-d '{"prompt": "What is the weather in New York in Celsius?"}'
Summary¶
The above steps covered:
- Creating a FastAPI application that hosts your Strands Agents SDK agent
- Packaging your application and dependencies for EC2 deployment
- Creating the CDK infrastructure to deploy to EC2
- Setting up the application as a systemd service
- Deploying the agent and infrastructure to an AWS account
- Manually testing the deployed service
Possible follow-up tasks would be to:
- Implement an update mechanism for the application
- Add a load balancer for improved availability and scaling
- Set up auto-scaling with multiple instances
- Implement API authentication for secure access
- Add custom domain name and HTTPS support
- Set up monitoring and alerting
- Implement CI/CD pipeline for automated deployments
Complete Example¶
For the complete example code, including all files and configurations, see the deploy_to_ec2
sample project on GitHub.