
What We Learned from One Year of Building Production Agents

Strands Agents turned one year old. Here are the key lessons our engineers learned after open sourcing this framework and hitting 25 million downloads.

We turned 1 year old!

Strands Agents launched over a year ago thanks to an internal effort by AWS engineers building a network troubleshooting agent. They didn’t use a heavy-duty framework or write enormous amounts of workflow boilerplate. They essentially wired up a system prompt, a Claude 3 model, and a handful of tools, and resolved 80% of network root causes. Today that pattern is branded an “agent harness”.

Our philosophy for keeping architecture minimal helped teams across AWS and beyond ship production agents handling customer traffic at scale. There are a lot of lessons our engineers learned after open sourcing this framework and hitting 25 million downloads. Here are the key ones:

Workflow boilerplate in agents can become easily outdated

When Strands launched in May 2025, the most advanced models had a 200k context window. The agents we saw developers build were usually chatbots connected to a knowledge base or workflows that classified data in batches. Agent frameworks gave tons of scaffolding to optimize these tasks. But what happened when a model got better? You often wound up with a lot of technical debt refactoring the agent.

For example, take this Sonnet 3.7 agent that reviews CloudWatch alarms across AWS accounts. We saw a lot of customers build something like this with another agent framework:

from agentframework import Agent, tool, Graph, GraphNode, GraphEdge, SlidingWindowConversationManager

@tool
def list_cloudwatch_alarms(state_filter: str = "ALARM") -> dict:
    """List CloudWatch alarms filtered by state."""
    ...

@tool
def get_alarm_logs(alarm_name: str) -> dict:
    """Get CloudWatch Logs related to an alarm from the past 7 days."""
    ...

@tool
def format_alarm_report(raw_data: str) -> dict:
    """Format alarm data into a human-readable report."""
    ...

fetch_agent = Agent(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system_prompt="List all alarms currently in ALARM state.",
    tools=[list_cloudwatch_alarms],
    conversation_manager=SlidingWindowConversationManager(window_size=10, per_turn=True),
)

logs_agent = Agent(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system_prompt="For each alarm provided, get its logs from the past 7 days to determine what's been happening.",
    tools=[get_alarm_logs],
    conversation_manager=SlidingWindowConversationManager(window_size=10, per_turn=True),
)

formatter_agent = Agent(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system_prompt="Format the alarm data and logs into a clean report grouped by severity. Include relevant log patterns for each alarm.",
    tools=[format_alarm_report],
    conversation_manager=SlidingWindowConversationManager(window_size=20, per_turn=True),
)

graph = Graph(
    nodes={
        "fetch": GraphNode(agent=fetch_agent),
        "logs": GraphNode(agent=logs_agent),
        "format": GraphNode(agent=formatter_agent),
    },
    edges=[
        GraphEdge(source="fetch", target="logs"),
        GraphEdge(source="logs", target="format"),
    ],
)

result = graph.invoke({"input": "What alarms are firing right now?"})

A lot of developers thought best practices meant using a graph workflow that scaffolded each step into a separate agent. For simpler use cases like this one, that seemed over-engineered to us. The scaffolding also became outdated when Sonnet 4 launched just a few months after 3.7: its extended thinking let the model alternate between reasoning and tool calls on its own.

Here’s what that same agent would’ve looked like on Strands, with no refactoring needed when a better model ships:

from strands import Agent, tool

@tool
def list_cloudwatch_alarms(state_filter: str = "ALARM") -> dict:
    """List CloudWatch alarms filtered by state."""
    ...

@tool
def get_alarm_logs(alarm_name: str) -> dict:
    """Get CloudWatch Logs related to an alarm from the past 7 days."""
    ...

agent = Agent(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    system_prompt=(
        "You check CloudWatch alarms and report what's firing. "
        "List active alarms, pull logs from the past 7 days for each, "
        "and produce a report grouped by severity."
    ),
    tools=[list_cloudwatch_alarms, get_alarm_logs],
)

result = agent("What alarms are firing right now?")
print(result)

Simply change the model_id, run evals, and check logs for improvements.

Steering hooks provide reliability in the agent harness

When an agent has dozens of steps of business logic, most agent frameworks suggest predefining each step to reduce the chance of the model making mistakes. Graph-based workflows became popular for this reason, but our testing found they produced only 80.8% accuracy for agent output. Clare Liguori, Sr. Principal Engineer at AWS, ran hundreds of evals and found Steering Hooks could reach a 100% pass rate in comparison. You simply define hooks that verify agent output in the loop at two points: before a tool executes and after the model responds.

Clare showed this in action with an agent harness that checks your local library’s book renewal status. Before the agent renews a book, Strands provides the controls to first verify the book’s status: whether it has been recalled and whether the right library card number is being used. If a check fails, Steering Hooks stop the action and give the model feedback so it can correct itself. After the model responds, Steering Hooks can check whether the answer was right and ask the model to revise if needed.

Are people really building long-running agents and subagents?

The vast majority of agent use cases we see still automate rote tasks. Execution times rarely exceed 30 minutes; typical examples are agents on GitHub that review code and handle CI/CD work.

However, AI labs are positioning recent model releases around long-running agents and swarms of subagents. We do see signs that these will become more common, particularly the pattern where an agent executes a task, waits for human input, then delegates the remaining work to a subagent.
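That execute / wait-for-human / delegate loop can be sketched in a few lines of plain Python. Everything below is hypothetical scaffolding for illustration, not a Strands or any other framework API; in practice the plan step and the subagent would each be model-driven agent loops.

```python
# Sketch of a long-running pattern: plan a task, pause for a human
# decision, then hand each approved step to a subagent.
# All names are illustrative, not a real framework API.

def plan_task(task: str) -> list[str]:
    """First pass: break the task into steps (a model call in practice)."""
    return [f"step {i}: {task}" for i in range(1, 3)]

def run_subagent(step: str) -> str:
    """Delegate one approved step to a subagent (another agent loop)."""
    return f"done: {step}"

def long_running_agent(task: str, approve) -> list[str]:
    plan = plan_task(task)
    if not approve(plan):       # pause here until a human weighs in
        return ["aborted by human"]
    return [run_subagent(step) for step in plan]
```

The interesting property is the pause in the middle: the agent's state (the plan) survives while a human decides, and only then does work fan out to subagents.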

The Strands team is dedicated to providing the lightest weight agent harness for developers, so you never prematurely freeze your agent with today’s model limits. We’re excited by what the world of agents will look like by our next birthday. Stay in touch with us in the meantime over Discord!