User Simulation

Overview

User simulation enables realistic multi-turn conversation evaluation by simulating end-users interacting with your agents. Using the ActorSimulator class configured for user simulation, you can generate dynamic, goal-oriented conversations that test your agent's ability to handle real user interactions.

The from_case_for_user_simulator() factory method automatically configures the simulator with user-appropriate profiles and behaviors:

from strands_evals import ActorSimulator, Case

case = Case(
    input="I need to book a flight to Paris",
    metadata={"task_description": "Flight booking confirmed"}
)

# Automatically configured for user simulation
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=10
)

Key Features

  • Realistic Actor Simulation: Generates human-like responses based on actor profiles
  • Multi-turn Conversations: Maintains context across multiple conversation turns
  • Automatic Profile Generation: Creates actor profiles from test cases
  • Goal-Oriented Behavior: Tracks and evaluates goal completion
  • Flexible Configuration: Supports custom profiles, prompts, and tools
  • Conversation Control: Automatic stopping based on goal completion or turn limits
  • Integration with Evaluators: Works seamlessly with trace-based evaluators

When to Use

Use user simulation when you need to:

  • Evaluate agents in multi-turn user conversations
  • Test how agents handle realistic user behavior
  • Assess goal completion from the user's perspective
  • Generate diverse user interaction patterns
  • Evaluate agents without predefined conversation scripts
  • Test conversational flow and context maintenance with users

Basic Usage

Simple User Simulation

from strands import Agent
from strands_evals import Case, ActorSimulator

# Create test case
case = Case(
    name="flight-booking",
    input="I need to book a flight to Paris next week",
    metadata={"task_description": "Flight booking confirmed"}
)

# Create user simulator
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=5  # Limits conversation length; simulator may stop earlier if goal is achieved
)

# Create target agent to evaluate
agent = Agent(
    system_prompt="You are a helpful travel assistant.",
    callback_handler=None
)

# Run multi-turn conversation
user_message = case.input
conversation_log = []

while user_sim.has_next():
    # Agent responds
    agent_response = agent(user_message)
    agent_message = str(agent_response)
    conversation_log.append({"role": "agent", "message": agent_message})

    # User simulator generates next message
    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)
    conversation_log.append({"role": "user", "message": user_message})

print(f"Conversation completed in {len(conversation_log) // 2} turns")

Actor Profiles

Actor profiles define the characteristics, context, and goals of the simulated actor.

Automatic Profile Generation

The simulator can automatically generate realistic profiles from test cases:

from strands_evals import Case, ActorSimulator

case = Case(
    input="My order hasn't arrived yet",
    metadata={"task_description": "Order status resolved and customer satisfied"}
)

# Profile is automatically generated from input and task_description
user_sim = ActorSimulator.from_case_for_user_simulator(case=case)

# Access the generated profile
print(user_sim.actor_profile.traits)
print(user_sim.actor_profile.context)
print(user_sim.actor_profile.actor_goal)

Custom Actor Profiles

For more control, create custom profiles:

from strands_evals.simulation import ActorSimulator
from strands_evals.types.simulation import ActorProfile

# Define custom profile
profile = ActorProfile(
    traits={
        "expertise_level": "expert",
        "communication_style": "technical",
        "patience_level": "low",
        "detail_preference": "high"
    },
    context="A software engineer debugging a production memory leak issue.",
    actor_goal="Identify the root cause and get actionable steps to resolve the memory leak."
)

# Create simulator with custom profile
simulator = ActorSimulator(
    actor_profile=profile,
    initial_query="Our service is experiencing high memory usage in production.",
    system_prompt_template="You are simulating: {actor_profile}",
    max_turns=10
)

Integration with Evaluators

With Trace-Based Evaluators

from strands import Agent
from strands_evals import Case, Experiment, ActorSimulator
from strands_evals.evaluators import HelpfulnessEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Setup telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def task_function(case: Case) -> dict:
    # Create simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=5
    )

    # Create target agent
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id
        },
        system_prompt="You are a helpful assistant.",
        callback_handler=None
    )

    # Collect spans across all turns
    all_spans = []
    user_message = case.input

    while user_sim.has_next():
        # Clear before each agent call to avoid capturing simulator traces
        memory_exporter.clear()

        # Agent responds
        agent_response = agent(user_message)
        agent_message = str(agent_response)

        # Collect agent spans
        turn_spans = list(memory_exporter.get_finished_spans())
        all_spans.extend(turn_spans)

        # User simulator responds
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)

    # Map spans to session
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(all_spans, session_id=case.session_id)

    return {"output": agent_message, "trajectory": session}

# Create test cases
test_cases = [
    Case(
        name="booking-1",
        input="I need to book a flight to Paris",
        metadata={"task_description": "Flight booking confirmed"}
    )
]

# Run evaluation
evaluators = [HelpfulnessEvaluator()]
experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(task_function)
reports[0].run_display()

Conversation Control

Automatic Stopping

The simulator automatically stops when:

  1. Goal Completion: The actor includes a <stop/> token in its message
  2. Turn Limit: The maximum number of turns is reached

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=10  # Stop after 10 turns
)

# Check if conversation should continue
while user_sim.has_next():
    # ... conversation logic ...
    pass
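Because the <stop/> token is embedded directly in the message text, you may want to strip it before logging or displaying the conversation. A minimal stdlib sketch (the clean_message helper is illustrative, not part of the library):

```python
# Strip the <stop/> goal-completion token from a simulated user's
# message before logging, and report whether it was present.
STOP_TOKEN = "<stop/>"

def clean_message(message: str) -> tuple[str, bool]:
    """Return the message without the stop token, plus a completion flag."""
    completed = STOP_TOKEN in message
    return message.replace(STOP_TOKEN, "").strip(), completed
```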

Manual Turn Tracking

turn_count = 0
max_turns = 5

while user_sim.has_next() and turn_count < max_turns:
    agent_response = agent(user_message)
    user_result = user_sim.act(str(agent_response))
    user_message = str(user_result.structured_output.message)
    turn_count += 1

print(f"Conversation ended after {turn_count} turns")

Actor Response Structure

Each actor response includes reasoning and the actual message. The reasoning field provides insight into the simulator's decision-making process, helping you understand why it responded in a particular way and whether it's behaving realistically:

user_result = user_sim.act(agent_message)

# Access structured output
reasoning = user_result.structured_output.reasoning
message = user_result.structured_output.message

print(f"Actor's reasoning: {reasoning}")
print(f"Actor's message: {message}")

# Example output:
# Actor's reasoning: "The agent provided flight options but didn't ask for my preferred time. 
#                     I should specify that I prefer morning flights to move the conversation forward."
# Actor's message: "Thanks! Do you have any morning flights available?"

The reasoning is particularly useful for:

  • Debugging: Understanding why the simulator isn't reaching the goal
  • Validation: Ensuring the simulator is behaving realistically
  • Analysis: Identifying patterns in how users respond to agent behavior
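For example, a simple keyword scan over the collected reasoning strings can surface turns where the conversation stalled. The keywords and the flag_stuck_turns helper below are illustrative assumptions, not part of the library:

```python
# Hypothetical keyword scan over per-turn reasoning strings to flag
# turns where the simulator reported being blocked or confused.
STUCK_HINTS = ("didn't ask", "unclear", "not sure", "repeat")

def flag_stuck_turns(reasonings: list[str]) -> list[int]:
    """Return indices of turns whose reasoning suggests the conversation stalled."""
    return [
        i for i, r in enumerate(reasonings)
        if any(hint in r.lower() for hint in STUCK_HINTS)
    ]
```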

Advanced Usage

Custom System Prompts

custom_prompt = """
You are simulating a user with the following profile:
{actor_profile}

Guidelines:
- Be concise and direct
- Ask clarifying questions when needed
- Express satisfaction when goals are met
- Include <stop/> when your goal is achieved
"""

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    system_prompt_template=custom_prompt,
    max_turns=10
)

Adding Custom Tools

from strands import tool

@tool
def check_order_status(order_id: str) -> str:
    """Check the status of an order."""
    return f"Order {order_id} is in transit"

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    tools=[check_order_status],  # Additional tools for the simulator
    max_turns=10
)

Different Model for Simulation

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # Specific model
    max_turns=10
)

Complete Example: Customer Service Evaluation

from strands import Agent
from strands_evals import Case, Experiment, ActorSimulator
from strands_evals.evaluators import HelpfulnessEvaluator, GoalSuccessRateEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Setup telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def customer_service_task(case: Case) -> dict:
    """Simulate customer service interaction."""

    # Create user simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=8
    )

    # Create customer service agent
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id
        },
        system_prompt="""
        You are a helpful customer service agent.
        - Be empathetic and professional
        - Gather necessary information
        - Provide clear solutions
        - Confirm customer satisfaction
        """,
        callback_handler=None
    )

    # Run conversation
    all_spans = []
    user_message = case.input
    conversation_history = []

    while user_sim.has_next():
        memory_exporter.clear()

        # Agent responds
        agent_response = agent(user_message)
        agent_message = str(agent_response)
        conversation_history.append({
            "role": "agent",
            "message": agent_message
        })

        # Collect spans
        turn_spans = list(memory_exporter.get_finished_spans())
        all_spans.extend(turn_spans)

        # User responds
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        conversation_history.append({
            "role": "user",
            "message": user_message,
            "reasoning": user_result.structured_output.reasoning
        })

    # Map to session
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(all_spans, session_id=case.session_id)

    return {
        "output": agent_message,
        "trajectory": session,
        "conversation_history": conversation_history
    }

# Create diverse test cases
test_cases = [
    Case(
        name="order-issue",
        input="My order #12345 hasn't arrived and it's been 2 weeks",
        metadata={
            "category": "order_tracking",
            "task_description": "Order status checked, issue resolved, customer satisfied"
        }
    ),
    Case(
        name="product-return",
        input="I want to return a product that doesn't fit",
        metadata={
            "category": "returns",
            "task_description": "Return initiated, return label provided, customer satisfied"
        }
    ),
    Case(
        name="billing-question",
        input="I was charged twice for my last order",
        metadata={
            "category": "billing",
            "task_description": "Billing issue identified, refund processed, customer satisfied"
        }
    )
]

# Run evaluation with multiple evaluators
evaluators = [
    HelpfulnessEvaluator(),
    GoalSuccessRateEvaluator()
]

experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(customer_service_task)

# Display results
for report in reports:
    print(f"\n{'='*60}")
    print(f"Evaluator: {report.evaluator_name}")
    print(f"{'='*60}")
    report.run_display()

Best Practices

1. Clear Task Descriptions

# Good: Specific, measurable goal
case = Case(
    input="I need to book a flight",
    metadata={
        "task_description": "Flight booked with confirmation number, dates confirmed, payment processed"
    }
)

# Less effective: Vague goal
case = Case(
    input="I need to book a flight",
    metadata={"task_description": "Help with booking"}
)

2. Appropriate Turn Limits

# Simple queries: 3-5 turns
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=simple_case,
    max_turns=5
)

# Complex tasks: 8-15 turns
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=complex_case,
    max_turns=12
)

3. Clear Span Collection

# Always clear before agent calls to avoid capturing simulator traces
while user_sim.has_next():
    memory_exporter.clear()  # Clear simulator traces
    agent_response = agent(user_message)
    turn_spans = list(memory_exporter.get_finished_spans())  # Only agent spans
    all_spans.extend(turn_spans)
    user_result = user_sim.act(str(agent_response))
    user_message = str(user_result.structured_output.message)

4. Conversation Logging

# Log conversations for analysis
conversation_log = []

while user_sim.has_next():
    agent_response = agent(user_message)
    agent_message = str(agent_response)

    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)

    conversation_log.append({
        "turn": len(conversation_log) // 2 + 1,
        "agent": agent_message,
        "user": user_message,
        "user_reasoning": user_result.structured_output.reasoning
    })

# Save for review
import json
with open("conversation_log.json", "w") as f:
    json.dump(conversation_log, f, indent=2)
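Saved logs can then be aggregated offline. A minimal sketch using only the standard library; the summarize_log helper is illustrative, and its field names match the log structure written above:

```python
import json
from statistics import mean

def summarize_log(path: str) -> dict:
    """Load a saved conversation log and report simple summary stats."""
    with open(path) as f:
        log = json.load(f)
    return {
        "turns": len(log),
        # Average length of the simulated user's messages, in characters
        "avg_user_chars": mean(len(t["user"]) for t in log) if log else 0,
    }
```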

Common Patterns

Pattern 1: Goal Completion Testing

def test_goal_completion(case: Case) -> bool:
    user_sim = ActorSimulator.from_case_for_user_simulator(case=case)
    agent = Agent(system_prompt="Your agent prompt")

    user_message = case.input
    goal_completed = False

    while user_sim.has_next():
        agent_response = agent(user_message)
        user_result = user_sim.act(str(agent_response))
        user_message = str(user_result.structured_output.message)

        # Check for stop token
        if "<stop/>" in user_message:
            goal_completed = True
            break

    return goal_completed
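Running a per-case check like the one above over a batch of cases yields an overall completion rate; a trivial aggregation sketch:

```python
def completion_rate(results: list[bool]) -> float:
    """Fraction of simulated conversations that reached their goal."""
    return sum(results) / len(results) if results else 0.0
```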

Pattern 2: Multi-Evaluator Assessment

def comprehensive_evaluation(case: Case) -> dict:
    # ... run conversation with simulator ...

    return {
        "output": final_message,
        "trajectory": session,
        "turns_taken": turn_count,
        "goal_completed": "<stop/>" in last_user_message
    }

evaluators = [
    HelpfulnessEvaluator(),
    GoalSuccessRateEvaluator(),
    FaithfulnessEvaluator()
]

experiment = Experiment(cases=cases, evaluators=evaluators)
reports = experiment.run_evaluations(comprehensive_evaluation)

Pattern 3: Conversation Analysis

def analyze_conversation(case: Case) -> dict:
    user_sim = ActorSimulator.from_case_for_user_simulator(case=case)
    agent = Agent(system_prompt="Your prompt")

    metrics = {
        "turns": 0,
        "agent_messages": [],
        "user_messages": [],
        "user_reasoning": []
    }

    user_message = case.input
    while user_sim.has_next():
        agent_response = agent(user_message)
        agent_message = str(agent_response)
        metrics["agent_messages"].append(agent_message)

        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        metrics["user_messages"].append(user_message)
        metrics["user_reasoning"].append(user_result.structured_output.reasoning)
        metrics["turns"] += 1

    return metrics

Troubleshooting

Issue: Simulator Stops Too Early

Solution: Increase max_turns if the conversation is hitting the turn limit, or sharpen the task_description if the simulator declares the goal complete prematurely

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=15  # Increase limit
)

Issue: Simulator Doesn't Stop

Solution: Ensure task_description is achievable and clear

# Make goal specific and achievable
case = Case(
    input="I need help",
    metadata={
        "task_description": "Specific, measurable goal that can be completed"
    }
)

Issue: Unrealistic Responses

Solution: Use custom profile or adjust system prompt

custom_prompt = """
You are simulating a realistic user with: {actor_profile}

Be natural and human-like:
- Don't be overly formal
- Ask follow-up questions naturally
- Express emotions appropriately
- Include <stop/> only when truly satisfied
"""

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    system_prompt_template=custom_prompt
)

Issue: Simulator Traces Captured in Agent Spans

Solution: Always clear exporter before agent calls

while user_sim.has_next():
    memory_exporter.clear()  # Critical: clear before agent call
    agent_response = agent(user_message)
    spans = list(memory_exporter.get_finished_spans())
    # ... rest of logic ...