Root Cause Analysis

analyze_root_cause performs deep causal analysis of detected failures in an agent execution session. It traces failure chains, classifies causality (primary vs. secondary vs. tertiary), assesses propagation impact, and produces actionable fix recommendations — telling you not just what failed, but why and how to fix it.

  • Causal chain analysis: Distinguishes between root causes and their downstream effects
  • Propagation impact assessment: Determines whether failures caused task termination, quality degradation, incorrect paths, or were contained
  • Fix recommendations: Classifies fixes as system prompt changes, tool description updates, or other infrastructure fixes
  • 3-tier fallback strategy: Handles large sessions via direct analysis, failure path pruning, and chunked analysis with merge
  • Automatic failure detection: If no failures are provided, calls detect_failures automatically

Use analyze_root_cause when you need to:

  • Understand causal relationships between failures in a session
  • Get fix recommendations for detected failures
  • Determine propagation impact — did the failure cascade or stay contained?
  • Prioritize fixes based on causality (fix primary failures first)

For a combined detect-and-analyze pipeline, use diagnose_session instead.

session
  • Type: Session
  • Description: The Session object containing traces and spans to analyze.
failures
  • Type: list[FailureItem] | None
  • Default: None
  • Description: List of failures from detect_failures(). If None, detect_failures() is called automatically. Pass this explicitly when you’ve already run failure detection to avoid duplicate work.
model
  • Type: Model | str | None
  • Default: None (uses Claude Sonnet via Bedrock)
  • Description: The model to use for analysis. Can be a Model instance, a Bedrock model ID string, or None for the default.

Run failure detection first, then pass the detected failures to analyze_root_cause:

```python
from strands_evals.detectors import detect_failures, analyze_root_cause, ConfidenceLevel

# Step 1: Detect failures
failure_output = detect_failures(session, confidence_threshold=ConfidenceLevel.MEDIUM)

# Step 2: Analyze root causes (pass failures to avoid re-detection)
rca_output = analyze_root_cause(session, failures=failure_output.failures)

for rc in rca_output.root_causes:
    print(f"Failure span: {rc.failure_span_id}")
    print(f" Root cause at: {rc.location}")
    print(f" Causality: {rc.causality}")
    print(f" Impact: {rc.propagation_impact}")
    print(f" Explanation: {rc.root_cause_explanation}")
    print(f" Fix type: {rc.fix_type}")
    print(f" Recommendation: {rc.fix_recommendation}")
```

If you don’t provide failures, analyze_root_cause calls detect_failures internally:

```python
from strands_evals.detectors import analyze_root_cause

# Automatically detects failures first, then analyzes root causes
rca_output = analyze_root_cause(session)
```

This is convenient for one-off analysis but means failure detection runs with default settings (confidence_threshold=ConfidenceLevel.LOW). For more control, detect failures separately.

analyze_root_cause returns an RCAOutput:

```python
class RCAOutput(BaseModel):
    root_causes: list[RCAItem]


class RCAItem(BaseModel):
    failure_span_id: str            # The failure span this explains
    location: str                   # Span where root cause originated
    causality: str                  # PRIMARY_FAILURE | SECONDARY_FAILURE | TERTIARY_FAILURE | UNCLEAR
    propagation_impact: list[str]   # Impact types (see table below)
    failure_detection_timing: str   # When failure was detected in execution
    completion_status: str          # Overall task completion status
    root_cause_explanation: str
    fix_type: str                   # SYSTEM_PROMPT_FIX | TOOL_DESCRIPTION_FIX | OTHERS
    fix_recommendation: str
```

causality values:

| Value | Meaning |
| --- | --- |
| PRIMARY_FAILURE | Original source of the problem, independent of other failures |
| SECONDARY_FAILURE | Direct consequence of a primary failure |
| TERTIARY_FAILURE | Downstream effect of a secondary failure |
| UNCLEAR | Insufficient context to determine causality |

propagation_impact values:

| Value | Meaning |
| --- | --- |
| TASK_TERMINATION | Complete task failure, execution cannot continue |
| QUALITY_DEGRADATION | Task completes but with reduced output quality |
| INCORRECT_PATH | Forces fundamentally different strategy |
| STATE_CORRUPTION | Agent develops incorrect understanding of state |
| NO_PROPAGATION | Contained failure, recovered within 1-2 turns |
| UNCLEAR | Cannot determine impact |

failure_detection_timing values:

| Value | Meaning |
| --- | --- |
| IMMEDIATELY_AT_OCCURRENCE | Failure was detected as soon as it happened |
| SEVERAL_STEPS_LATER | Failure was detected after a few more steps |
| ONLY_AT_TASK_END | Failure was only apparent when the task completed |
| SILENT_UNDETECTED | Failure went undetected during execution |

completion_status values:

| Value | Meaning |
| --- | --- |
| COMPLETE_SUCCESS | Task completed successfully despite the failure |
| PARTIAL_SUCCESS | Task partially completed |
| COMPLETE_FAILURE | Task failed entirely |

fix_type values:

| Value | When to use |
| --- | --- |
| SYSTEM_PROMPT_FIX | Agent behavior issues, missing guidelines, incorrect reasoning patterns |
| TOOL_DESCRIPTION_FIX | Tool parameter confusion, unclear capabilities, missing constraint documentation |
| OTHERS | Tool implementation bugs, API errors, infrastructure issues |
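
The enumerated values above lend themselves to quick triage. The following is a minimal sketch, not part of the library: `ImpactRecord` is a hypothetical stand-in that mirrors only a few RCAItem fields so the snippet runs on its own, and `summarize` is an illustrative helper name. It counts how often each propagation impact occurs and lists failures that went unnoticed during execution.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ImpactRecord:
    """Hypothetical stand-in exposing a subset of the RCAItem fields."""
    failure_span_id: str
    propagation_impact: list = field(default_factory=list)
    failure_detection_timing: str = "IMMEDIATELY_AT_OCCURRENCE"


def summarize(root_causes):
    """Count impact types and collect failures that were never detected at runtime."""
    impact_counts = Counter(
        impact for rc in root_causes for impact in rc.propagation_impact
    )
    silent = [
        rc.failure_span_id
        for rc in root_causes
        if rc.failure_detection_timing == "SILENT_UNDETECTED"
    ]
    return impact_counts, silent


# Made-up sample data for illustration only.
items = [
    ImpactRecord("span-1", ["INCORRECT_PATH", "QUALITY_DEGRADATION"]),
    ImpactRecord("span-2", ["QUALITY_DEGRADATION"], "SILENT_UNDETECTED"),
    ImpactRecord("span-3", ["NO_PROPAGATION"]),
]
impact_counts, silent = summarize(items)
print(impact_counts.most_common())
print(silent)
```

With real output, the same helper should work on rca_output.root_causes directly, since it only reads the attributes listed in the RCAItem definition.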

Root cause analysis requires understanding the full causal context of failures, which can be challenging for large sessions. The analyzer uses three progressively more aggressive strategies:

Tier 1, direct analysis: The full session and failures are sent to the LLM in a single call. This produces the highest-quality results because the model sees the complete execution context.

Tier 2, failure path pruning: If the session exceeds context limits, the analyzer prunes the session to keep only spans on failure paths:

  • Ancestors: All spans from root to each failure span (the causal chain)
  • Descendants: Up to 10 child spans per failure (the downstream context)

This typically reduces session size by 50-90% while preserving the information needed for causal analysis.
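
To make the pruning rule concrete, here is a minimal sketch of one way it could work. It is not the library's implementation: `Span`, `prune_to_failure_paths`, and `MAX_DESCENDANTS` are hypothetical names, spans are reduced to an ID and a parent link, and the descendant limit is interpreted here as a breadth-first cap per failure.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DESCENDANTS = 10  # mirrors the "up to 10 child spans per failure" rule


@dataclass(frozen=True)
class Span:
    """Hypothetical minimal span: an ID plus a link to its parent."""
    span_id: str
    parent_id: Optional[str] = None


def prune_to_failure_paths(spans, failure_ids, max_descendants=MAX_DESCENDANTS):
    """Keep each failure span, its ancestors, and a capped set of descendants."""
    by_id = {s.span_id: s for s in spans}
    children = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s.span_id)

    keep = set()
    for fid in failure_ids:
        # Ancestors: follow parent links from the failure span up to the root.
        current = by_id.get(fid)
        while current is not None:
            keep.add(current.span_id)
            current = by_id.get(current.parent_id)

        # Descendants: breadth-first walk, capped per failure.
        queue = list(children.get(fid, []))
        taken = 0
        while queue and taken < max_descendants:
            sid = queue.pop(0)
            keep.add(sid)
            taken += 1
            queue.extend(children.get(sid, []))

    # Preserve the original span order in the pruned result.
    return [s for s in spans if s.span_id in keep]


# Made-up session: only "tool_a" failed.
spans = [
    Span("root"),
    Span("plan", "root"),
    Span("tool_a", "plan"),
    Span("tool_a_retry", "tool_a"),
    Span("tool_b", "plan"),
    Span("summary", "root"),
]
pruned = prune_to_failure_paths(spans, ["tool_a"])
print([s.span_id for s in pruned])
```

In this toy session the unrelated siblings ("tool_b", "summary") are dropped while the chain from the root to the failure and the retry beneath it are kept.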

Tier 3, chunked analysis with merge: If the pruned session still exceeds context limits, it is split into per-trace windows:

  1. Each window is analyzed independently
  2. Results from all windows are merged using a dedicated merge prompt that deduplicates and reconciles findings
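
The control flow across the three strategies can be sketched as follows. This is an illustration under stated assumptions, not the analyzer's actual code: `ContextLimitExceeded`, `analyze_with_fallback`, `fake_analyze`, and `dedupe_merge` are hypothetical names, spans are plain dicts, the context limit is simulated by a span count, and the merge step is a simple ID-based deduplication rather than the LLM merge prompt described above.

```python
class ContextLimitExceeded(Exception):
    """Hypothetical error: the input does not fit in the model's context."""


def analyze_with_fallback(spans, analyze, prune, merge):
    """Try direct analysis, then pruned analysis, then per-trace windows."""
    # Tier 1: direct analysis of the full session.
    try:
        return analyze(spans)
    except ContextLimitExceeded:
        pass

    # Tier 2: retry on the pruned session.
    pruned = prune(spans)
    try:
        return analyze(pruned)
    except ContextLimitExceeded:
        pass

    # Tier 3: group spans by trace, analyze each window, then merge.
    # A window that is still too large would raise here; this sketch does not handle that.
    windows = {}
    for span in pruned:
        windows.setdefault(span["trace_id"], []).append(span)
    return merge([analyze(window) for window in windows.values()])


def fake_analyze(spans, limit=2):
    """Stand-in for the LLM call: fails when given more than `limit` spans."""
    if len(spans) > limit:
        raise ContextLimitExceeded(f"{len(spans)} spans exceed limit {limit}")
    return [{"failure_span_id": s["span_id"]} for s in spans if s["failed"]]


def dedupe_merge(partials):
    """Stand-in for the merge step: drop findings with a repeated failure span."""
    seen, merged = set(), []
    for findings in partials:
        for finding in findings:
            if finding["failure_span_id"] not in seen:
                seen.add(finding["failure_span_id"])
                merged.append(finding)
    return merged


# Made-up session with two traces of two spans each.
spans = [
    {"trace_id": "t1", "span_id": "a", "failed": False},
    {"trace_id": "t1", "span_id": "b", "failed": True},
    {"trace_id": "t2", "span_id": "c", "failed": True},
    {"trace_id": "t2", "span_id": "d", "failed": False},
]
# Pruning is a no-op here, so tiers 1 and 2 both overflow and tier 3 runs.
result = analyze_with_fallback(spans, fake_analyze, lambda s: s, dedupe_merge)
print(result)
```

Passing the callables in as arguments keeps the sketch testable without a model; swapping the no-op prune for one that actually shrinks the session lets tier 2 succeed without chunking.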

The following example fetches a session from CloudWatch, runs failure detection and root cause analysis, and groups the recommendations by fix type:

```python
from collections import defaultdict

from strands_evals.providers import CloudWatchProvider
from strands_evals.detectors import detect_failures, analyze_root_cause, ConfidenceLevel

# Fetch a trace from CloudWatch
provider = CloudWatchProvider(agent_name="booking-agent", region="us-east-1")
data = provider.get_evaluation_data(session_id="session-456")

# Detect and analyze
failures = detect_failures(data.trajectory, confidence_threshold=ConfidenceLevel.MEDIUM)
rca = analyze_root_cause(data.trajectory, failures=failures.failures)

# Group recommendations by fix type
by_type = defaultdict(list)
for rc in rca.root_causes:
    by_type[rc.fix_type].append(rc.fix_recommendation)

for fix_type, recs in by_type.items():
    print(f"\n{fix_type}:")
    for rec in recs:
        print(f" - {rec}")
```

Best practices:

  1. Pass failures explicitly when you’ve already run detect_failures. This avoids redundant LLM calls.
  2. Use ConfidenceLevel.MEDIUM for failure detection before RCA to reduce noise in root cause analysis.
  3. Fix primary failures first. Secondary and tertiary failures often resolve when their root cause is addressed.
  4. Group recommendations by fix type to batch related changes (e.g., all system prompt fixes together).
  5. Use diagnose_session when you want the full pipeline in a single call.
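
The advice to fix primary failures first can be applied mechanically by sorting on causality. The sketch below is illustrative only: `Finding` is a hypothetical stand-in for RCAItem with a subset of its fields, and `CAUSALITY_RANK` and `prioritize` are names introduced here, not library API.

```python
from dataclasses import dataclass

# Lower rank means "fix sooner"; unknown values sort last.
CAUSALITY_RANK = {
    "PRIMARY_FAILURE": 0,
    "SECONDARY_FAILURE": 1,
    "TERTIARY_FAILURE": 2,
    "UNCLEAR": 3,
}


@dataclass
class Finding:
    """Hypothetical stand-in exposing a subset of the RCAItem fields."""
    failure_span_id: str
    causality: str
    fix_type: str
    fix_recommendation: str


def prioritize(root_causes):
    """Order findings so primary failures come first (stable within each rank)."""
    return sorted(
        root_causes,
        key=lambda rc: CAUSALITY_RANK.get(rc.causality, len(CAUSALITY_RANK)),
    )


# Made-up findings for illustration only.
findings = [
    Finding("span-7", "TERTIARY_FAILURE", "OTHERS", "Retry the API call"),
    Finding("span-2", "PRIMARY_FAILURE", "SYSTEM_PROMPT_FIX", "Add date-format guidance"),
    Finding("span-5", "SECONDARY_FAILURE", "TOOL_DESCRIPTION_FIX", "Document the required field"),
    Finding("span-9", "PRIMARY_FAILURE", "OTHERS", "Fix the tool timeout"),
]
ordered = prioritize(findings)
for rc in ordered:
    print(f"{rc.causality}: {rc.fix_recommendation}")
```

Because Python's sort is stable, findings with the same causality keep their original relative order, so the result can be grouped by fix type afterwards without losing that ordering.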