# Guardrails
Strands Agents SDK provides seamless integration with guardrails, enabling you to implement content filtering, topic blocking, PII protection, and other safety measures in your AI applications.
## What Are Guardrails?
Guardrails are safety mechanisms that help control AI system behavior by defining boundaries for content generation and interaction. They act as protective layers that:
- **Filter harmful or inappropriate content** - Block toxicity, profanity, hate speech, etc.
- **Protect sensitive information** - Detect and redact PII (Personally Identifiable Information)
- **Enforce topic boundaries** - Prevent responses on disallowed topics outside the agent's domain, allowing AI systems to be tailored to specific use cases or audiences
- **Ensure response quality** - Maintain adherence to guidelines and policies
- **Enable compliance** - Help meet regulatory requirements for AI systems
- **Build trust** - Foster user confidence by delivering appropriate, reliable responses
- **Manage risk** - Reduce legal and reputational risks associated with AI deployment
## Guardrails in Different Model Providers
Strands Agents SDK allows integration with different model providers, which implement guardrails differently.
### Amazon Bedrock
Amazon Bedrock provides a built-in guardrails framework that integrates directly with Strands Agents SDK. If a guardrail is triggered, the Strands Agents SDK will automatically overwrite the user's input in the conversation history. This is done so that follow-up questions are not also blocked by the same content. This behavior can be configured with the `guardrail_redact_input` boolean, and the `guardrail_redact_input_message` string to change the overwrite message. The same functionality exists for the model's output, but it is disabled by default. You can enable it with the `guardrail_redact_output` boolean, and change the overwrite message with the `guardrail_redact_output_message` string. Below is an example of how to leverage Bedrock guardrails in your code:
```python
import json

from strands import Agent
from strands.models import BedrockModel

# Create a Bedrock model with guardrail configuration
bedrock_model = BedrockModel(
    model_id="anthropic.claude-3-5-sonnet-20241022-v2:0",
    guardrail_id="your-guardrail-id",  # Your Bedrock guardrail ID
    guardrail_version="1",             # Guardrail version
    guardrail_trace="enabled",         # Enable trace info for debugging
)

# Create agent with the guardrail-protected model
agent = Agent(
    model=bedrock_model,
    system_prompt="You are a helpful assistant."
)

# Use the protected agent for conversations
response = agent("Tell me about financial planning.")

# Handle potential guardrail interventions
if response.stop_reason == "guardrail_intervened":
    print("Content was blocked by guardrails, conversation context overwritten!")
    print(f"Conversation: {json.dumps(agent.messages, indent=4)}")
```
### Ollama
Ollama doesn't currently provide native guardrail capabilities like Bedrock does. Instead, Strands Agents SDK users running Ollama models can use the following approaches to guardrail LLM behavior:
- System prompt engineering with safety instructions (see the Prompt Engineering section of our documentation)
- Temperature and sampling controls
- Custom pre/post processing with Python tools
- Response filtering using pattern matching (see the sketch after this list)
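As an illustration of the last two approaches, the sketch below wraps an Ollama-backed agent with a simple regex-based output filter and a lowered temperature. The PII patterns, temperature value, and redaction markers are assumptions chosen for this example, not a complete safety solution:

```python
import re

from strands import Agent
from strands.models.ollama import OllamaModel

# Illustrative PII patterns; a real deployment would need a far more
# thorough pattern set or a dedicated PII-detection library
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace anything matching a PII pattern with a redaction marker."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

# Lower temperature for more conservative, predictable outputs
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3",
    temperature=0.3,
)

# System prompt engineering: state safety boundaries up front
agent = Agent(
    model=ollama_model,
    system_prompt=(
        "You are a helpful assistant. Never reveal personal data such as "
        "email addresses, phone numbers, or government ID numbers."
    ),
)

# Post-process the agent's response before showing it to the user
response = agent("Summarize the customer record for me.")
safe_output = redact_pii(str(response))
print(safe_output)
```

Because the filtering happens outside the model, the same `redact_pii` helper can be reused unchanged if you later swap in a different model provider.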