Agent Evals and Guardrails: Practical Notes
I no longer treat evals and guardrails as platform extras. For agentic products, they are part of core feature quality.
Working model
- Evals check capability and correctness.
- Guardrails enforce policy boundaries at runtime.
One without the other creates failure modes: evals without guardrails leave policy unenforced at runtime, and guardrails without evals let capability regressions ship undetected.
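The split can be sketched in a few lines. This is a hypothetical illustration, not a real API: `eval_pass_rate` scores capability offline against labeled cases, while `guarded_call` enforces policy at the moment of execution.

```python
def eval_pass_rate(agent, cases):
    """Offline eval: fraction of labeled cases the agent answers correctly."""
    passed = sum(1 for case in cases if agent(case["input"]) == case["expected"])
    return passed / len(cases)

def guarded_call(agent, request, policy_allows):
    """Runtime guardrail: refuse before execution if policy denies the request."""
    if not policy_allows(request):
        raise PermissionError(f"policy denied: {request!r}")
    return agent(request)
```

The point of the sketch: the first function can run nightly in CI; the second must sit in the hot path of every production call.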
Evals: what I want covered
- normal tasks
- edge cases
- adversarial/prompt-injection style cases
- regressions from real incidents
If a scenario can break in production, it should exist in evals.
Guardrails: what I enforce
- tool allow/deny policies
- scoped data access
- destination/action restrictions
- approval gates for high-risk actions
- full audit trail
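Those five policies compose into one runtime check. The sketch below is an assumption about shape, not a real library: a `Guardrails` class with an allow-listed tool set, destination restrictions, an approval gate for high-risk actions, and an append-only audit log that records every verdict.

```python
import time

class Guardrails:
    """Minimal policy layer: tool allow-list, destination restrictions,
    approval gates for high-risk actions, and an append-only audit trail."""

    def __init__(self, allowed_tools, allowed_destinations, high_risk):
        self.allowed_tools = set(allowed_tools)
        self.allowed_destinations = set(allowed_destinations)
        self.high_risk = set(high_risk)
        self.audit_log = []  # every check is recorded, allow or deny

    def check(self, tool, destination, approved=False):
        if tool not in self.allowed_tools:
            verdict = "deny:tool"
        elif destination not in self.allowed_destinations:
            verdict = "deny:destination"
        elif tool in self.high_risk and not approved:
            verdict = "needs_approval"
        else:
            verdict = "allow"
        self.audit_log.append({"ts": time.time(), "tool": tool,
                               "destination": destination, "verdict": verdict})
        return verdict
```

Note that denials are logged too: an audit trail that only records successful actions cannot reconstruct what the agent attempted.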
Runtime shape that works
- Pre-check policy layer
- Agent execution layer
- Post-check validation layer
- Human escalation layer
- Feedback to eval set
Reliability improves when production failures feed back into test cases.
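The five layers above chain into one request path. A sketch under stated assumptions (all function names are placeholders): a blocked pre-check or failed post-check escalates to a human, and either failure is appended to the eval set so it becomes a regression case.

```python
def run_with_guardrails(request, pre_check, agent, post_check, escalate, eval_set):
    """Pipeline sketch: policy pre-check -> agent execution -> output
    validation -> human escalation, with failures fed back into evals."""
    if not pre_check(request):
        eval_set.append(("pre_check_block", request))   # feedback to eval set
        return escalate(request, reason="pre-check")
    output = agent(request)
    if not post_check(output):
        eval_set.append(("post_check_fail", request))   # feedback to eval set
        return escalate(request, reason="post-check")
    return output
```

The feedback append is the part most pipelines skip; without it, the same production failure has to be rediscovered by hand before it ever appears in a test case.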
Trend signals behind this note
- OpenAI launched AgentKit on October 6, 2025 with first-party eval/trace/guardrail patterns: Introducing AgentKit.
- Stack Overflow's 2025 Developer Survey shows broad AI adoption while confidence and trust still lag in many workflows: AI section, 2025 survey.
Sticky takeaway
If agents can take actions in production, eval and policy systems are part of the product, not optional infrastructure.