Evaluation, Safety & Guardrails
Measure quality, reduce risk, and ship responsibly.
What to Measure
| Dimension | Examples |
|---|---|
| Correctness | Answer matches a reference; arithmetic is verified |
| Groundedness | Citations support claims (RAG) |
| Safety | Policy adherence, PII redaction |
| UX | Helpfulness, tone, formatting |
| Cost/Latency | Token usage, cold-start time, tail latency (e.g. p95/p99) |
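
As a rough illustration of how a few of these dimensions might be turned into numbers, the sketch below computes exact-match accuracy, average token count, and a nearest-rank p95 latency over logged records. The record fields (`answer`, `reference`, `tokens`, `latency_ms`) and the function names are assumptions made for this example, not part of any particular framework.

```python
import math


def exact_match(answer: str, reference: str) -> bool:
    """Correctness proxy: compare after trimming whitespace and lowercasing."""
    return answer.strip().lower() == reference.strip().lower()


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records: list) -> dict:
    """Aggregate hypothetical per-request records into headline metrics."""
    n = len(records)
    return {
        "accuracy": sum(exact_match(r["answer"], r["reference"]) for r in records) / n,
        "avg_tokens": sum(r["tokens"] for r in records) / n,
        "p95_latency_ms": percentile([r["latency_ms"] for r in records], 95),
    }


# Hypothetical logged records, for illustration only.
records = [
    {"answer": "Paris", "reference": "paris", "tokens": 10, "latency_ms": 100},
    {"answer": "4", "reference": "4", "tokens": 20, "latency_ms": 200},
    {"answer": "Berlin", "reference": "Rome", "tokens": 30, "latency_ms": 300},
    {"answer": " 42 ", "reference": "42", "tokens": 40, "latency_ms": 400},
]
summary = summarize(records)
```

Exact match is the strictest correctness proxy; groundedness, safety, and UX usually need rubric-based or human grading instead.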
Techniques
- Golden sets & unit prompts
- Auto-grading (LLM-as-judge) with spot human review
- A/B tests in pre-production sandboxes before rollout
- Canary deploys + fast rollback
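
A golden set can work like a unit-test suite for prompts. The following is a minimal sketch, assuming the model is exposed as a plain `str -> str` callable; `GOLDEN_SET`, `run_golden_set`, `fake_model`, and the substring-match grading rule are all hypothetical choices made for illustration.

```python
from typing import Callable

# Hypothetical golden set: prompts paired with a string the answer must contain.
GOLDEN_SET = [
    {"prompt": "2 + 2 = ?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]


def run_golden_set(
    model: Callable[[str], str],
    cases: list,
    min_pass_rate: float = 1.0,
) -> dict:
    """Run every case and report the pass rate plus the failing prompts."""
    failures = [
        case["prompt"]
        for case in cases
        if case["expected"].lower() not in model(case["prompt"]).lower()
    ]
    pass_rate = 1 - len(failures) / len(cases)
    return {
        "pass_rate": pass_rate,
        "failures": failures,
        "ok": pass_rate >= min_pass_rate,  # gate a rollout on this flag
    }


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, returning canned answers."""
    canned = {"2 + 2 = ?": "The answer is 4.", "Capital of France?": "Paris"}
    return canned.get(prompt, "I don't know.")


report = run_golden_set(fake_model, GOLDEN_SET)
```

The `ok` flag could feed a CI check or a canary gate. An LLM-as-judge grader would replace the substring test, with a sample of its verdicts reviewed by humans.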
Safety Rails
- Input/output policy checks and refusal scaffolds
- PII filters & allow-lists for tool usage
- Moderation logs, signatures, and trace IDs
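
The rails above might be combined roughly as follows: redact PII from output, refuse tool calls that are not on an allow-list, and attach a trace ID to every decision so it can be found in moderation logs. This is a sketch under stated assumptions. The regex patterns cover only simple email and US SSN shapes, and `ALLOWED_TOOLS`, `redact_pii`, and `guarded_output` are hypothetical names.

```python
import re
import uuid
from typing import Optional

# Illustrative patterns only; production PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical allow-list of tools the model may invoke.
ALLOWED_TOOLS = {"search", "calculator"}


def redact_pii(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def guarded_output(text: str, tool_name: Optional[str] = None) -> dict:
    """Apply the allow-list and PII filter; tag the result with a trace ID."""
    trace_id = uuid.uuid4().hex  # correlates this decision with log entries
    if tool_name is not None and tool_name not in ALLOWED_TOOLS:
        return {
            "trace_id": trace_id,
            "allowed": False,
            "text": "Request refused: tool not permitted.",
        }
    return {"trace_id": trace_id, "allowed": True, "text": redact_pii(text)}


result = guarded_output("Mail bob@example.com, SSN 123-45-6789", tool_name="search")
blocked = guarded_output("rm -rf /", tool_name="shell")
```

An allow-list rather than a deny-list means any tool added later is blocked until it is explicitly approved.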