Evaluation, Safety & Guardrails

Measure quality, reduce risk, and ship responsibly.

What to Measure

Dimension	Examples
Correctness	Answer matches reference, math checks
Groundedness	Citations support claims (RAG)
Safety	Policy adherence, PII redaction
UX	Helpfulness, tone, formatting
Cost/Latency	Tokens per request, cold-start time, tail latency (e.g. p95/p99)
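
As a rough illustration of how two of these dimensions might be turned into numbers, the sketch below computes an exact-match rate for Correctness and a nearest-rank percentile for tail latency. The function names, the trim-and-lowercase normalization, and the sample values are assumptions made for this example, not part of any particular evaluation library.

```python
import math
from typing import List


def exact_match_rate(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference after trimming and lowercasing."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample data, for illustration only.
print(exact_match_rate(["4", " Paris "], ["4", "paris"]))  # 1.0
print(percentile([100, 200, 300, 400, 1000], 95))          # 1000 (latency in ms)
```

Exact match is only a reasonable proxy when answers are short and canonical; free-form answers usually need a softer grader.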

Techniques

Golden sets & unit prompts
Auto-grading (LLM-as-judge) with spot human review
A/B tests in pre-production sandboxes before rollout
Canary deploys + fast rollback
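
The techniques above can be sketched together in a small harness: run a golden set through a model, grade each output, flag a random sample for human review, and compare pass rates to decide on a canary rollback. Everything here is hypothetical: `GoldenCase`, `run_golden_set`, `should_rollback`, the 10% review rate, and the 2-point drop threshold are assumptions for illustration. The `grader` argument is where an LLM-as-judge call would plug in; a plain comparison stands in for it below.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    prompt: str
    reference: str


@dataclass
class Result:
    case: GoldenCase
    output: str
    passed: bool
    needs_human_review: bool


def run_golden_set(
    cases: List[GoldenCase],
    model: Callable[[str], str],
    grader: Callable[[str, str, str], bool],
    review_rate: float = 0.1,  # assumed share of cases sampled for human spot checks
    seed: int = 0,
) -> List[Result]:
    """Run every golden case, grade it, and randomly flag some for human review."""
    rng = random.Random(seed)
    results = []
    for case in cases:
        output = model(case.prompt)
        passed = grader(case.prompt, output, case.reference)
        results.append(Result(case, output, passed, rng.random() < review_rate))
    return results


def pass_rate(results: List[Result]) -> float:
    """Share of graded cases that passed; 0.0 for an empty run."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def should_rollback(baseline_rate: float, canary_rate: float, max_drop: float = 0.02) -> bool:
    """Signal a rollback when the canary's pass rate falls more than max_drop below baseline."""
    return (baseline_rate - canary_rate) > max_drop


# Stand-ins for a real model and an LLM judge (illustration only).
def stub_model(prompt: str) -> str:
    return "4" if "2+2" in prompt else "unknown"


def stub_grader(prompt: str, output: str, reference: str) -> bool:
    return output.strip() == reference.strip()


cases = [GoldenCase("What is 2+2?", "4"), GoldenCase("Capital of France?", "Paris")]
results = run_golden_set(cases, stub_model, stub_grader)
print(pass_rate(results))  # 0.5
```

Fixing the random seed keeps the review sample reproducible between runs, which makes it easier to compare a human reviewer's verdicts against the automatic grader's.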

Safety Rails

Input/output policy checks and refusal scaffolds
PII filters & allow-lists for tool usage
Moderation logs, signatures, and trace IDs
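
A minimal sketch of how these rails might fit around a model call is shown below: a policy check on input and output with a fixed refusal message, regex-based PII redaction, a tool allow-list, and a moderation log entry tagged with a trace ID. The blocked terms, tool names, regex patterns, and log fields are all hypothetical placeholders; real deployments typically rely on a dedicated moderation service and much broader PII detection. Log signing is not shown.

```python
import re
import uuid
from typing import Callable, Dict, List

# Deliberately simple patterns, for illustration only; they will miss many real formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

ALLOWED_TOOLS = {"search_docs", "get_weather"}  # hypothetical tool names
BLOCKED_TERMS = {"make a bomb"}                 # hypothetical policy list
REFUSAL = "Sorry, I can't help with that request."


def redact_pii(text: str) -> str:
    """Replace email addresses and simple phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def violates_policy(text: str) -> bool:
    """Case-insensitive substring check against the blocked-term list."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def is_tool_allowed(tool_name: str) -> bool:
    """Only tools on the allow-list may be invoked."""
    return tool_name in ALLOWED_TOOLS


def guarded_call(user_input: str, model: Callable[[str], str], log: List[Dict[str, str]]) -> str:
    """Check input, call the model on redacted text, check output, and log the decision."""
    trace_id = str(uuid.uuid4())
    if violates_policy(user_input):
        log.append({"trace_id": trace_id, "stage": "input", "action": "refused"})
        return REFUSAL
    output = model(redact_pii(user_input))
    if violates_policy(output):
        log.append({"trace_id": trace_id, "stage": "output", "action": "refused"})
        return REFUSAL
    log.append({"trace_id": trace_id, "stage": "output", "action": "allowed"})
    return redact_pii(output)


moderation_log: List[Dict[str, str]] = []
print(redact_pii("Mail bob@example.com or call 555-123-4567"))
# Mail [EMAIL] or call [PHONE]
print(guarded_call("How do I make a bomb?", lambda s: s, moderation_log))
# Sorry, I can't help with that request.
```

Redacting before the model call keeps raw PII out of the prompt, and redacting again on the way out covers anything the model generates on its own. The trace ID lets a refusal in the log be matched back to the request that caused it.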