Evaluation, Safety & Guardrails

Measure quality, reduce risk, and ship responsibly.

What to Measure

Dimension      Examples
Correctness    Answer matches reference, math checks
Groundedness   Citations support claims (RAG)
Safety         Policy adherence, PII redaction
UX             Helpfulness, tone, formatting
Cost/Latency   Tokens, cold starts, tail latency
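
As one illustration of the Cost/Latency row, tail latency is usually reported as a high percentile rather than a mean, because a few slow requests can hide behind a healthy average. The sketch below uses the nearest-rank method; the function name `percentile` and the sample values in `SAMPLES_MS` are made up for illustration and do not come from any real system.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank definition: rank = ceil(pct/100 * n), 1-indexed.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds (illustrative data only).
SAMPLES_MS = [120, 80, 95, 400, 110, 105, 90, 100, 85, 1500]

if __name__ == "__main__":
    for p in (50, 90, 95):
        print(f"p{p} = {percentile(SAMPLES_MS, p)} ms")
```

With these samples the median stays near 100 ms while the 95th percentile is dominated by the single slow request, which is the effect a tail-latency metric is meant to expose.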

Techniques

  • Golden sets & unit prompts
  • Auto-grading (LLM-as-judge) with spot human review
  • A/B tests in pre-production sandboxes before rollout
  • Canary deploys + fast rollback
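
The golden-set and canary ideas above can be sketched as a small harness. This is a minimal sketch under stated assumptions, not a prescribed implementation: `GoldenCase`, `pass_rate`, `should_rollback`, the 5-point `max_drop` threshold, and `stub_model` are all hypothetical names and values chosen for illustration. The grader here is plain normalized exact match; an LLM-as-judge grader would replace the comparison inside `pass_rate`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not fail a case."""
    return " ".join(text.lower().split())


def pass_rate(model: Callable[[str], str], cases: List[GoldenCase]) -> float:
    """Fraction of golden cases whose normalized output equals the normalized reference."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(normalize(model(c.prompt)) == normalize(c.expected) for c in cases)
    return passed / len(cases)


def should_rollback(baseline_rate: float, canary_rate: float, max_drop: float = 0.05) -> bool:
    """Flag a rollback when the canary scores more than max_drop below the baseline."""
    return (baseline_rate - canary_rate) > max_drop


# Hypothetical golden set and stand-in model, for illustration only.
GOLDEN = [
    GoldenCase("2 + 2 = ?", "4"),
    GoldenCase("Capital of France?", "Paris"),
]


def stub_model(prompt: str) -> str:
    answers = {"2 + 2 = ?": "4", "Capital of France?": "  paris "}
    return answers.get(prompt, "")


if __name__ == "__main__":
    rate = pass_rate(stub_model, GOLDEN)
    print(f"pass rate: {rate:.2f}, rollback: {should_rollback(0.90, rate)}")
```

Keeping the grader and the rollback rule as separate functions lets the same golden set score both the baseline and the canary, so the gate compares like with like.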

Safety Rails

  • Input/Output policy checks and refusal scaffolds
  • PII filters & allow-lists for tool usage
  • Moderation logs, signatures, and trace IDs
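
The rails listed above can be sketched with the Python standard library alone. Everything named here is an assumption for illustration: the two regular expressions catch only simple email and US-style phone patterns and are far from a complete PII detector, the tool names in `ALLOWED_TOOLS` are hypothetical, and the log format (JSON payload signed with HMAC-SHA256) is one possible design rather than a standard.

```python
import hashlib
import hmac
import json
import re
import uuid
from typing import Optional

# Deliberately simple patterns; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

# Hypothetical tool names; anything not listed is rejected.
ALLOWED_TOOLS = frozenset({"search_docs", "get_weather"})


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def is_tool_allowed(name: str) -> bool:
    """Allow-list check: only explicitly listed tools may be called."""
    return name in ALLOWED_TOOLS


def _sign(payload: dict, key: bytes) -> str:
    # Sorted keys give a stable byte string, so the signature is reproducible.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def signed_log_entry(event: str, detail: str, key: bytes,
                     trace_id: Optional[str] = None) -> dict:
    """Build a moderation log entry with a trace ID, redacted detail, and HMAC signature."""
    payload = {
        "trace_id": trace_id or uuid.uuid4().hex,
        "event": event,
        "detail": redact_pii(detail),
    }
    return {**payload, "signature": _sign(payload, key)}


def verify_log_entry(entry: dict, key: bytes) -> bool:
    """Recompute the signature over everything except the signature field."""
    payload = {k: v for k, v in entry.items() if k != "signature"}
    return hmac.compare_digest(_sign(payload, key), entry.get("signature", ""))
```

Redacting before signing means the stored log never contains the raw PII, and the signature lets a later reader detect whether an entry was altered after it was written.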