MLOps for LLMs
From dev to prod: CI/CD, telemetry, rollback, and governance.
Lifecycle
| Stage | Focus | Notes |
|---|---|---|
| Dev | Prompt/model iteration | Prompt registry, prompt unit tests, golden datasets |
| Pre-Prod | Offline/online eval | A/B tests in sandbox, canary releases |
| Prod | Reliability & cost | Budgets, SLOs, autoscaling, caching |
| Ops | Monitoring | Latency, errors, safe output rate, drift |
| Gov | Risk & audit | Change mgmt, model cards, data lineage |
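
As one way to picture the Dev and Pre-Prod rows, here is a minimal sketch of a golden-data gate that blocks promotion when a candidate prompt or model regresses against a baseline. Everything named here (`GoldenCase`, `pass_rate`, `can_promote`, the substring check, the 2-point tolerance, and the `fake_generate` stand-in) is an assumption made for illustration, not something the outline prescribes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    must_contain: str  # substring the output is expected to include (assumed check)


def pass_rate(generate: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for c in cases if c.must_contain.lower() in generate(c.prompt).lower()
    )
    return passed / len(cases)


def can_promote(
    candidate_rate: float, baseline_rate: float, max_regression: float = 0.02
) -> bool:
    """Allow promotion only if the candidate stays within the regression tolerance."""
    return candidate_rate >= baseline_rate - max_regression


# Hypothetical stand-in for a real model call.
def fake_generate(prompt: str) -> str:
    return "Paris is the capital of France." if "France" in prompt else "I am not sure."


GOLDEN = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is the capital of Japan?", "Tokyo"),
]

if __name__ == "__main__":
    rate = pass_rate(fake_generate, GOLDEN)
    print(f"pass rate: {rate:.2f}, promote: {can_promote(rate, baseline_rate=0.5)}")
```

A substring match is the crudest possible scorer; in practice the same gate shape works with any scoring function that returns a pass/fail per case.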
Observability
- Trace each request (retrieval → LLM → post-proc)
- Log prompts, model versions, embeddings, costs
- SLO alerts for tail latency & failure spikes
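
The three points above can be sketched with the standard library alone: a per-request trace that times each stage, carries the fields worth logging, and a tail-latency check. The names `Trace`, `p95`, and `slo_breached`, along with the 2-second and 1% thresholds, are assumptions chosen for illustration; a production setup would more likely use a tracing backend than an in-memory dict.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One request's trace: what was asked, which model answered, how long each stage took."""
    model_version: str
    prompt: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: dict[str, float] = field(default_factory=dict)  # stage name -> seconds
    cost_usd: float = 0.0

    @contextmanager
    def span(self, stage: str):
        """Time a stage such as 'retrieval', 'llm', or 'post_proc'."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans[stage] = time.perf_counter() - start


def p95(latencies: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def slo_breached(
    latencies: list[float],
    failures: int,
    p95_limit_s: float = 2.0,      # assumed latency SLO
    max_error_rate: float = 0.01,  # assumed error budget
) -> bool:
    """True if tail latency or the failure rate exceeds its threshold."""
    error_rate = failures / len(latencies)
    return p95(latencies) > p95_limit_s or error_rate > max_error_rate


if __name__ == "__main__":
    trace = Trace(model_version="model-v1", prompt="hello")
    with trace.span("retrieval"):
        time.sleep(0.01)
    with trace.span("llm"):
        time.sleep(0.02)
    with trace.span("post_proc"):
        pass
    print(trace.trace_id, trace.spans)
```

Alerting on the 95th percentile instead of the mean keeps a handful of slow requests from being averaged away.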
Cost Controls
- Cache responses, keep prompts short, and use small models for easy paths
- Route only hard queries to bigger models
- Batch background jobs; cap max tokens
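
A rough sketch of how caching, routing, and a token cap might fit together in one place. The `Router` class, the `looks_hard` heuristic, the 256-token cap, and the lambda stand-ins for model calls are all hypothetical; real difficulty classifiers and cache keys are usually more involved.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_OUTPUT_TOKENS = 256  # assumed cap passed to every model call

ModelFn = Callable[[str, int], str]  # (prompt, max_tokens) -> completion


def looks_hard(prompt: str) -> bool:
    """Toy difficulty heuristic: long prompts or explicit reasoning requests."""
    lowered = prompt.lower()
    return len(prompt.split()) > 30 or "explain" in lowered or "prove" in lowered


@dataclass
class Router:
    small: ModelFn
    large: ModelFn
    is_hard: Callable[[str], bool] = looks_hard
    cache: dict[str, str] = field(default_factory=dict)
    calls: dict[str, int] = field(
        default_factory=lambda: {"small": 0, "large": 0, "cache": 0}
    )

    def answer(self, prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts share a cache entry.
        key = " ".join(prompt.lower().split())
        if key in self.cache:
            self.calls["cache"] += 1
            return self.cache[key]
        tier = "large" if self.is_hard(prompt) else "small"
        model = self.large if tier == "large" else self.small
        self.calls[tier] += 1
        result = model(prompt, MAX_OUTPUT_TOKENS)
        self.cache[key] = result
        return result


if __name__ == "__main__":
    # Hypothetical stand-ins for real model clients.
    router = Router(
        small=lambda p, n: f"small:{p}",
        large=lambda p, n: f"large:{p}",
    )
    router.answer("What time is it?")
    router.answer("what  time is it?")  # served from cache
    router.answer("Explain why the sky is blue.")
    print(router.calls)
```

Counting calls per tier makes it easy to check how much traffic the small model and the cache are actually absorbing.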