Retrieval-Augmented Generation
Ground LLM responses in trusted data sources to reduce hallucinations and add citations.
What & Why
- Goal: answer questions accurately, with sources, grounded in your private knowledge.
- When: policies, product docs, FAQs, tickets, wikis, PDFs, SQL.
- Benefits: fewer hallucinations, fresher answers, explainability, compliance.
Reference Architecture
- Ingest & chunk documents (semantic-friendly sizes).
- Embed & index (vector DB + metadata filters).
- Retrieve top-k chunks by vector similarity, optionally hybridized with keyword search.
- Rerank the retrieved chunks (optional) for quality.
- Compose prompt with citations & constraints.
- Generate with LLM; return sources.
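
This flow can be sketched end to end in a few dozen lines. Everything below is illustrative: a plain list stands in for the vector DB, `embed` is a toy stand-in for a real embedding model, and `generate` is a stub where your LLM client would go.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding, normalized to unit length.
    # Replace with a real embedding model or provider API.
    vec = [0.0] * 64
    for ch in text.lower():
        vec[ord(ch) % 64] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Ingest & index: each chunk carries an id and metadata for filters and citations.
index: list[dict] = []

def ingest(chunk_id: str, text: str, **metadata) -> None:
    index.append({"id": chunk_id, "text": text, "meta": metadata, "vec": embed(text)})

def retrieve(question: str, k: int = 3) -> list[dict]:
    # Pure vector similarity; a production system would add keyword hybrid
    # scoring and an optional reranker on top.
    qvec = embed(question)
    return sorted(index, key=lambda c: cosine(qvec, c["vec"]), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stub: wire up your LLM client here.
    return f"(LLM response for a {len(prompt)}-char prompt)"

def answer(question: str) -> str:
    chunks = retrieve(question)
    context = "\n".join(f"[{i+1}] {c['text']}" for i, c in enumerate(chunks))
    prompt = ("You answer using only the provided context. Cite sources as [#].\n"
              f"Question: {question}\nContext:\n{context}\nAnswer:")
    return generate(prompt)
```

Typical usage is `ingest(...)` calls at indexing time followed by `answer(...)` per query; swapping the toy pieces for a real vector store, embedding model, and LLM client keeps the same shape.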
Implementation Notes
- Chunking: 400–1000 tokens per chunk, with overlap for context continuity (chunking sketch below).
- Metadata: doc_id, section, date, access_level, language.
- Filters: by product, date range, audience, regulatory tag (filtering sketch below).
- Evaluation: answer correctness, groundedness, citation match (citation-check sketch below).
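
A sliding-window chunker illustrating the overlap idea. Whitespace words stand in for tokens here; production code should count tokens with the same tokenizer the embedding and LLM models use.

```python
def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Each window starts (size - overlap) words after the previous one,
    # so consecutive chunks share `overlap` words of context.
    words = text.split()
    if not words:
        return []
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```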
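Metadata filters typically apply before (or alongside) similarity search. Against the in-memory index from the pipeline sketch above, a pre-filter might look like this (field names are illustrative):

```python
def filtered_retrieve(question: str, k: int = 3, **required) -> list[dict]:
    # Drop any chunk whose metadata doesn't match every required field,
    # then rank the survivors by similarity exactly as before.
    candidates = [c for c in index
                  if all(c["meta"].get(field) == value
                         for field, value in required.items())]
    qvec = embed(question)
    return sorted(candidates, key=lambda c: cosine(qvec, c["vec"]), reverse=True)[:k]

# e.g., filtered_retrieve("How are refunds handled?", product="acme", access_level="public")
```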
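For the citation-match check, one simple automated test is to verify that every `[#]` marker in a generated answer points at a chunk that was actually retrieved:

```python
import re

def citations_match(answer: str, num_chunks: int) -> bool:
    # Every [n] citation in the answer must reference one of the
    # num_chunks context chunks that were passed to the model.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and min(cited) >= 1 and max(cited) <= num_chunks
```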
Prompt Skeleton (server-side)
System: You answer using only the provided context. If the answer isn't present,
say "I don't know" and suggest where to find it. Cite sources as [#].
User question: {{ user_question }}
Context:
{{ top_chunks_with_ids }}
Answer concisely with numbered citations.
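
Rendered server-side, this skeleton is plain string assembly. A minimal sketch, assuming a chat-style messages API and integer chunk IDs (both assumptions, not part of the skeleton itself):

```python
SYSTEM = (
    "You answer using only the provided context. If the answer isn't present, "
    'say "I don\'t know" and suggest where to find it. Cite sources as [#].'
)

def compose_prompt(user_question: str, chunks: list[str]) -> list[dict]:
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i+1}] {text}" for i, text in enumerate(chunks))
    user = (
        f"User question: {user_question}\n"
        f"Context:\n{context}\n"
        "Answer concisely with numbered citations."
    )
    return [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": user}]
```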
Quality & Guardrails
- Block answers when no chunk clears the similarity threshold (guardrail sketch below).
- Return "no answer" with an escalation path instead of guessing.
- Log retrievals and model calls for audits.
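
A minimal version of the low-similarity guardrail, assuming retrieval returns (chunk, score) pairs and a threshold tuned on your own data:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag")

SIMILARITY_FLOOR = 0.75  # illustrative; tune per embedding model and corpus

NO_ANSWER = ("I don't know. This question isn't covered by the knowledge base; "
             "please escalate to the support team.")

def guarded_context(question: str,
                    scored_chunks: list[tuple[str, float]]) -> list[str] | None:
    # Log every retrieval so answers can be audited later.
    log.info("retrieval question=%r scores=%s",
             question, [round(s, 3) for _, s in scored_chunks])
    relevant = [chunk for chunk, score in scored_chunks if score >= SIMILARITY_FLOOR]
    # None tells the caller to return NO_ANSWER instead of generating.
    return relevant or None
```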