RAG Best Practices: Reducing Hallucinations
RAG (Retrieval-Augmented Generation) combines the knowledge in your own data with the reasoning abilities of LLMs. But without careful implementation, you can still get hallucinations.
The Problem
LLMs are trained to always provide an answer. Even when they don't know, they'll confidently make something up.
The Solution: Grounded Generation
1. Retrieve First, Then Generate
Always retrieve relevant context before generating:
// Step 1: Get relevant memories
const context = await client.search.semantic({
  query: userQuestion,
  limit: 5,
});

// Step 2: Pass to LLM with strict instructions
const prompt = `
Answer based ONLY on the following context.
If the answer isn't in the context, say "I don't know."

Context:
${context.results.map(r => r.content).join('\n')}

Question: ${userQuestion}
`;
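The snippet stops at prompt construction. As a minimal sketch of the generation step, assuming the OpenAI Node SDK (any chat-completion client works the same way, and the model name is just a placeholder):

import OpenAI from 'openai';

const openai = new OpenAI();

// Step 3: Generate the answer from the grounded prompt
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini', // assumption: substitute whatever model you use
  messages: [{ role: 'user', content: prompt }],
  temperature: 0, // lower temperature means fewer creative leaps
});

const answer = completion.choices[0].message.content;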
2. Include Source Citations
PiyAPI's TruthMeter helps track which sources support the answer:
const response = await client.context.retrieve({
  query: userQuestion,
  includeCitations: true,
});

// response.citations = [
//   { memoryId: "mem_123", relevance: 0.92, snippet: "..." },
//   { memoryId: "mem_456", relevance: 0.87, snippet: "..." }
// ]
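What you do with the citations is up to you. One simple pattern, sketched here rather than taken from the PiyAPI SDK, is to append them to the generated answer so users can verify each claim (`answer` is the LLM output from step 1):

// Render citations as a numbered source list appended to the answer
const sources = response.citations
  .map((c, i) => `[${i + 1}] ${c.memoryId} (relevance ${c.relevance.toFixed(2)})`)
  .join('\n');

const answerWithSources = `${answer}\n\nSources:\n${sources}`;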
3. Set Confidence Thresholds
Don't use low-relevance results:
const context = await client.search.semantic({
  query: userQuestion,
  limit: 10,
  minRelevance: 0.7, // Only high-confidence matches
});
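A threshold can leave you with nothing, which is exactly when the model is most tempted to improvise. A small guard inside whatever function handles the question (a sketch, not a PiyAPI feature) skips generation entirely in that case:

// If nothing clears the relevance bar, answer honestly instead of generating
if (context.results.length === 0) {
  return "I don't have enough information to answer that.";
}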
Measuring Hallucinations
Track the "grounding rate": the percentage of claims in a generated answer that can be traced back to a retrieved source document. Aim for >95%.
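There is no built-in metric for this; as a rough sketch, you can compute it over a sample of claims labeled by human review or an LLM judge (the claims and labels below are hypothetical):

// Each entry marks whether a claim in a generated answer could be
// traced back to one of the retrieved sources.
const labeledClaims = [
  { claim: 'The refund window is 30 days.', supported: true },
  { claim: 'Refunds are issued within 24 hours.', supported: false },
];

function groundingRate(labels) {
  if (labels.length === 0) return 1;
  return labels.filter(l => l.supported).length / labels.length;
}

console.log(groundingRate(labeledClaims)); // 0.5, far below the 95% target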