
RAG Best Practices: Reducing Hallucinations

Engineering · January 5, 2026 · 12 min read

Retrieval-Augmented Generation (RAG) combines the knowledge in your data with the reasoning of LLMs. But without careful implementation, you can still get hallucinations.

The Problem

LLMs are trained to always provide an answer. Even when they don't know, they'll confidently make something up.

The Solution: Grounded Generation

1. Retrieve First, Then Generate

Always retrieve relevant context before generating:

```typescript
// Step 1: Get relevant memories
const context = await client.search.semantic({
  query: userQuestion,
  limit: 5,
});

// Step 2: Pass to LLM with strict instructions
const prompt = `
Answer based ONLY on the following context.
If the answer isn't in the context, say "I don't know."

Context:
${context.results.map(r => r.content).join('\n')}

Question: ${userQuestion}
`;
```
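The generation step itself depends on your LLM provider; it isn't part of PiyAPI. As an illustration only, using the OpenAI Node SDK (an assumption, substitute whatever chat-completion API you use), the grounded prompt might be sent like this:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Step 3: generate strictly from the grounded prompt.
// Temperature 0 keeps the model close to the supplied context.
const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: prompt }],
  temperature: 0,
});

const answer = completion.choices[0].message.content ?? "";
```

The important part is that the model sees only the retrieved context, not its own parametric guesses.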

2. Include Source Citations

PiyAPI's TruthMeter helps track which sources support the answer:

```typescript
const response = await client.context.retrieve({
  query: userQuestion,
  includeCitations: true,
});

// response.citations = [
//   { memoryId: "mem_123", relevance: 0.92, snippet: "..." },
//   { memoryId: "mem_456", relevance: 0.87, snippet: "..." }
// ]
```
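Citations are most useful when they reach the user. A minimal sketch, assuming the citation shape shown above, that appends footnote-style references to an answer:

```typescript
interface Citation {
  memoryId: string;
  relevance: number;
  snippet: string;
}

// Append footnote-style references so every answer carries its sources.
function withSources(answer: string, citations: Citation[]): string {
  const footnotes = citations
    .map((c, i) => `[${i + 1}] ${c.memoryId} (relevance ${c.relevance.toFixed(2)})`)
    .join("\n");
  return `${answer}\n\nSources:\n${footnotes}`;
}
```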

3. Set Confidence Thresholds

Filter out low-relevance results before they ever reach the prompt:

```typescript
const context = await client.search.semantic({
  query: userQuestion,
  limit: 10,
  minRelevance: 0.7, // Only high-confidence matches
});
```
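If nothing clears the threshold, refuse up front rather than letting the model improvise. A minimal sketch of that guard, assuming the `client` from the earlier examples and a hypothetical `generateAnswer` helper wrapping steps 1 and 2:

```typescript
// Refuse before generation when retrieval comes back empty:
// an honest "I don't know" beats a confident hallucination.
async function answerOrRefuse(userQuestion: string): Promise<string> {
  const context = await client.search.semantic({
    query: userQuestion,
    limit: 10,
    minRelevance: 0.7,
  });

  if (context.results.length === 0) {
    return "I don't know. I couldn't find anything relevant in your data.";
  }

  return generateAnswer(userQuestion, context); // hypothetical helper
}
```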

Measuring Hallucinations

Track the "grounding rate": the percentage of generated claims that can be traced back to a source document. Aim for >95%.
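A minimal sketch of that metric; `claimIsSupported` is a placeholder you'd back with an embedding-similarity or NLI check against the retrieved sources:

```typescript
// Grounding rate: fraction of claims traceable to a source document.
function groundingRate(
  claims: string[],
  sources: string[],
  claimIsSupported: (claim: string, sources: string[]) => boolean,
): number {
  if (claims.length === 0) return 1; // nothing claimed, nothing hallucinated
  const supported = claims.filter((c) => claimIsSupported(c, sources)).length;
  return supported / claims.length;
}
```

Answers scoring below 0.95 can be flagged for review before they reach users.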
