How to reduce hallucinations when using LLMs for data analysis tasks?

Asked about 2 months ago · Viewed 204 times
14

I'm building a system where users ask questions about their data in natural language, and an LLM generates SQL queries and interprets results.

The problem: The LLM often "hallucinates" insights that aren't supported by the actual data.

Example:

  • User asks: "What's our top-selling product?"
  • LLM correctly generates SQL
  • But then adds: "This is likely due to the recent marketing campaign" (we had no such campaign)

What I've tried:

  1. Strict system prompts saying "only state facts from the data"
  2. Few-shot examples of good vs bad responses
  3. Temperature = 0 for deterministic output

Still getting hallucinations. How do production data analysis tools handle this? Should I use a separate verification step?


Comments

Another approach: use a smaller, fine-tuned model for verification. It's faster and cheaper than using GPT-4 twice.

Marcus Johnson (2120) · about 2 months ago


1 Answer

17

Hallucinations in data analysis are particularly dangerous. Here's a multi-layer approach:

Layer 1: Constrained Generation

Force the model to cite a data point for every claim:

System: You are a data analyst. For every claim, cite the specific data point.
Format: [CLAIM] (Source: [TABLE.COLUMN])

Example:
"Revenue increased 23% (Source: sales.monthly_revenue)"
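The citation format above can be enforced mechanically after generation. This is a minimal sketch, assuming the `[CLAIM] (Source: [TABLE.COLUMN])` format from the prompt; the `schema` dict is a hypothetical example of your database's tables and columns:

```python
import re

# Matches the "(Source: table.column)" citation from the prompt format above.
CITATION_RE = re.compile(r"\(Source:\s*(\w+)\.(\w+)\)")

def find_uncited_claims(analysis: str, schema: dict[str, set[str]]) -> list[str]:
    """Return sentences that lack a citation or cite a nonexistent column."""
    uncited = []
    for sentence in re.split(r"(?<=[.!?])\s+", analysis.strip()):
        m = CITATION_RE.search(sentence)
        if m is None:
            uncited.append(sentence)  # no citation at all
        else:
            table, column = m.group(1), m.group(2)
            if column not in schema.get(table, set()):
                uncited.append(sentence)  # cites a column that isn't real
    return uncited
```

Anything this returns is a candidate hallucination: either the model didn't cite data, or it cited a column that doesn't exist in your schema.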

Layer 2: Verification Step

Add a separate verification agent:

# Pseudocode: analyst_llm and verifier_llm are separate model clients
analysis = analyst_llm.generate(query, data)
verification = verifier_llm.check(analysis, data)
if verification.has_unsupported_claims:
    analysis = remove_unsupported_claims(analysis)
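One way to flesh out the verifier step above is a second LLM call that returns a per-claim verdict as JSON. This is a sketch, not a specific provider's API: `llm` is assumed to be any callable that takes a prompt string and returns the model's text response.

```python
import json

# Escaped braces ({{ }}) survive str.format() as literal JSON braces.
VERIFY_PROMPT = (
    "You are a fact checker. Judge each numbered claim ONLY against the data.\n"
    'Respond with a JSON list: [{{"claim": <number>, "supported": true|false}}]\n\n'
    "Data:\n{data}\n\nClaims:\n{claims}\n"
)

def verify_claims(claims: list[str], data: str, llm) -> list[str]:
    """Return only the claims the verifier marks as supported by the data."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(claims, 1))
    raw = llm(VERIFY_PROMPT.format(data=data, claims=numbered))
    verdicts = json.loads(raw)
    supported = {v["claim"] for v in verdicts if v["supported"]}
    return [c for i, c in enumerate(claims, 1) if i in supported]
```

In practice you'd want to guard the `json.loads` against malformed model output (retry or fall back to dropping all claims), since the verifier is itself an LLM.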

Layer 3: Confidence Scores

Ask the model to rate its confidence in each insight:

For each insight, provide:
1. The insight
2. Supporting data
3. Confidence (0-100%)

Filter out low-confidence claims.
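The filtering step can be a simple parse of the template above. A minimal sketch, assuming each insight block ends with a line like `Confidence: 85%`; the 70% threshold is an arbitrary tuning choice, not a recommendation:

```python
import re

# Matches the "Confidence: NN%" line from the template above.
CONF_RE = re.compile(r"Confidence:\s*(\d{1,3})\s*%")

def filter_by_confidence(blocks: list[str], threshold: int = 70) -> list[str]:
    """Keep only insight blocks whose stated confidence meets the threshold."""
    kept = []
    for block in blocks:
        m = CONF_RE.search(block)
        # Drop blocks with no parseable confidence along with low-confidence ones.
        if m and int(m.group(1)) >= threshold:
            kept.append(block)
    return kept
```

Note that self-reported confidence is not calibrated probability; treat it as a coarse signal and validate the threshold against labeled examples.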

Layer 4: Human-in-the-Loop

For production, show each claim in one of three buckets:

  • ✅ Facts directly from data (green)
  • ⚠️ Inferences/interpretations (yellow, with "AI interpretation" label)
  • ❌ Unverified claims (filtered out)

This approach reduced our hallucination rate from ~30% to <5%.

answered about 2 months ago
