How to implement RAG (Retrieval-Augmented Generation) with custom embeddings?

Asked about 2 months ago · Viewed 174 times
17

I want to build a RAG system for our internal documentation, but I'm confused about the embedding strategy.

Current setup:

  • 500+ markdown documentation files
  • Using OpenAI's text-embedding-3-small
  • Storing in Pinecone vector database

Questions:

  1. Should I fine-tune embeddings on our domain-specific content?
  2. What chunk size works best for technical documentation?
  3. How do I handle code snippets vs prose differently?
  4. What's the best way to re-rank retrieved chunks before sending them to the LLM?

I've seen some teams use hybrid search (keyword + semantic). Is that worth the added complexity?

Comments

Have you considered using late chunking? It can improve retrieval quality significantly.

Lisa Kim · 1890 · about 2 months ago

1 Answer

14

I've built several RAG systems. Here's what works:

1. Chunk Size

For technical docs, I recommend:

  • Prose: 512-1024 tokens with 128 token overlap
  • Code: Keep functions/classes intact (don't split mid-function)
  • Tables: Treat as atomic units
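The chunking rules above can be sketched roughly as follows. This is a minimal illustration, not a production splitter: whitespace-separated words stand in for real tokens (you would swap in your embedding model's tokenizer), only fenced code blocks are treated as atomic, and the names `chunk_prose`, `split_markdown`, and `chunk_document` are made up for this example.

```python
from typing import List, Tuple

FENCE = "`" * 3  # markdown code-fence marker


def chunk_prose(tokens: List[str], size: int = 512, overlap: int = 128) -> List[List[str]]:
    """Sliding windows of `size` tokens; each shares `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def split_markdown(text: str) -> List[Tuple[str, str]]:
    """Split markdown into ("prose", ...) and ("code", ...) segments, keeping fenced blocks whole."""
    segments: List[Tuple[str, str]] = []
    buffer: List[str] = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            if in_code:
                # Closing fence: emit the whole block as one atomic segment.
                buffer.append(line)
                segments.append(("code", "\n".join(buffer)))
                buffer = []
            else:
                # Opening fence: flush any prose collected so far.
                if buffer:
                    segments.append(("prose", "\n".join(buffer)))
                buffer = [line]
            in_code = not in_code
        else:
            buffer.append(line)
    if buffer:
        segments.append(("code" if in_code else "prose", "\n".join(buffer)))
    return segments


def chunk_document(text: str, size: int = 512, overlap: int = 128) -> List[str]:
    """Chunk prose with overlap, but never split a fenced code block."""
    chunks: List[str] = []
    for kind, body in split_markdown(text):
        if kind == "code":
            chunks.append(body)
        else:
            # Whitespace split is an assumption standing in for a real tokenizer.
            for window in chunk_prose(body.split(), size, overlap):
                chunks.append(" ".join(window))
    return chunks
```

A very long code block would still come out as one oversized chunk here; splitting it at function or class boundaries would need a language-aware parser.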

2. Hybrid Search

Yes, it's worth it! Combine:

  • Semantic search: For conceptual queries ("how to handle errors")
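  • Keyword search: For exact matches (function names, error codes)

# vector_db, bm25, and rerank are placeholders for your own components
semantic_results = vector_db.search(query_embedding, top_k=20)
keyword_results = bm25.search(query, top_k=20)
# De-duplicate (assuming results are hashable chunk IDs): a chunk can appear in both lists
candidates = list(dict.fromkeys(semantic_results + keyword_results))
combined = rerank(query, candidates, top_k=5)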
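If running a re-ranker over every merged candidate is too slow, a lighter way to combine the two result lists is reciprocal rank fusion (RRF). To be clear, this is a swapped-in technique and not what the `rerank` call above does: it ignores the raw scores and only uses each chunk's position in each list. The sketch assumes results are lists of hashable chunk IDs ordered best first; `reciprocal_rank_fusion` is a name chosen for this example, and `k = 60` is just a commonly used default.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    result_lists: Sequence[Sequence[Hashable]],
    k: int = 60,
    top_k: int = 5,
) -> List[Hashable]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a chunk's score."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Chunks that rank well in several lists accumulate the highest totals.
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ranked[:top_k]


# Hypothetical chunk IDs, ordered best first by each retriever.
semantic = ["a", "b", "c"]
keyword = ["c", "a", "d"]
fused = reciprocal_rank_fusion([semantic, keyword], top_k=4)
```

The fused list can go straight to the LLM, or be used as a smaller candidate set for a cross-encoder.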

3. Re-ranking

Use a cross-encoder for re-ranking:

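from sentence_transformers import CrossEncoder

# Load a pretrained cross-encoder that scores (query, passage) pairs
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
# One relevance score per candidate, in the same order as `candidates`; higher is more relevant
scores = reranker.predict([(query, doc) for doc in candidates])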
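The scores line up with `candidates` by position, so the remaining step is to sort and keep the best few. A small sketch of that step: `top_k_by_score` is a hypothetical helper name, and the plain floats below stand in for real model output.

```python
from typing import List, Sequence, Tuple


def top_k_by_score(
    candidates: Sequence[str],
    scores: Sequence[float],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Pair each candidate with its score and return the k highest-scoring pairs."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:k]]


# Made-up documents and scores, for illustration only.
example_docs = ["retry policy", "error codes", "changelog"]
example_scores = [0.1, 2.5, -1.0]
best = top_k_by_score(example_docs, example_scores, k=2)
```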

4. Fine-tuning Embeddings

Only worth it if you have more than 10k domain-specific examples. Otherwise, generic embeddings work surprisingly well.

Pro tip: Add metadata filters (document type, date, author) to improve retrieval precision.
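To illustrate the metadata idea, here is an in-memory sketch. In a real setup you would normally pass the filter to the vector database at query time instead of filtering in Python. The `Chunk` class, the `filter_chunks` helper, and field names like `doc_type` are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Chunk:
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_chunks(chunks: List[Chunk], **required: Any) -> List[Chunk]:
    """Keep only chunks whose metadata matches every required key/value pair."""
    return [
        chunk
        for chunk in chunks
        if all(chunk.metadata.get(key) == value for key, value in required.items())
    ]


# Hypothetical chunks; the metadata values are invented for illustration.
chunks = [
    Chunk("How to rotate API keys", {"doc_type": "guide", "author": "ops"}),
    Chunk("POST /v1/keys reference", {"doc_type": "api", "author": "platform"}),
    Chunk("Key rotation runbook", {"doc_type": "guide", "author": "platform"}),
]
guides_by_platform = filter_chunks(chunks, doc_type="guide", author="platform")
```

Narrowing the candidate set this way before semantic search means the top-k slots are not spent on chunks from the wrong document type.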

answered about 2 months ago

Comments

The verification step is brilliant! Have you open-sourced this pattern anywhere?

Emma Thompson · 2340 · about 2 months ago
