How to implement RAG (Retrieval-Augmented Generation) with custom embeddings?

Asked about 2 months ago · Viewed 169 times

I want to build a RAG system for our internal documentation, but I'm confused about the embedding strategy.

Current setup:

  • 500+ markdown documentation files
  • Using OpenAI's text-embedding-3-small
  • Storing in Pinecone vector database

Questions:

  1. Should I fine-tune embeddings on our domain-specific content?
  2. What chunk size works best for technical documentation?
  3. How do I handle code snippets vs prose differently?
  4. What's the best way to re-rank retrieved chunks before sending to LLM?
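For context on questions 2 and 3, here's the naive chunker I've been experimenting with: it groups paragraphs up to a target size but keeps fenced code blocks intact so a snippet never gets split mid-fence. The 800-character default is just my starting guess, not a recommendation, and it doesn't yet split oversized single paragraphs:

```python
import re

def chunk_markdown(text: str, max_chars: int = 800) -> list[str]:
    """Split markdown into chunks, keeping fenced code blocks whole."""
    # Capture group keeps the fenced blocks in the split output.
    blocks = re.split(r"(```.*?```)", text, flags=re.DOTALL)

    units: list[str] = []
    for block in blocks:
        if block.startswith("```"):
            units.append(block)  # never split inside a code fence
        else:
            # Prose splits on blank lines into paragraphs.
            units.extend(p for p in block.split("\n\n") if p.strip())

    chunks: list[str] = []
    current = ""
    for unit in units:
        # Flush the current chunk before it would exceed the budget.
        # Note: a single oversized unit still becomes its own chunk.
        if current and len(current) + len(unit) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += unit + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

I'd be interested to hear whether people chunk on markdown headings instead, and whether code blocks should go into a separate index entirely.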

I've seen some teams use hybrid search (keyword + semantic). Is that worth the added complexity?
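For what it's worth, the fusion step itself looks small if I understand it correctly. A sketch using Reciprocal Rank Fusion over the two ranked result lists (the k=60 constant is the value commonly cited from the original RRF paper; the document IDs are made up):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: keyword (BM25) list and semantic (vector) list, best first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([bm25_hits, vector_hits])  # doc_b wins: ranked high in both lists
```

So the real cost seems to be running and maintaining two retrieval paths, not combining them. Is that the right way to think about the trade-off?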



0 Answers