Hybrid RAG API with LangChain, Qdrant & Redis Caching

March 18, 2026

Retrieval-Augmented Generation (RAG) is only as good as its retrieval. Pure keyword search can miss semantic matches, and pure vector search can miss exact terms, identifiers, and edge-case strings. This API combines both with hybrid retrieval and adds Redis caching so repeated questions return in milliseconds.

What it does

  • Upload .txt files and automatically index them for retrieval
  • Split text into chunks (500 chars, 150 overlap)
  • Run hybrid search:
    • BM25 for keyword matching
    • Qdrant for semantic similarity (cosine distance)
  • Use OpenAI embeddings (text-embedding-3-small, 1536 dims)
  • Generate answers with Anthropic Claude
  • Cache Q&A in Redis (24h TTL) with stats and invalidation on re-upload
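The chunking step above (500-character chunks with 150-character overlap) can be sketched as a simple sliding window. The project likely uses a LangChain text splitter for this; the standalone function below is illustrative, not the actual implementation.

```typescript
// Sliding-window splitter with the post's parameters: each chunk is up to
// 500 characters, and consecutive chunks share a 150-character overlap.
function chunkText(text: string, chunkSize = 500, overlap = 150): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // 350 fresh characters per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The overlap matters for retrieval: a sentence that straddles a chunk boundary still appears whole in at least one chunk.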

Why hybrid retrieval

Hybrid retrieval usually improves quality because:

  • BM25 catches exact strings (IDs, error messages, filenames, code symbols)
  • Vector search catches paraphrases and conceptual matches
  • Combining both gives more stable ranking when either approach alone is noisy

The final result list is built by normalizing the scores from each retriever, applying the configured weights, and selecting the top-k contexts.
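The merge step can be sketched as follows: min-max normalize each retriever's scores, blend them with the configured weights (0.5/0.5 by default), and keep the top-k. The function and field names here are illustrative, not the project's actual API in src/lib/hybrid-retriever.ts.

```typescript
interface Scored { id: string; score: number; }

// Min-max normalize a non-empty result list into the [0, 1] range.
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map(r => r.score);
  const min = Math.min(...scores), max = Math.max(...scores);
  const range = max - min || 1; // avoid divide-by-zero when all scores are equal
  return new Map(results.map(r => [r.id, (r.score - min) / range] as [string, number]));
}

function hybridMerge(bm25: Scored[], vector: Scored[], wBm25 = 0.5, wVec = 0.5, topK = 8): Scored[] {
  const nb = normalize(bm25), nv = normalize(vector);
  // Union of IDs: a document missing from one retriever scores 0 on that side.
  const ids = new Set([...nb.keys(), ...nv.keys()]);
  const merged = [...ids].map(id => ({
    id,
    score: wBm25 * (nb.get(id) ?? 0) + wVec * (nv.get(id) ?? 0),
  }));
  return merged.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

Normalization is the important part: BM25 scores are unbounded while cosine similarity lives in a fixed range, so raw scores cannot be weighted against each other directly.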

Local setup

Install:

pnpm install

Create .env:

pnpm run setup

Required keys:

# Anthropic API key
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# OpenAI API key for embeddings
OPEN_AI_API_KEY=your_openai_api_key_here

Optional Redis:

REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your_redis_password_here

Optional Qdrant:

QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION_NAME=documents

Hybrid tuning:

HYBRID_SEARCH_BM25_WEIGHT=0.5
HYBRID_SEARCH_VECTOR_WEIGHT=0.5
HYBRID_SEARCH_TOP_K=8
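These knobs might be read into a typed config object along the following lines; the parsing helper and object name are illustrative (the project's actual config lives in src/constants.ts).

```typescript
// Parse an env var as a number, falling back to a default when unset or invalid.
const num = (v: string | undefined, fallback: number): number => {
  const n = Number(v);
  return Number.isFinite(n) ? n : fallback;
};

// Defaults match the values shown above.
const hybridConfig = {
  bm25Weight: num(process.env.HYBRID_SEARCH_BM25_WEIGHT, 0.5),
  vectorWeight: num(process.env.HYBRID_SEARCH_VECTOR_WEIGHT, 0.5),
  topK: num(process.env.HYBRID_SEARCH_TOP_K, 8),
};
```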

Usage

Dev:

pnpm run dev

Prod:

pnpm run build
pnpm start

Test the chain:

pnpm test

API endpoints

  • POST /upload-file: upload a .txt file, index it, and invalidate related cache
  • POST /ask-question: { "filename": "...", "question": "..." }
    • cache hit → return instantly
    • cache miss → hybrid retrieve → Claude → cache result → return
  • GET /cache-stats: Redis cache/memory stats
  • DELETE /delete-cache: clear cached Q&A for a file
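A minimal client call against /ask-question could look like this. The endpoint path and JSON body come from the list above; the base URL, port, and the response shape are assumptions.

```typescript
// Build the JSON body for POST /ask-question.
function askPayload(filename: string, question: string): string {
  return JSON.stringify({ filename, question });
}

// Hypothetical client; requires Node 18+ for the global fetch.
async function askQuestion(baseUrl: string, filename: string, question: string) {
  const res = await fetch(`${baseUrl}/ask-question`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: askPayload(filename, question),
  });
  if (!res.ok) throw new Error(`ask-question failed: ${res.status}`);
  return res.json(); // answer payload; exact shape depends on the server
}
```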

How it works (high-level)

Question → Check Redis Cache → Cache Hit? → Return Cached Answer
                 ↓
            Cache Miss
                 ↓
      ┌───────────────────┐
      │  Hybrid Retriever │
      └─────────┬─────────┘
      ┌─────────┴─────────┐
      │                   │
 BM25 Search        Vector Search
 (Keyword)          (Semantic)
      │                   │
      │           Query → Embeddings
      │                   │
      │           Qdrant Vector DB
      │                   │
      └─────────┬─────────┘
                │
      Merge & Rank Results
      (Normalize + Weight)
                │
    Top K Documents → Claude → Cache → Response
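The cache step needs a deterministic key per (file, question) pair. One plausible scheme, using the 24-hour TTL the post mentions, is sketched below; the key format is an assumption, and the real implementation lives in src/lib/redis.ts.

```typescript
import { createHash } from "node:crypto";

const CACHE_TTL_SECONDS = 24 * 60 * 60; // 24h, matching the post

// Key on the filename plus a short hash of the normalized question, so
// trivially different phrasings ("What is X?" vs "what is x?") share a key.
function cacheKey(filename: string, question: string): string {
  const qHash = createHash("sha256")
    .update(question.trim().toLowerCase())
    .digest("hex")
    .slice(0, 16);
  return `qa:${filename}:${qHash}`;
}
```

Prefixing every key with qa:&lt;filename&gt;: is what makes per-file invalidation on re-upload cheap: scan for keys matching that prefix and delete them.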

Implementation map

  • src/anthropic/index.ts: RAG chain implementation
  • src/routes/langchain-router.ts: endpoints + Redis integration
  • src/lib/hybrid-retriever.ts: BM25 + vector merging/ranking
  • src/lib/qdrant.ts: Qdrant service
  • src/lib/embeddings.ts: OpenAI embeddings
  • src/lib/redis.ts: caching layer (TTL, keying, stats)
  • src/lib/storage.ts: file storage utilities
  • src/constants.ts: config knobs (Redis/Qdrant/hybrid weights)