Most RAG tutorials show you the same thing: split your documents into fixed-size chunks, embed them, store them in a vector database, and retrieve the top-k at query time. This works in demos. It fails in production.
The problem is that real documents are structurally complex. Legal contracts have defined sections. Technical manuals have nested hierarchies. Financial filings have tables, footnotes, and cross-references. Fixed-size chunking destroys this structure and produces low-quality retrieval.
Semantic chunking
Instead of splitting at character boundaries, semantic chunking identifies natural content boundaries in the document. Section headers, paragraph breaks, list items, and table cells are treated as chunk boundaries. This produces chunks that contain coherent, self-contained pieces of information.
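As a minimal sketch of the idea, the splitter below (function name and size limit are illustrative, not from the original) breaks text at headings and blank lines rather than at fixed character offsets, merging small blocks up to a soft size cap:

```python
import re

def semantic_chunks(text, max_chars=800):
    """Split text at structural boundaries (headings, blank lines)
    instead of fixed character offsets."""
    # Blank lines separate structural blocks (paragraphs, headings, lists).
    blocks = re.split(r"\n\s*\n", text.strip())
    chunks, current = [], ""
    for block in blocks:
        block = block.strip()
        # A heading always starts a new chunk; otherwise merge small
        # blocks into the current chunk until the soft size limit.
        if re.match(r"^#{1,6}\s", block) or len(current) + len(block) > max_chars:
            if current:
                chunks.append(current)
            current = block
        else:
            current = (current + "\n\n" + block) if current else block
    if current:
        chunks.append(current)
    return chunks
```

A real pipeline would also treat list items and table cells as boundaries, but the principle is the same: each chunk ends up as a coherent section rather than an arbitrary 500-character slice.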
Metadata filtering
Before vector search, we apply metadata filters to reduce the candidate set. For a question about a specific policy version, we filter to chunks from that document version before running the similarity search. This improves precision and reduces latency, because the similarity search runs over a much smaller candidate set.
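In pseudocode terms, filtering happens before scoring. Here is a toy version over an in-memory chunk list (the chunk schema and cosine scorer are illustrative assumptions; a production system would push the filter down into the vector database):

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def filtered_search(chunks, query_vec, filters, k=5):
    """Apply exact-match metadata filters, then score only the survivors."""
    candidates = [
        c for c in chunks
        if all(c["meta"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda c: cosine(c["vec"], query_vec), reverse=True)
    return candidates[:k]
```

The key design point is ordering: the filter is cheap and exact, so it runs first; the embedding comparison is fuzzy and relatively expensive, so it only sees chunks that already satisfy the hard constraints.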
Hybrid search
Dense vector search is good at semantic similarity but poor at exact term matching. For queries containing specific identifiers, product names, or technical terms, BM25 keyword search often outperforms embedding-based retrieval. Our pipeline runs both in parallel and combines results using reciprocal rank fusion.
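Reciprocal rank fusion itself is only a few lines: each document scores the sum of 1/(k + rank) across the ranked lists it appears in, with k conventionally set to 60. A minimal sketch:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1/(k + rank_i(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only consumes ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.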
Re-ranking
After the initial retrieval pass, we run a cross-encoder re-ranker over the top-20 candidates to produce a final top-k. Cross-encoders are too slow to run over the full corpus, but excellent at discriminating between the near-misses that dense retrieval produces. This step consistently improves end-to-end answer quality.
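The two-stage shape can be sketched as follows. The scorer here is a stand-in: real pipelines would plug in a cross-encoder (for example, `CrossEncoder.predict` from sentence-transformers), which is far too slow to run over the whole corpus but cheap over 20 candidates; the token-overlap function below is a placeholder so the sketch stays self-contained:

```python
def rerank(query, candidates, score_fn, final_k=5):
    """Re-score a small candidate set with an expensive scorer,
    then keep the best final_k."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:final_k]

def overlap_score(query, doc):
    # Placeholder scorer: token overlap standing in for a cross-encoder,
    # which would jointly encode (query, doc) and output a relevance score.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / (len(q) or 1)
```

The structure is what matters: a fast retriever over-fetches (top-20), and the slow, accurate model only has to separate near-misses from true hits within that small set.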
Every improvement in retrieval quality directly improves answer quality. It's the highest-leverage layer of the RAG stack to optimise.