Context that knows
what matters.
Retrieval runs automatically inside the engine on every intent call, and the same pipeline can also be queried directly, as shown below. Hybrid RAG pipelines combine semantic chunking, metadata filtering, cross-encoder re-ranking, and multi-tier memory. The right context, every time.
curl -X POST https://api.liyaengine.com/v1/retrieval/query \
  -H "x-api-key: $LIYA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "FHIR-compliant medication reconciliation requirements",
    "sources": ["clinical-policies-v3", "cms-guidelines-2026"],
    "options": {
      "mode": "hybrid",
      "top_k": 8,
      "rerank": true,
      "filters": {
        "document_type": "policy",
        "effective_after": "2025-01-01"
      }
    }
  }'
# {
#   "passages": [...],  // re-ranked context passages
#   "sources": [...],   // provenance for each passage
#   "latency": { "vector_ms": 18, "rerank_ms": 22 }
# }

Beyond naive chunking
Semantic Chunking
Documents are split at natural content boundaries — section headers, paragraphs, list items — not fixed character offsets. Coherent chunks produce far better retrieval than naive splitting.
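A minimal sketch of boundary-aware chunking, not Liya Engine's actual implementation: split at headers and blank lines, then pack adjacent units up to a size budget so no chunk breaks mid-paragraph.

```python
import re

def semantic_chunks(text, max_chars=500):
    """Split at natural boundaries (headers, blank lines), never mid-paragraph."""
    # Candidate units: headers always start a new unit; blank lines end one.
    units = [u.strip()
             for u in re.split(r"\n\s*\n|(?=^#{1,6} )", text, flags=re.M)
             if u.strip()]
    chunks, current = [], ""
    for unit in units:
        # Start a fresh chunk at a header or when the size budget is exceeded.
        if unit.startswith("#") or len(current) + len(unit) > max_chars:
            if current:
                chunks.append(current)
            current = unit
        else:
            current = current + "\n\n" + unit if current else unit
    if current:
        chunks.append(current)
    return chunks
```

Fixed-offset splitting would happily cut a sentence in half; here every chunk is a run of whole paragraphs under one header.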
Hybrid Search
Dense vector search and BM25 keyword search run in parallel. Results are combined via reciprocal rank fusion, giving you semantic similarity and exact-term precision simultaneously.
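Reciprocal rank fusion itself is compact; here is a sketch (the engine's internals may differ, and k=60 is just the conventional constant from the RRF literature):

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both dense and keyword search beats one ranked highly by only a single retriever, which is exactly the behavior hybrid search wants.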
Metadata Filtering
Apply structured filters before vector search to narrow the candidate set — by document version, date range, source, tag, or any custom attribute. Cuts latency and improves precision.
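Conceptually, the pre-filter is just a structured predicate applied before any vector math; a sketch with hypothetical field names matching the curl example above:

```python
def prefilter(candidates, document_type=None, effective_after=None):
    """Narrow the candidate set with structured filters before vector search."""
    kept = []
    for doc in candidates:
        if document_type and doc["document_type"] != document_type:
            continue
        # ISO-8601 date strings compare correctly as plain strings.
        if effective_after and doc["effective_date"] <= effective_after:
            continue
        kept.append(doc)
    return kept
```

Fewer candidates means fewer distance computations (lower latency) and no chance of an out-of-date document winning on similarity alone (higher precision).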
Cross-Encoder Re-Ranking
After initial retrieval, a cross-encoder model re-ranks the top-20 candidates against the query. Consistently improves end-to-end answer quality for knowledge-intensive tasks.
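The re-ranking stage can be sketched as a pluggable pairwise scorer over the top candidates. A real cross-encoder scores each (query, passage) pair jointly with a transformer; the toy word-overlap scorer below only stands in for it:

```python
def rerank(query, passages, score_fn, top_n=20):
    """Re-score the top candidates against the query and reorder them."""
    scored = [(score_fn(query, p), p) for p in passages[:top_n]]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored]

def overlap(query, passage):
    """Toy stand-in for a cross-encoder: shared-word count."""
    return len(set(query.lower().split()) & set(passage.lower().split()))
```

Because the scorer sees query and passage together, it can catch relevance signals that independent embeddings miss, which is why this stage lifts end-to-end answer quality.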
Bring Your Own Vector Store
Connect Pinecone, Weaviate, Qdrant, pgvector, or any vector store via the adapter API. Liya Engine handles chunking, embedding, and retrieval orchestration — you own the data layer.
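The split of responsibilities can be pictured as a small adapter interface. The method names below are illustrative, not the actual adapter API; you implement the storage calls, the engine drives them:

```python
from abc import ABC, abstractmethod

class VectorStoreAdapter(ABC):
    """Hypothetical adapter contract: the engine calls these, you implement them."""
    @abstractmethod
    def upsert(self, ids, vectors, metadata): ...
    @abstractmethod
    def query(self, vector, top_k): ...

class InMemoryStore(VectorStoreAdapter):
    """Trivial reference backend; a real one would wrap Pinecone, pgvector, etc."""
    def __init__(self):
        self.rows = {}
    def upsert(self, ids, vectors, metadata):
        for i, v, m in zip(ids, vectors, metadata):
            self.rows[i] = (v, m)
    def query(self, vector, top_k):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        ranked = sorted(self.rows.items(),
                        key=lambda kv: dot(vector, kv[1][0]), reverse=True)
        return [i for i, _ in ranked[:top_k]]
```

Swapping backends means swapping one class; chunking, embedding, and orchestration above it stay unchanged.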
Streaming Retrieval
Retrieve and stream context passages to agents mid-generation. Agents can issue additional retrieval queries without blocking the response stream.
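One way to picture non-blocking mid-generation retrieval is an interleaved event stream; the event shapes here are invented for illustration:

```python
def stream_with_retrieval(generate, retrieve):
    """Interleave generation events with mid-stream retrieval results."""
    for event in generate():
        if event.get("type") == "retrieval_request":
            # The agent asked for more context; serve it without
            # tearing down the response stream.
            for passage in retrieve(event["query"]):
                yield {"type": "context", "passage": passage}
        else:
            yield event
```

Tokens before and after the retrieval request flow through untouched, so the caller never sees the stream pause and restart.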
Three tiers of agent memory
Different tasks need different memory strategies. Liya Engine provides three tiers — choose one, or combine all three in a single agent.
Rolling Window
A rolling window of recent turns or tokens included in every model call. No added retrieval latency, ideal for active single-session conversations.
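The first tier is essentially a bounded queue; a minimal sketch (class name and turn shape are illustrative):

```python
from collections import deque

class RollingWindowMemory:
    """Keep only the most recent turns; older ones fall off automatically."""
    def __init__(self, max_turns=8):
        self.turns = deque(maxlen=max_turns)
    def add(self, role, text):
        self.turns.append({"role": role, "text": text})
    def context(self):
        # Everything here goes into the next model call.
        return list(self.turns)
```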
Session Summaries
Structured summaries of past sessions — entities, decisions, open questions — retrieved and injected at the start of new sessions. Gives agents memory across days and weeks.
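Injecting a stored summary might look like the sketch below; the summary schema is a hypothetical example of the entities/decisions/open-questions structure:

```python
def session_preamble(summary):
    """Turn a structured session summary into a system message."""
    lines = [
        "Known entities: " + ", ".join(summary["entities"]),
        "Decisions so far: " + "; ".join(summary["decisions"]),
        "Open questions: " + "; ".join(summary["open_questions"]),
    ]
    # Prepended to the first model call of the new session.
    return {"role": "system", "text": "\n".join(lines)}
```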
Vector Memory
Long-term knowledge in a vector store, retrieved on demand during agent execution. Handles large corpora that can't fit in context. The foundation of RAG-powered agents.
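On-demand recall from the third tier reduces to a similarity search; a self-contained cosine sketch with a caller-supplied embedding function (the record shape is illustrative):

```python
def recall(memory, embed, query, top_k=2):
    """Fetch the stored items most similar to the query embedding."""
    qv = embed(query)
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb or 1.0)
    ranked = sorted(memory, key=lambda item: cos(qv, item["vector"]),
                    reverse=True)
    return [item["text"] for item in ranked[:top_k]]
```

Because recall happens at execution time rather than at prompt-assembly time, the corpus can be far larger than any context window.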