1. Introduction: Why a Single Path Is Not Enough, and How Multi-Path Recall Solves the “Lopsided” Retrieval Problem

After deploying a RAG (Retrieval-Augmented Generation) system, have you ever seen a user ask “How should hypertension patients adjust their diet?” and the system return an article titled “How to Make a Low-Fat Salad”? You might instinctively blame the large language model, but more often than not the root cause lies in the retrieval stage, a phenomenon we call “lopsided” retrieval.

Imagine your knowledge base is a vast library, and the traditional single retrieval method is like hiring a librarian who knows only one way to find books. They either look only at book titles (keyword-based sparse retrieval, e.g., BM25) or rely only on a semantic “feel” for the content (vector-based dense retrieval). When the user’s query falls outside their expertise, the results naturally go off track.

  • Dense Retrieval’s “Lopsidedness”: It excels at capturing semantic similarity. It understands that “dietary advice for hypertension” and “low-sodium diet can lower blood pressure” are related. However, it is insensitive to precise keyword matching. For example, if a user looks for a document titled “Hypertension_Guidelines_2024.pdf”, the exact token “2024” may carry little weight in the embedding, so the document can be ranked low. Worse, when a query is completely novel and has no similar phrasing in the corpus, dense retrieval performance can drop sharply. This is the well-known out-of-domain generalization problem.

  • Sparse Retrieval’s “Lopsidedness”: It acts like a strict literal matcher, precisely matching keywords like “hypertension” and “diet”. But it ignores semantics and cannot understand the logical relationship between “high sodium intake” and “elevated blood pressure”. So when a user asks “How to improve cardiovascular health?”, it may fail to surface documents about a “low-salt diet”, because they share no keywords with the query.

These inherent flaws of single retrieval strategies gave birth to the concept of RAG multi-path recall. The core idea is simple: don’t put all your eggs in one basket. We hire multiple librarians, each with their own strengths, to search from different perspectives (semantic, keyword, hypothetical document, knowledge graph, etc.), then fuse and filter their results to produce a more comprehensive and accurate set.

This is like “evidence fusion” in court: a single eyewitness testimony might be unreliable, but combining physical evidence, documentary evidence, electronic data, and so on makes the truth much clearer and more credible. Multi-path recall with result fusion builds exactly this kind of multi-source chain of evidence for high-precision retrieval.

In this article, you will no longer be a developer using only a single retrieval tool. You will systematically grasp:

  1. Deep understanding: Why single retrieval (dense or sparse) is always “lopsided”, and how multi-path recall solves this by having several retrievers work together.
  2. Core algorithm: Thoroughly understand the star algorithm for result fusion – RRF (Reciprocal Rank Fusion) – and why it so cleverly balances the ranking confidence of different recall sources.
  3. Practical code: Master how to implement a complete, working multi-path recall and RRF fusion module in Python, including the deduplication and sorting details you care about.
  4. Advanced tuning: Learn how to adjust fusion strategies for different business scenarios, including weight allocation, parameter tuning, and secondary re-ranking, making your RAG system smarter.

Ready to start this practical journey of retrieval optimization? This article will thoroughly end your retrieval “lopsidedness” troubles and take you to the next stage of improving RAG knowledge base recall.

2. Core Concepts: What is Multi-Path Recall and Result Fusion?

To understand RAG multi-path recall strategy, we must first break it down into two core actions: multi-path recall and result fusion.

2.1 Multi-Path Recall: From “fighting alone” to “surrounding from all sides”

Multi-path recall is defined as follows: in the online retrieval phase of a RAG system, instead of using only one retrieval strategy, we simultaneously launch multiple retrieval engines that work from different angles. Each engine independently searches the entire knowledge base and produces its own candidate document list.
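
As a minimal sketch of launching several engines at once, the snippet below dispatches each recall path on its own thread and collects one candidate list per path. The `multi_path_recall` helper and the two `fake_*` retrievers are hypothetical stand-ins, not part of any library; in a real system each callable would wrap a vector store or a keyword index.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A retriever is any callable that maps a query to a ranked list of result dicts.
Retriever = Callable[[str], List[Dict]]

def multi_path_recall(query: str, retrievers: Dict[str, Retriever]) -> Dict[str, List[Dict]]:
    """Run every recall path concurrently and return one candidate list per path."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in retrievers.items()}
        return {name: future.result() for name, future in futures.items()}

# Hypothetical stand-ins for real engines (a vector store and a BM25 index).
def fake_dense(query: str) -> List[Dict]:
    return [{"doc_id": "doc2", "score": 0.91}, {"doc_id": "doc1", "score": 0.75}]

def fake_sparse(query: str) -> List[Dict]:
    return [{"doc_id": "doc1", "score": 12.4}, {"doc_id": "doc3", "score": 7.9}]

candidates = multi_path_recall(
    "diet for hypertension", {"dense": fake_dense, "sparse": fake_sparse}
)
```

Because the paths are independent, total latency is roughly that of the slowest path rather than the sum of all paths.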

These “paths” can include but are not limited to:

  1. Dense Vector Retrieval: The most mainstream method. It uses an Embedding model (e.g., bge-large, text-embedding-ada-002) to convert both the user’s query and all document chunks into high-dimensional vectors. Then it computes similarity (e.g., cosine similarity) via a vector index or database (e.g., FAISS, Milvus) to find the top-K documents closest in the semantic space. Its strength lies in understanding complex semantic relationships like synonyms and paraphrasing.

  2. Sparse Vector Retrieval: The workhorse of traditional information retrieval, the most famous representative being the BM25 algorithm. It calculates the matching degree between query and document based on term frequency (TF) and inverse document frequency (IDF). Its advantages are high sensitivity to exact keyword matches, almost no need for pre-trained models, low deployment cost, and stable performance in niche or domain-specific term scenarios. Sparse-dense vector fusion recall aims to combine the strengths of both, making retrieval understand both semantics and keywords.

  3. HyDE (Hypothetical Document Embeddings): A clever strategy. Instead of directly using the user’s query for retrieval, it first asks an LLM to generate a hypothetical, ideal document that best matches the user’s intent. The system then vectorizes this “fictional” document and uses it to retrieve real documents from the knowledge base. This “intermediary” strategy effectively bridges the semantic gap between a short user query and complex documents.

  4. Knowledge Graph-based Graph Retrieval: If your knowledge base has been built into a knowledge graph (recording relationships between entities, e.g., “Hypertension Guidelines – [Author] – [Zhang San]”), you can use a graph database for multi-hop reasoning. For example, querying “books about hypertension written by Zhang San” – graph retrieval can follow the path “Zhang San” -> “Author” -> “Hypertension Guidelines” to precisely find the target. This is very effective in scenarios requiring context or relational reasoning.
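
To make the sparse path concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It assumes whitespace tokenization and the commonly used defaults `k1=1.5` and `b=0.75`; the `bm25_scores` helper and the toy corpus are illustrative only, and a production system would normally rely on a search engine or a dedicated library instead.

```python
import math
from collections import Counter
from typing import Dict

def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # literal matching: an absent term contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

docs = {
    "doc1": "dietary advice for hypertension patients",
    "doc2": "a low-sodium diet can lower blood pressure",
    "doc3": "how to make a low-fat salad",
}
scores = bm25_scores("hypertension diet", docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Note that “dietary” in doc1 does not match the query term “diet”: the sketch matches tokens literally, which is exactly the strength and the weakness of the sparse path described above.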

Tip: Multi-path recall is not necessarily better with more paths. When choosing recall paths, you need to trade off computational cost against benefit. For most general scenarios, the “dense + sparse” two-path combination usually brings significant improvement in RAG knowledge base recall. For professional domains or scenarios requiring complex reasoning, adding HyDE or graph retrieval is more suitable.

2.2 Result Fusion: From “clamoring voices” to “reaching consensus”

After we dispatch multiple “detectives” in parallel, each brings back their own list of “suspicious documents” (candidate results). The problem then becomes: whose results are more trustworthy? They might point to the same target or have completely different rankings. Result fusion is designed to solve this.

The goal of result fusion is to integrate multiple heterogeneous, possibly contradictory or overlapping candidate lists into a single, high-quality, ranked final result list. This process addresses several key issues:

  1. Deduplication: This is a prerequisite. Different recall paths are very likely to retrieve the same document. For instance, a chunk about “low-sodium diet can lower blood pressure” could be hit by both dense and sparse retrieval. Without deduplication, this content would appear multiple times in the final results, wasting the large model’s limited context window (tokens) and potentially pulling the answer off topic. Merging such hits by document is the basic deduplication step of multi-path recall.

  2. Normalization and Alignment: The scores produced by different retrieval strategies are usually incomparable. Dense retrieval typically outputs a cosine similarity, which is bounded (between -1 and 1, and mostly between 0 and 1 in practice), while BM25 outputs an unbounded positive score whose magnitude depends on the query and the corpus. We cannot simply add or average these scores because they are not on the same “unit of measurement”. One of the core tasks of a fusion algorithm is to bring these scores onto a common footing, either by normalizing them or by switching to a uniform, rank-based system.

  3. Ranking and Weighting: After deduplication and alignment, we need to decide how to produce the final ranking of candidate documents. This is like multiple judges scoring contestants – we need a scientific and fair scoring rule. Common fusion algorithms include:

    • Weighted Score Fusion: For each fused result, compute the weighted sum of its scores from each recall path. E.g., Final Score = 0.7 * normalized dense score + 0.3 * normalized sparse score. This method requires manual weight determination and relies on experience.
    • Rank-Based Fusion: These algorithms ignore the original scores and only care about the ranking position of each document in each recall path. Among them, the most famous is the RRF (Reciprocal Rank Fusion) algorithm, which we will cover in detail. It is highly robust, works well with its default setting (its only parameter is the constant k), and is a common first choice in engineering.
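
The weighted score fusion described above can be sketched as follows. This is one possible reading, assuming min-max normalization per path and the example weights 0.7 and 0.3; the helper names and the toy scores are made up for illustration.

```python
from typing import Dict, List, Tuple

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one path's scores to [0, 1] so that paths become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: avoid division by zero
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_score_fusion(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    w_dense: float = 0.7,
    w_sparse: float = 0.3,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a path that missed a document contributes 0."""
    dense_n, sparse_n = min_max_normalize(dense), min_max_normalize(sparse)
    fused = {
        doc_id: w_dense * dense_n.get(doc_id, 0.0) + w_sparse * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

dense = {"doc1": 0.82, "doc2": 0.91, "doc4": 0.60}  # toy cosine similarities
sparse = {"doc1": 14.2, "doc2": 9.7, "doc3": 3.1}   # toy BM25 scores
fused = weighted_score_fusion(dense, sparse)
```

A known weakness of min-max scaling is that the lowest-scoring document of each path is mapped to 0 regardless of how relevant it actually is, which is one reason rank-based fusion is often preferred.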

In summary, multi-path recall ensures the “breadth” and “recall” of retrieval, while result fusion ensures the “precision” and “relevance”. They complement each other to form a more powerful RAG retrieval optimization solution.

3. Overview of Common Multi-Path Recall Strategies

Before diving into code, it’s necessary to survey the most common recall strategies like a “weapons arsenal”, understanding their applicable scenarios, advantages, and disadvantages, so we can be confident and effective in practice.

| Recall Strategy | Principle in Brief | Core Advantage | Core Disadvantage | Applicable Scenarios |
| --- | --- | --- | --- | --- |
| Dense Vector Retrieval | Map query and documents to high-dimensional semantic space and compute similarity. | Understands semantics, synonyms, paraphrasing; strong generalization. | Insensitive to keywords; requires high-quality Embedding model; may perform poorly on rare entities or small samples. | General scenarios, open-domain QA, tasks requiring strong semantic understanding. |
| Sparse Vector Retrieval (BM25) | Classic statistical model based on term frequency (TF) and inverse document frequency (IDF). | Exact keyword matching; simple index construction, fast computation; good for proper nouns. | Cannot understand semantics; does not match synonyms; very sensitive to query formulation. | Proper noun retrieval, exact matching, effective supplement to dense retrieval. |
| HyDE | Use LLM to first generate a “hypothetical document”, then use it for retrieval. | Bridges semantic gap between short query and complex documents; improves zero-shot retrieval capability. | Relies on LLM generation quality; introduces extra LLM call cost; may produce hallucinations. | Queries with ambiguous intent, need for context understanding, out-of-domain retrieval. |
| Graph Retrieval (GraphRAG) | Search paths and perform multi-hop reasoning on a knowledge graph. | Can handle complex queries relying on entity relationships; can uncover deep, cross-document associations. | Requires pre-built knowledge graph, expensive; less flexible than vector search for queries. | Multi-step reasoning, recommendation, relationship analysis, QA involving complex entity relations. |

Best Practice: For startup teams or those with limited resources, I strongly recommend starting with the “dense + sparse” two-path combination. It is usually the most cost-effective pair and, as a rough rule of thumb, resolves around 80% of retrieval issues. You can use frameworks like LangChain or LlamaIndex, which have built-in retrievers and fusers.

4. Core Algorithm for Result Fusion: RRF (Reciprocal Rank Fusion)

Among many fusion algorithms, RRF (Reciprocal Rank Fusion) stands out as the “jack-of-all-trades” in the industry due to its simplicity, efficiency, and training-free nature. It does not rely on scores from different retrievers, only on rankings, which naturally handles the incomparability of scores between different retrievers.

4.1 Principle and Formula of RRF

The RRF formula is elegant and intuitive:

$$Score_{RRF}(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)}$$

Let’s break it down:

  • d: A candidate document being evaluated.
  • R: The set of all recall paths. E.g., R = {Dense, Sparse, HyDE}.
  • rank_r(d): The rank of document d in recall path r. Note that rank starts from 1. If document d is not retrieved by path r, we ignore this term.
  • k: A smoothing constant, typically set to 60. It keeps a document ranked very high in a single path from dominating the fused score. If k=0, rank 1 contributes 1/1 = 1 and rank 2 contributes 1/2 = 0.5, so first place is worth twice as much as second. With k=60, rank 1 contributes 1/(60+1) ≈ 0.0164 and rank 2 contributes 1/(60+2) ≈ 0.0161, so the gap shrinks to under 2%. The larger k is, the smaller the score difference between ranks and the more “averaging” the fusion; the smaller k is, the more influence top-ranked documents have. k=60, the value used in the original RRF paper, is a robust starting point in practice.
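
The smoothing effect of k is easy to check numerically. The short sketch below (the `rrf_term` helper is just an illustrative name) compares the contribution of rank 1 with that of rank 10 for three values of k.

```python
def rrf_term(rank: int, k: int) -> float:
    """Contribution of a single path that placed a document at `rank` (1-based)."""
    return 1.0 / (k + rank)

for k in (0, 60, 1000):
    ratio = rrf_term(1, k) / rrf_term(10, k)
    print(f"k={k:>4}: rank 1 = {rrf_term(1, k):.4f}, rank 10 = {rrf_term(10, k):.4f}, ratio = {ratio:.2f}")
```

With k=0 the top rank is worth ten times rank 10; with k=60 the ratio falls to about 1.15, and with k=1000 it is about 1.01, at which point the fused score mostly counts how many paths retrieved a document.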

Example:
Assume two recall paths: Dense (D) and Sparse (S). They return the following (rank in parentheses):

  • D: doc1(1), doc2(2), doc3(3)
  • S: doc2(1), doc1(4), doc4(5)

Now compute the RRF score for each document (k=60):

  • doc1: Rank 1 in D => 1/(60+1) ≈ 0.0164; Rank 4 in S => 1/(60+4) ≈ 0.0156. Total: 0.0164 + 0.0156 = 0.0320.
  • doc2: Rank 2 in D => 1/(60+2) ≈ 0.0161; Rank 1 in S => 1/(60+1) ≈ 0.0164. Total: 0.0161 + 0.0164 = 0.0325.
  • doc3: Only in D, rank 3 => 1/(60+3) ≈ 0.0159. Total: 0.0159.
  • doc4: Only in S, rank 5 => 1/(60+5) ≈ 0.0154. Total: 0.0154.

Final ranking: doc2 (0.0325) > doc1 (0.0320) > doc3 (0.0159) > doc4 (0.0154).

In this example, doc1 ranked first in dense but only fourth in sparse, while doc2 ranked well in both (2nd and 1st), so doc2’s total score surpasses doc1’s. RRF rewards documents that rank consistently high across multiple retrievers, and it does so without any hand-set weights, which is often more robust than manually weighting raw scores.
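
The arithmetic of this example can be reproduced in a few lines. The sketch below takes the ranks exactly as listed above (including the gaps in the sparse list); `rrf_from_ranks` is an illustrative helper name, not a library function.

```python
from collections import defaultdict

def rrf_from_ranks(paths, k=60):
    """paths maps a path name to {doc_id: rank}; ranks start at 1."""
    scores = defaultdict(float)
    for ranks in paths.values():
        for doc_id, rank in ranks.items():
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

paths = {
    "dense": {"doc1": 1, "doc2": 2, "doc3": 3},
    "sparse": {"doc2": 1, "doc1": 4, "doc4": 5},
}
ranking = rrf_from_ranks(paths)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
# doc2: 0.0325
# doc1: 0.0320
# doc3: 0.0159
# doc4: 0.0154
```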

4.2 Natural Association Between RRF and Deduplication

Another major advantage of RRF is its deduplication capability. Look at the RRF formula: it operates on document d. When we compute the score for doc2, we only compute it once, but we contribute the ranks from both recall lists. This essentially accomplishes the most important step of multi-path recall deduplication: merging by document ID. In our code implementation, we can use a dictionary with doc_id as the key. When processing each recall list, we update the RRF score for that doc_id. Eventually, each doc_id has exactly one total score, naturally achieving deduplication.

4.3 Weighted RRF: When Some Recall Paths Are More Trustworthy

Although standard RRF treats each recall path equally, sometimes we may believe that one path is more reliable than another. For example, after carefully tuning the data quality of your knowledge base, you might want to give more weight to dense retrieval.

In this case, you can extend the RRF formula by introducing a weight vector:

$$Score_{weightedRRF}(d) = \sum_{r \in R} \frac{w_r}{k + rank_r(d)}$$

where w_r is the weight for recall path r. For instance, set w_dense = 1.5, w_sparse = 1.0. Then documents with high ranks in dense retrieval have their score advantage amplified. This method is often called hybrid retrieval weight sorting. In actual tuning, weights are often found via grid search on a small validation set.

Note: Weighted RRF is more flexible but introduces extra hyperparameters (weights), increasing tuning difficulty. Without clear business preference or evaluation data, using standard, unweighted RRF is a safer and recommended default choice.
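
A minimal sketch of the weighted formula, assuming each path is given as a list of document IDs ordered best first. The function name `weighted_rrf` and the toy lists are illustrative; the toy ranks were chosen so that the 1.5 weight on the dense path changes the winner.

```python
from typing import Dict, List, Optional, Tuple

def weighted_rrf(
    ranked_ids: Dict[str, List[str]],
    weights: Optional[Dict[str, float]] = None,
    k: int = 60,
) -> List[Tuple[str, float]]:
    """ranked_ids maps a path name to its doc ids, best first."""
    weights = weights or {}
    scores: Dict[str, float] = {}
    for path, doc_ids in ranked_ids.items():
        w = weights.get(path, 1.0)  # unlisted paths keep the standard weight of 1
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked_ids = {
    "dense": ["doc1", "doc3", "doc2"],
    "sparse": ["doc2", "doc4", "doc5", "doc1"],
}
plain = weighted_rrf(ranked_ids)
boosted = weighted_rrf(ranked_ids, weights={"dense": 1.5, "sparse": 1.0})
```

With equal weights doc2 comes first; once the dense path is weighted 1.5, doc1 (dense rank 1) narrowly overtakes it. The margin is tiny, which illustrates why weights are better chosen on a validation set than by intuition.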

5. Hands-on Code: Implementing Multi-Path Recall and RRF Fusion

We now put theory into practice by implementing a complete RAG retrieval optimization example in Python, connecting all the concepts we learned.

5.1 Simulated Data

To focus on demonstrating the fusion logic, we simulate results from two recall paths: one representing dense retrieval, one representing sparse retrieval.

from typing import List, Dict

# --- 1. Simulate multi-path recall data ---
# Assume each result is a dict containing: doc_id, score, text
sparse_results = [
    {"doc_id": "doc1", "score": 0.85, "text": "高血压患者饮食建议"},  # "Dietary advice for hypertension patients"
    {"doc_id": "doc2", "score": 0.72, "text": "低钠饮食可降低血压"},  # "A low-sodium diet can lower blood pressure"
    {"doc_id": "doc3", "score": 0.60, "text": "高血压注意事项"},  # "Precautions for hypertension"
]
dense_results = [
    {"doc_id": "doc2", "score": 0.91, "text": "低钠饮食可降低血压"},  # "A low-sodium diet can lower blood pressure"
    {"doc_id": "doc4", "score": 0.80, "text": "高血压运动指南"},  # "Exercise guide for hypertension"
    {"doc_id": "doc1", "score": 0.75, "text": "高血压患者饮食建议"},  # "Dietary advice for hypertension patients"
]

Code Explanation:

  • sparse_results represents the output of sparse retrieval (e.g., BM25), sorted by score descending.
  • dense_results represents the output of dense retrieval (e.g., vector similarity), sorted descending.
  • Note that doc1 and doc2 appear in both results – this is exactly what we need to handle.

5.2 Core RRF Fusion Function

Now, implement the core RRF fusion logic.

# --- 2. Implement RRF fusion function ---
def rrf_fuse(lists: List[List[Dict]], k: int = 60) -> List[Dict]:
    """
    Fuse multiple retrieval results using Reciprocal Rank Fusion (RRF).

    Args:
        lists: A list containing result lists from multiple recall paths.
            Each result list is a list of dicts; each dict must have
            'doc_id', 'score', and 'text' fields.
        k: The constant k in the RRF formula, default 60.

    Returns:
        A list of deduplicated documents sorted by RRF score descending.
    """
    # Dictionary to store final results; key is doc_id, value is dict with doc_id, text, rrf_score
    rank_map = {}

    # Iterate over each recall path
    for lst in lists:
        # Key step: sort the current recall path's results by score descending and assign ranks
        # (this assumes a higher score means a better match).
        # Rank starts from 1; it is the core input for the RRF formula
        sorted_lst = sorted(lst, key=lambda x: x["score"], reverse=True)
        for rank, item in enumerate(sorted_lst, start=1):
            doc_id = item["doc_id"]
            # If this doc_id appears for the first time, initialize its info
            if doc_id not in rank_map:
                rank_map[doc_id] = {"doc_id": doc_id, "text": item["text"], "rrf_score": 0.0}

            # Accumulate the RRF contribution of this rank to the document's score
            rank_map[doc_id]["rrf_score"] += 1.0 / (k + rank)

    # Convert dictionary to list and sort by rrf_score descending
    fused_results = sorted(rank_map.values(), key=lambda x: x["rrf_score"], reverse=True)
    return fused_results

# Call RRF fusion
fused_results = rrf_fuse([sparse_results, dense_results])

Line-by-Line Explanation of Key Code:

  1. def rrf_fuse(lists, k=60): Function definition. The adjustable k parameter is key for tuning.
  2. rank_map = {}: Create an empty dictionary as the “master table” to store each document’s RRF score. This is the core data structure for implementing multi-path recall deduplication.
  3. for lst in lists: Outer loop, processing each recall path’s results.
  4. sorted_lst = sorted(lst, key=lambda x: x["score"], reverse=True): The most important step. Before computing RRF, we must sort each recall path’s results by their original score, because this order determines each document’s rank. If the original lists are already sorted, this step can be omitted, but we always sort explicitly to be safe. Note that descending order assumes higher scores are better; for distance-based scores, sort ascending instead.
  5. for rank, item in enumerate(sorted_lst, start=1): Assign rank. enumerate with start=1 ensures rank starts at 1.
  6. if doc_id not in rank_map: Deduplication logic. If doc_id doesn’t exist, create a new entry. If it exists (e.g., doc1 appears in both sparse and dense), we don’t create a new entry but simply accumulate the rrf_score thereafter.
  7. rank_map[doc_id]["rrf_score"] += 1.0 / (k + rank): Accumulate the RRF score. This is the core computation of the formula.
  8. fused_results = sorted(rank_map.values(), ...): Finally, extract the dictionary’s values(), convert to a list, and sort descending by the accumulated rrf_score.

5.3 Output Results

# --- 3. Print and analyze fusion results ---
print("融合后排序结果(已去重):")  # "Fused ranking (deduplicated):"
for item in fused_results:
    print(f"doc_id: {item['doc_id']}, rrf_score: {item['rrf_score']:.4f}, text: {item['text']}")

Expected Output:

融合后排序结果(已去重):
doc_id: doc2, rrf_score: 0.0325, text: 低钠饮食可降低血压
doc_id: doc1, rrf_score: 0.0323, text: 高血压患者饮食建议
doc_id: doc4, rrf_score: 0.0161, text: 高血压运动指南
doc_id: doc3, rrf_score: 0.0159, text: 高血压注意事项

Analysis:

  1. Deduplication successful: doc1 and doc2 appear only once; their scores from both recall lists have been correctly fused.
  2. Ranking reasonable: doc2 ranks high in both lists (1st in dense, 2nd in sparse) and therefore gets the highest rrf_score. doc1 ranks first in sparse but only third in dense, so its score is slightly lower than doc2’s. doc4 and doc3 each appear in only one list, hence their much lower scores; doc4 (rank 2 in dense) edges out doc3 (rank 3 in sparse).

Tip: When encountering a duplicate doc_id in rank_map, we extract the text field only from the first occurrence. In a real system, you must ensure that the text field of the “same document” from different recall paths is consistent. Use a unified document ID system and preprocessing steps to guarantee this, avoiding confusion in the model due to inconsistent text.

6. Advanced Tips: Tuning Sparse-Dense Vector Fusion Recall

You have mastered the core methods of multi-path recall and RRF fusion. Now let’s discuss how to make it even better and smarter. This part differentiates experienced developers from junior engineers.

6.1 Dynamic Weight Allocation: No Longer a “One-Size-Fits-All”

In the weighted RRF earlier, you might wonder: “Can I avoid manually setting weights and let the system automatically choose based on query type?” That is the idea of dynamic weights.

For example, for a research query “Evolution history of Transformer models”, you might want a higher weight for dense retrieval because it can capture abstract concepts like “evolution history”. However, for a tool-like query “Charging interface specifications of 2024 Xiaopeng G9”, you should significantly increase the weight of sparse retrieval (BM25) because keywords like “2024”, “Xiaopeng G9”, “charging interface specifications” are central.

Implementation Ideas:

  1. Query Classification: Use a lightweight classifier (rule-based or small model) to determine the query type. For example, check if the query contains words like year, model, specification, price, etc.
  2. Weight Assignment: Dynamically assign weights to different recall paths based on the classification result.
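    # Pseudo-code example
    def dynamic_weights(query):
        # Keyword-style signals (year, model number, specification, price) favor sparse retrieval
        if any(word in query.lower() for word in ["2024", "model number", "specification", "price"]):
            return {"sparse": 0.8, "dense": 0.2}
        else:
            return {"sparse": 0.3, "dense": 0.7}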
  3. Application: Use the dynamically obtained weights when computing weighted RRF.
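
The three steps can be combined into a small runnable sketch. It swaps the keyword list for a regular expression, which is only one hypothetical way to classify queries; the pattern, the weights, and the `fuse` helper are all illustrative assumptions rather than a recommended configuration.

```python
import re
from typing import Dict, List

# Hypothetical rule: years, model-like tokens (letters followed by digits), and
# spec/price vocabulary suggest an exact-match lookup query.
KEYWORD_QUERY = re.compile(
    r"\b(19|20)\d{2}\b|\b[A-Za-z]+\d+\b|\b(model|spec\w*|price)\b", re.IGNORECASE
)

def dynamic_weights(query: str) -> Dict[str, float]:
    if KEYWORD_QUERY.search(query):
        return {"sparse": 0.8, "dense": 0.2}
    return {"sparse": 0.3, "dense": 0.7}

def fuse(query: str, ranked_ids: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Weighted RRF where the per-path weights depend on the query."""
    weights = dynamic_weights(query)
    scores: Dict[str, float] = {}
    for path, doc_ids in ranked_ids.items():
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weights[path] / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# The two paths disagree on the order, so the weights decide the winner.
ranked_ids = {"dense": ["docA", "docB"], "sparse": ["docB", "docA"]}
lookup = fuse("Charging interface specifications of 2024 Xiaopeng G9", ranked_ids)
research = fuse("Evolution history of Transformer models", ranked_ids)
```

For the lookup query the sparse path’s favorite (docB) wins, while for the research query the dense path’s favorite (docA) wins.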

6.2 Secondary Re-Ranking: Using Cross-Encoder for Final Check

Although RRF is good, it is essentially a rank-based “fast” fusion; it does not re-evaluate the exact relevance between the document and the query. Even the top-K results after fusion may still contain some less relevant documents.

Solution: Use a more accurate but computationally heavier cross-encoder to perform fine-grained re-ranking on the top-N (e.g., top 50) results from RRF fusion. Unlike bi-encoders (the models used in dense retrieval), a cross-encoder takes the query and document together as input and directly computes their relevance score. This score is more accurate, but slower, so it can only be applied to a small set of candidates.

Process:
User Query -> [Multi-Path Recall -> RRF Fusion] -> Obtain Top-50 Candidates -> [Cross-Encoder Re-Ranking] -> Output Top-10 Final Results

This is the classic two-stage paradigm for multi-path recall re-ranking. The first stage uses efficient methods (dense + sparse + RRF) to quickly filter candidates; the second stage uses a precise but expensive method (cross-encoder) for fine ranking.

Practical Library Recommendation: You can use cross-encoder models from the sentence-transformers library, such as cross-encoder/ms-marco-MiniLM-L-6-v2.
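
The second stage can be sketched as a function that accepts any pairwise scorer. The `rerank` helper and the toy `overlap_scorer` are illustrative assumptions that let the example run offline; the comment shows how the `CrossEncoder.predict` method from sentence-transformers could be plugged in instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Any function that scores (query, passage) pairs, higher meaning more relevant.
PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]

def rerank(query: str, candidates: List[Dict], score_pairs: PairScorer, top_n: int = 10) -> List[Dict]:
    """Re-score fused candidates with a pairwise model and keep the best top_n."""
    pairs = [(query, c["text"]) for c in candidates]
    scores = score_pairs(pairs)
    rescored = [{**c, "rerank_score": float(s)} for c, s in zip(candidates, scores)]
    return sorted(rescored, key=lambda c: c["rerank_score"], reverse=True)[:top_n]

# With sentence-transformers installed, the scorer could be:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   score_pairs = model.predict
# A toy word-overlap scorer stands in here so the sketch runs without a model.
def overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    return [float(len(set(q.lower().split()) & set(t.lower().split()))) for q, t in pairs]

fused = [
    {"doc_id": "doc4", "text": "exercise guide for hypertension"},
    {"doc_id": "doc1", "text": "dietary advice for hypertension patients"},
]
top = rerank("dietary advice for hypertension", fused, overlap_scorer, top_n=1)
```

Keeping the scorer as a parameter makes it easy to swap models, or to test the pipeline without downloading one.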

6.3 Tuning the k Value of RRF

Tuning the k value is the most common optimization for RRF. A simple grid search can find the optimal k.

# Test different k values on a validation set.
# evaluate_with_k is a placeholder you supply: it should run retrieval on the
# validation set with the given k and return a metric such as Recall or NDCG.
def search_best_k(evaluate_with_k, candidates=(30, 60, 100, 200)):
    best_k = None
    best_score = float("-inf")
    for k_value in candidates:  # Candidate k values
        current_score = evaluate_with_k(k=k_value)
        if current_score > best_score:
            best_score = current_score
            best_k = k_value
    return best_k, best_score

# best_k, best_score = search_best_k(evaluate_with_k)
# print(f"Best k: {best_k}")

Generally, k values in the range 30-100 perform well. Smaller k emphasizes high-ranked documents more; larger k is more “egalitarian”, emphasizing how many recall paths retrieved the document.
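
Any grid search needs a metric to compare runs. As one possible choice, the sketch below computes mean recall@n over a small labeled validation set; the helper names and the toy data are hypothetical, and NDCG or MRR could be substituted in the same place.

```python
from typing import Dict, List, Set

def recall_at_n(retrieved: List[str], relevant: Set[str], n: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top n results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:n]) & relevant) / len(relevant)

def mean_recall(run: Dict[str, List[str]], labels: Dict[str, Set[str]], n: int = 5) -> float:
    """Average recall@n over every labeled query in the validation set."""
    return sum(recall_at_n(run[q], labels[q], n) for q in labels) / len(labels)

# Hypothetical validation data: fused doc ids per query and hand-labeled relevant ids.
run = {"q1": ["doc2", "doc1", "doc4"], "q2": ["doc7", "doc9", "doc8"]}
labels = {"q1": {"doc1", "doc2"}, "q2": {"doc8", "doc5"}}
score = mean_recall(run, labels, n=2)
```

Here q1 finds both relevant documents in its top 2 while q2 finds none, so the mean recall@2 is 0.5.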

7. Pitfalls: Practical Traps in Deduplication, Duplicate Documents, and Computational Overhead

No matter how good the theory, various “surprises” can occur during implementation. Below are pitfalls I’ve encountered in real projects – hope you can avoid them.

7.1 Pitfall 1: Inconsistent Document IDs Causing Deduplication Failure

Problem: You built dense retrieval (using bge-large) and sparse retrieval (using Elasticsearch or Meilisearch). The dense path returns the database’s auto-increment ID (e.g., 123), while the sparse path returns an ID generated internally by the search engine (e.g., a1b2c3d4). Even though both correspond to the same document in the knowledge base, they are treated as different documents in rank_map, so deduplication fails and the same content is output twice.

Solution: Unify document IDs. When building the knowledge base index, whether for the dense vector store or the sparse retrieval store, use the same globally unique document ID (e.g., a hash of the original PDF filename plus chunk number).
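
One way to generate such an ID is sketched below, assuming the source file name and the chunk number are available at indexing time. The `make_chunk_id` helper, the truncation to 16 hex characters, and the record layouts are illustrative choices, not requirements of any particular store.

```python
import hashlib

def make_chunk_id(source_path: str, chunk_index: int) -> str:
    """Deterministic ID derived from the source file name and chunk position."""
    key = f"{source_path}#{chunk_index}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:16]

# Use the same ID when writing to the vector store and to the keyword index.
chunk_id = make_chunk_id("Hypertension_Guidelines_2024.pdf", 3)
dense_record = {"id": chunk_id, "vector": [0.12, 0.48, 0.31]}  # toy embedding
sparse_record = {"id": chunk_id, "text": "A low-sodium diet can lower blood pressure"}
```

Because the ID is a pure function of the file name and chunk number, re-indexing the same corpus yields the same IDs in both stores.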

7.2 Pitfall 2: Losing Parent-Child Relationship – Merged Chunks Cannot Form Complete Content

Problem: A certain document is split into multiple chunks. During retrieval, dense retrieval hits chunk A, sparse retrieval hits chunk B. In RRF fusion, they are treated as two independent, unrelated documents for deduplication and ranking. As a result, the large model may only receive chunk A or chunk B, lacking context and unable to give a complete, accurate answer.

Solution: In the fusion stage, besides doc_id, also retain the parent document ID. While deduplicating, also perform parent-child merging: if multiple chunks belong to the same parent document, you can concatenate them into a longer passage, or at least mark their original positions for the large model, so it knows they come from the same source. This can be achieved by adding a parent_doc_texts list in the rank_map entry.
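
A minimal sketch of parent-child merging, assuming every fused chunk carries `parent_id` and `chunk_index` fields (both hypothetical names) in addition to `text` and `rrf_score`. Chunks of the same parent are concatenated in reading order, and the group keeps the score of its best chunk.

```python
from typing import Dict, List

def merge_by_parent(fused: List[Dict]) -> List[Dict]:
    """Group fused chunks by parent document, keeping chunks in reading order."""
    groups: Dict[str, Dict] = {}
    for item in fused:  # fused is assumed to be sorted by rrf_score, best first
        group = groups.setdefault(
            item["parent_id"],
            {"parent_id": item["parent_id"], "rrf_score": item["rrf_score"], "chunks": []},
        )
        group["chunks"].append(item)
    merged = []
    for group in groups.values():  # dicts keep insertion order, so best parents come first
        ordered = sorted(group["chunks"], key=lambda c: c["chunk_index"])
        merged.append({
            "parent_id": group["parent_id"],
            "rrf_score": group["rrf_score"],  # score of the best chunk in the group
            "text": "\n".join(c["text"] for c in ordered),
        })
    return merged

fused = [
    {"doc_id": "g-2", "parent_id": "guide", "chunk_index": 2, "rrf_score": 0.0325, "text": "Limit sodium."},
    {"doc_id": "s-0", "parent_id": "salad", "chunk_index": 0, "rrf_score": 0.0161, "text": "Wash the lettuce."},
    {"doc_id": "g-1", "parent_id": "guide", "chunk_index": 1, "rrf_score": 0.0159, "text": "Diet matters."},
]
merged = merge_by_parent(fused)
```

Whether to concatenate sibling chunks or only annotate them with their source is a design choice that depends on how much context window you can spare.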

7.3 Pitfall 3: Sorting Computational Bottleneck in Large-Scale Recall

Problem: When you have many recall paths and each returns a large number of candidates (e.g., top 2000), the CPU and memory cost of scoring and sorting all documents with RRF can grow significantly and push response latency beyond acceptable limits.

Solution: Pruning or two-stage recall. Don’t directly fuse the top 2000 of each recall path. First, take only the top-K (e.g., top 100) from each path, then fuse these results with RRF. If you worry about missing important documents that are overlooked by other paths, you can take the last 100 from the top 2000 results and fuse them with the top 100 results in a second round. Pruning essentially trades a little recall for speed.
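
Pruning each path before fusion takes only a few lines. The sketch below uses `heapq.nlargest` from the standard library, which avoids fully sorting a long list when only the top of it is needed; the `prune` helper and the synthetic 2000-item path are illustrative.

```python
import heapq
from typing import Dict, List

def prune(results: List[Dict], top_k: int = 100) -> List[Dict]:
    """Keep only the top_k highest-scoring candidates of one path, best first."""
    return heapq.nlargest(top_k, results, key=lambda r: r["score"])

# Toy path with 2000 candidates whose scores decrease with position.
big_path = [{"doc_id": f"doc{i}", "score": 1.0 / (i + 1)} for i in range(2000)]
pruned = prune(big_path, top_k=100)
```

The pruned lists can then be passed to the fusion step in place of the full candidate lists.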

8. Summary and Extension: From Multi-Path Fusion to Intelligent Routing

Through this in-depth discussion, you have mastered the core secret weapon of RAG online retrieval optimization: multi-path recall and result fusion.

Let’s recap the key points:

  1. Core Idea: Use multiple strategies (dense, sparse, HyDE, graph, etc.) in parallel to compensate for the “lopsidedness” of any single strategy, significantly improving RAG knowledge base recall.
  2. Core Algorithm: RRF (Reciprocal Rank Fusion) is a training-free, highly robust “decentralized” fusion strategy. By fusing rankings instead of scores, it sidesteps the score incomparability problem and deduplicates naturally.
  3. Core Practice: From simulated data to complete code, we implemented RRF-based fusion and emphasized the critical roles of rank assignment and rank_map in deduplication.
  4. Core Tuning: Dynamic weight allocation, introduction of cross-encoder secondary re-ranking (multi-path recall re-ranking), and grid search for k are advanced techniques to push your fusion system to the next level.
  5. Core Traps: Unify document IDs, maintain parent-child relationships, and control computational overhead through pruning.

Summary

Through this article, I believe you have gained a deeper understanding of the “RAG multi-path recall strategy”. I recommend practicing with real projects. If you have questions, feel free to discuss!
