1. Introduction: The Limitations of Traditional RAG and the Value Proposition of Agentic RAG

Do you still remember the excitement of deploying your first RAG system? You watched it retrieve relevant documents from the knowledge base and generate seemingly plausible answers. For a moment, you thought “Artificial Stupidity” was finally turning into “Artificial Intelligence.” That was until a user threw a slightly more complex question at it: “Help me compare the cost-effectiveness of these three products, considering after-sales service and performance. I live outside the 5th Ring Road in Beijing. Is delivery convenient?” Your RAG system was completely baffled. It either just grabbed the keyword “product” and returned a list of products, or it generated a seemingly complete but logically incoherent “Frankenstein’s monster” of an answer.

In that moment, you realized: Traditional RAG is like a librarian who can only memorize. You ask, “In which year did Qin Shi Huang unify the six kingdoms?”, and it quickly flips to page 38. But if you ask, “If Qin Shi Huang immediately ordered the construction of the Great Wall right after unifying the six kingdoms, how long could the imperial treasury sustain it?”, it just stares at you blankly, then copies a passage from two completely unrelated books.

This rigidity stems from the static pipeline architecture of traditional RAG. You design the retriever, write the Prompt template, set the Top-K value, and the entire process is fixed. It doesn’t know when to switch retrieval methods. It doesn’t know how to break down a large problem into smaller steps. It certainly doesn’t know what to do if it fails to find an answer on the first try. It’s like a robot on an assembly line; no matter what workpiece arrives, it uses the same robotic arm to perform the same operation.

When your user queries evolve from “Q2 financial report of the company” to “Should I buy this company’s stock? Consider its latest quarterly report, competitor dynamics, and recent industry policies,” the limitations of traditional RAG become starkly apparent. Information overload (returning dozens of documents for the user to find the answer) and logical disconnection (forcibly stitching together irrelevant fragments) become the norm.

The arrival of Agentic RAG changes the game. It is not a patch for traditional RAG but a paradigm shift: from a “fixed retrieval-generation pipeline” to an “agent-driven adaptive knowledge workflow.” In Agentic RAG, we introduce a core agent. It no longer mechanically executes fixed instructions but acts like a “proactive intelligence analyst.” Upon receiving a task, it first assesses the complexity: should it search for information first, or answer directly? Which tools does it need: vector retrieval, web search, or running scraping code? It also reflects on intermediate results: Is this result sufficient? Is the relevance high enough? If not, should it change the keywords and search again, or break the question into two sub-questions and search for each separately? This combination of dynamic tool calling and iterative optimization is what turns an AI system from “rigid retrieval” into a “proactive, thinking knowledge assistant.”

In this article, I will take you from the underlying principles of Agentic RAG to a practical implementation. You will learn: the core architecture and design philosophy of Agentic RAG; how to implement an agentic RAG system with dynamic tool calling in code; how to give the agent self-correction and iterative optimization capabilities; and what pitfalls to avoid in real-world development. Whether you are a developer just starting with RAG or a veteran who has already deployed traditional RAG, this article will open a new door for you: enabling AI to learn “how to think,” not just “how to memorize.”

2. What is Agentic RAG? From Static Pipelines to an Agent-Driven Paradigm Shift

The Librarian vs. The Intelligence Analyst: A Vivid Analogy

Imagine you walk into a massive library. Traditional RAG is the most diligent librarian: you give him a book title, and he runs to the shelf, precisely finds the book by its classification number, flips to the chapter you specified, and copies the paragraph for you. If the book title is inaccurate, he can’t find it. If your question requires synthesizing information from multiple books, he’ll just carry several books to your table and say, “you figure it out.” He is efficient and precise, but only suitable for simple tasks like “finding known information.”

Agentic RAG, on the other hand, is a trained intelligence analyst. You tell him: “Help me analyze this client company’s competitors. Analyze their brand sentiment changes on e-commerce platforms over the last three months, see if there’s any negative public opinion, and also check our cooperation history with them over the past year.” He doesn’t just run off. Instead, he pauses, plans in his mind: “Hmm, this is a multi-step task that requires phased execution. Step one: check the ‘internal CRM knowledge base’ for cooperation records. Step two: call the ‘news retrieval tool’ to find industry news from the last three months. Step three: use the ‘sentiment analysis tool’ to scrape e-commerce reviews. Step four: synthesize all this information into a structured analysis report.”

If he finds the cooperation record is too brief in step one, he will proactively adjust his strategy for step two – for example, focusing on news related to the client’s core business. He not only “knows how to retrieve” but, more importantly, “knows how to plan the retrieval.”

Agentic RAG vs. Traditional RAG: Three Fundamental Differences

From an architectural perspective, there are at least three essential differences between Agentic RAG and Traditional RAG:

Difference 1: Strategic Dynamism

Traditional RAG’s strategy is static and predefined. Your retrieval pipeline is usually: User Query → Embedding → Vector Retrieval → Top-K Documents → Prompt Concatenation → LLM Generation. The whole process is like a train timetable; the train runs on a designated track and never deviates. Agentic RAG’s strategy, however, is dynamically generated. The agent can dynamically decide which strategy to use based on the complexity, intent, and even conversation history of the current query.

For example: When a user asks “What is RAG?”, the agent performs the most direct vector retrieval because it’s a simple question – a direct knowledge base lookup is sufficient. But when the user asks, “What scenarios are RAG and fine-tuning suitable for? What are their respective advantages and disadvantages? If I need to build a Q&A bot for a client, which one should I choose?”, the agent might choose a multi-strategy combination: first, use vector retrieval to find basic definitions of RAG and fine-tuning; then, call a web search to find the latest comparison articles; and finally, use a code tool to calculate the cost of both approaches (assuming cost data is in the knowledge base).

This is something Traditional RAG cannot do.
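To make the contrast concrete, here is a minimal sketch of a fixed strategy next to a per-query router. The function names and the keyword rules are hypothetical; in a real agent the LLM makes this choice, and the rules below exist only so the sketch runs without one.

```python
from typing import List

def static_strategy(query: str) -> List[str]:
    """Traditional RAG: the same single step for every query."""
    return ["vector_search"]

def dynamic_strategy(query: str) -> List[str]:
    """Toy router: pick a tool sequence from surface features of the query.

    The keyword lists are assumptions for illustration, not a real
    complexity classifier.
    """
    q = query.lower()
    steps = ["vector_search"]                      # start from internal knowledge
    if any(w in q for w in ("latest", "recent", "news")):
        steps.append("web_search")                 # needs fresh information
    if any(w in q for w in ("cost", "calculate", "how much")):
        steps.append("code_executor")              # needs arithmetic
    return steps

if __name__ == "__main__":
    for q in ("What is RAG?",
              "Find the latest RAG vs fine-tuning comparisons and calculate the cost of each."):
        print(q, "->", dynamic_strategy(q))
```

The static version returns the same plan for both queries; the router returns one step for the definition question and three for the comparison question.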

Difference 2: Tool Calling Capability

Traditional RAG’s toolbox only has one hammer: vector retrieval. No matter the type of question, it uses the same retrieval logic: embedding + cosine similarity. Agentic RAG, however, is equipped with a pluggable toolbox. Besides vector retrieval, it also has:

  • Structured Query Tools: For querying databases and knowledge graphs.
  • Web Search Tools: To fetch real-time web information when the internal knowledge base is insufficient.
  • Code Execution Tools: For when calculations or data statistics are needed, allowing the agent to generate and execute code.
  • Document Parsing Tools: For processing unstructured files like PDFs and Word documents.
  • API Call Tools: To interface with RESTful APIs of external systems.

These tools are not just for show; the agent actively “summons” the appropriate tool combination based on the current task’s needs.
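One way to make a toolbox “pluggable” is a registry that tools add themselves to. The sketch below is an assumption about how such a registry could look: `TOOL_REGISTRY`, `register_tool`, `call_tool`, and the two stub tools are hypothetical names, not part of any library.

```python
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, dict] = {}

def register_tool(name: str, description: str) -> Callable:
    """Decorator that stores a function in the registry under `name`."""
    def decorator(func: Callable) -> Callable:
        TOOL_REGISTRY[name] = {"func": func, "description": description}
        return func
    return decorator

@register_tool("vector_search", "Semantic search over the internal knowledge base")
def vector_search_stub(query: str) -> str:
    return f"[kb] results for {query!r}"

@register_tool("web_search", "Fetch recent information from the web")
def web_search_stub(query: str) -> str:
    return f"[web] results for {query!r}"

def call_tool(name: str, **kwargs) -> str:
    """Look a tool up by name and call it; an unknown name raises KeyError."""
    return TOOL_REGISTRY[name]["func"](**kwargs)

if __name__ == "__main__":
    print(sorted(TOOL_REGISTRY))
    print(call_tool("web_search", query="agentic rag"))
```

Adding a sixth tool then means writing one decorated function; the agent loop itself does not change.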

Difference 3: Iterative Result Feedback Mechanism

Traditional RAG is a “one-shot” process: you input a query, it outputs an answer, and the process is done. Agentic RAG, however, has a built-in closed-loop feedback mechanism. The agent self-evaluates the results of each retrieval and generation:

  • Relevance Check: Is the retrieved document really relevant to the question? If only one keyword matches but it’s semantically unrelated, the agent realizes “this retrieval failed” and proactively adjusts the strategy.
  • Completeness Check: Is the retrieved information sufficient to fully answer the question? For instance, if information from three dimensions is needed but only two were retrieved, the agent automatically performs a third retrieval.
  • Consistency Check: Is the information from different tools contradictory? If there are contradictions, the agent requests a re-retrieval or tries other information sources.

This feedback loop is the core of Agentic RAG’s “iterative optimization.” It transforms the RAG system from “one-shot retrieval” to “multi-round exploratory retrieval,” much like how a human researcher constantly adjusts search keywords and switches databases when working on a complex study.
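The three checks can be sketched as small functions. In practice each judgment is usually delegated to the LLM; the word-overlap and substring rules below are deliberately crude stand-ins, and every name here is hypothetical.

```python
from typing import Dict, List

def relevance_check(query: str, document: str, min_overlap: int = 2) -> bool:
    """Treat a document as relevant if it shares enough words with the query."""
    overlap = set(query.lower().split()) & set(document.lower().split())
    return len(overlap) >= min_overlap

def completeness_check(required: List[str], documents: List[str]) -> List[str]:
    """Return the required dimensions that no retrieved document mentions."""
    text = " ".join(documents).lower()
    return [dim for dim in required if dim.lower() not in text]

def consistency_check(claims: Dict[str, str]) -> bool:
    """True when every source reports the same value for one fact."""
    return len(set(claims.values())) <= 1

if __name__ == "__main__":
    docs = ["Price: 199 CNY/month", "Features: customer management"]
    print(completeness_check(["price", "features", "after-sales"], docs))
    print(consistency_check({"kb": "199", "web": "299"}))
```

A non-empty list from `completeness_check` is the signal for another retrieval round; a `False` from `consistency_check` is the signal to consult a further source.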

Deep Dive: The Design Philosophy of Agentic RAG

To put it in one sentence: Agentic RAG = Agent + Toolset + Feedback Loop. The underlying design philosophy is essentially the classic perceive-plan-act-observe loop from agent design, applied to retrieval. The agent is no longer a passive “gear” waiting for instructions but a decision-maker with a “brain.” It can perceive the current state (What is the user’s query? What’s in the knowledge base? What has already been found?), formulate an action plan (What should be searched first? Which tool to use? Should I split the task into sub-tasks?), execute and observe the results, and then correct the next action based on feedback.

Best Practice: When designing an Agentic RAG system, don’t try to give the agent “omnipotent” capabilities from the start. Start with the simplest scenario – equip the agent with 2-3 core tools (vector retrieval + web search + code execution) and let it learn “how to choose” within this small toolbox. As business needs grow, gradually expand the toolbox and the decision logic.

This design philosophy allows RAG to truly leap from being a “tool” to an “assistant.” Your AI system is no longer a robot that just memorizes information; it’s an intelligent assistant that can proactively plan based on the task, choose the most suitable tool, and reflect on and correct its own actions. This is the core value of Agentic RAG: it adds an agent’s planning and self-correction on top of retrieval.

3. Core Architecture Decomposition: Agent, Toolset, and Iteration Loop

Having understood the design philosophy, let’s look at the core architecture. A mature Agentic RAG system consists of three main components: the Decision Agent, the Pluggable Toolset, and the Feedback Loop. The collaboration between these three components determines the intelligence and reliability of the system.

Component 1: The Decision Agent – The System’s Brain

The decision agent is the “brain” of the Agentic RAG system, responsible for understanding and distributing tasks. It is usually built on top of an LLM, but it’s not just the LLM itself – it’s a combination of an LLM and a reasoning framework. Two mainstream reasoning frameworks exist today:

The ReAct pattern (Reasoning + Acting): The agent works in a cycle of “Think → Act → Observe → Think.” When a user asks, “Which courier company has the best delivery performance in the Yangtze River Delta?”, the agent first thinks, “This question requires checking the delivery data of different courier companies in the Yangtze River Delta region,” and then calls a relevant tool to retrieve it. After getting the results, it thinks again, “The data is incomplete; information about J&T Express is missing,” and thus calls the tool again for supplementary retrieval. This “thinking-while-doing” model is particularly suitable for complex problems requiring multiple rounds of exploration.
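A stripped-down version of that cycle might look like the following. The `think` function is a hard-coded stand-in for the LLM's reasoning step, and the courier names and delivery figures are made up purely so the loop has something to observe.

```python
from typing import Dict, List, Optional

# Made-up data for illustration only.
DELIVERY_DAYS: Dict[str, float] = {"Courier X": 1.5, "Courier Y": 2.0}

def lookup(courier: str) -> Optional[float]:
    """Stand-in tool: average delivery days for one courier, None if unknown."""
    return DELIVERY_DAYS.get(courier)

def think(wanted: List[str], observations: Dict[str, float]) -> Optional[str]:
    """Stand-in for the LLM's reasoning: name the next courier still missing,
    or None once every wanted courier has been observed."""
    for courier in wanted:
        if courier not in observations:
            return courier
    return None

def react(wanted: List[str], max_steps: int = 5) -> Dict[str, float]:
    observations: Dict[str, float] = {}
    for step in range(1, max_steps + 1):
        target = think(wanted, observations)               # Think
        if target is None:
            break                                          # nothing left to look up
        result = lookup(target)                            # Act
        print(f"step {step}: looked up {target} -> {result}")
        if result is None:
            wanted = [c for c in wanted if c != target]    # drop couriers with no data
        else:
            observations[target] = result                  # Observe
    return observations

if __name__ == "__main__":
    print(react(["Courier X", "Courier Z", "Courier Y"]))
```

The decision about what to do next is made after every observation, which is what lets the loop notice the missing courier and move on.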

The Plan-and-Execute pattern: The agent first formulates a complete action plan and then executes step-by-step according to the plan. For the same question, the agent would first plan: “Step one: list all major courier companies. Step two: query their delivery times in the Yangtze River Delta one by one. Step three: compare the data and generate a report.” Then it executes the plan methodically. The advantage of this model is a clear task path, making it easy to track and debug. The downside is that it’s slightly less flexible than the ReAct model.
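For comparison, a Plan-and-Execute sketch separates the two phases completely. `make_plan` stands in for an LLM call that writes the plan, and the step syntax (`query:<name>`, `compare`) and the data are assumptions invented for this example.

```python
from typing import Dict, List

def make_plan(couriers: List[str]) -> List[str]:
    """Stand-in planner: a real agent would ask the LLM to write this list."""
    return [f"query:{c}" for c in couriers] + ["compare"]

def execute_plan(plan: List[str], data: Dict[str, float]) -> str:
    """Run every step in order; the plan is not revised along the way."""
    collected: Dict[str, float] = {}
    for step in plan:
        if step.startswith("query:"):
            name = step.split(":", 1)[1]
            collected[name] = data[name]
        elif step == "compare":
            best = min(collected, key=collected.get)
            return f"Fastest: {best} ({collected[best]} days)"
    return "Plan finished without a comparison step"

if __name__ == "__main__":
    made_up_data = {"Courier X": 1.5, "Courier Y": 2.0}   # illustration only
    plan = make_plan(list(made_up_data))
    print(plan)
    print(execute_plan(plan, made_up_data))
```

Because the whole plan exists before anything runs, it can be logged and inspected up front; the price is that a surprise in step two does not change step three.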

Tip: In practice, the ReAct pattern is better for open-ended, uncertain problems, while the Plan-and-Execute pattern is better for tasks with clear, predictable steps. I recommend beginners start with the ReAct pattern because its “thinking-while-doing” logic is closer to natural human thought, and every tool call is preceded by an explicit reasoning step you can inspect. Once you are familiar with the agent’s behavior patterns, you can try the Plan-and-Execute pattern for tasks that clearly require “step-by-step completion.”

Component 2: The Pluggable Toolset – The Agent’s Hands

The toolset is the “hand” through which the agent interacts with the environment. Each tool is essentially an encapsulated function or API, defining its input, output, and functionality description. The key design point is: The registration information for each tool must include a clear description and parameter explanation. This is because the agent relies on these descriptions to decide “when to use which tool.”

Here is a typical snippet for registering a tool:

# Register the tools the agent may call
tools = [
    {
        "name": "vector_search",
        "description": "Perform semantic search in the internal knowledge base, returning the most relevant document fragments",
        "parameters": {
            "query": {"type": "string", "description": "The search query"},
            "top_k": {"type": "integer", "description": "Number of fragments to return", "default": 5}
        }
    },
    {
        "name": "web_search",
        "description": "Fetch the latest information from the internet via a search engine",
        "parameters": {
            "query": {"type": "string", "description": "The search keyword"},
            "num_results": {"type": "integer", "description": "Number of results to return", "default": 5}
        }
    },
    {
        "name": "code_executor",
        "description": "Execute Python code for data calculation, statistical analysis, etc. The code must use print to output the result.",
        "parameters": {
            "code": {"type": "string", "description": "The Python code to be executed"}
        }
    }
]

Note: The quality of the tool description directly affects the accuracy of the agent’s decisions. A description that is too brief might prevent the agent from knowing when to use the tool; a description that is too long might cause “information overload.” A good practice is to explain in the description “what problem this tool is good at solving” and “what type of data it is suitable for processing.” For example, the description for vector_search could include: “Good at handling questions requiring semantic similarity understanding, such as explaining technical concepts or comparing products.”
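The registration format above is this article's own. If your LLM provider supports native function calling, the same information can be passed as structured tool definitions instead of prompt text. The sketch below converts one registry entry into the shape the OpenAI Chat Completions `tools` parameter expects; `to_function_schema` is a hypothetical helper, and treating “no default” as “required” is an assumption of this sketch.

```python
from typing import Dict

def to_function_schema(tool: Dict) -> Dict:
    """Convert one registry entry into a Chat Completions `tools` item."""
    properties, required = {}, []
    for name, info in tool["parameters"].items():
        prop = {"type": info["type"]}
        if "description" in info:
            prop["description"] = info["description"]
        if "default" in info:
            prop["default"] = info["default"]
        else:
            required.append(name)      # assumption: no default means required
        properties[name] = prop
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }

if __name__ == "__main__":
    example = {
        "name": "vector_search",
        "description": "Semantic search over the internal knowledge base",
        "parameters": {
            "query": {"type": "string", "description": "The search query"},
            "top_k": {"type": "integer", "default": 5},
        },
    }
    print(to_function_schema(example))
```

The description field matters just as much in this format: it is still the only thing the model reads when deciding whether to call the tool.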

Component 3: The Feedback Loop – The Self-Correction Engine

The feedback loop is the most revolutionary component of Agentic RAG compared to Traditional RAG. It endows the system with self-correction and iterative optimization capabilities. A standard feedback loop consists of three steps:

Step 1: Result Evaluation. After each tool call, the agent evaluates the results. Evaluation dimensions include: Relevance (is the result strongly related to the query?), Completeness (does the result cover all query dimensions?), Consistency (is the information returned by different tools consistent with each other?). This evaluation can be done with a simple Prompt, like: "Please analyze if the following search results completely answer the user's question. If information is missing, point out what is missing. If it’s irrelevant, return 'No relevant results.'"

Step 2: Strategy Adjustment. If the result evaluation is unsatisfactory, the agent adjusts its strategy. Possible adjustments include:

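  • Query Rewriting: Expand or rephrase the original query into a more precise search statement. A related option is HyDE (Hypothetical Document Embeddings), where the LLM first drafts a hypothetical answer and that draft, rather than the raw query, is embedded and used for retrieval.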
  • Tool Switch: Try a different tool. For example, if a vector search yielded no results, try a web search.
  • Task Decomposition: Break a large problem down into several smaller ones, retrieve for each, and then synthesize.
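To illustrate why a HyDE-style rewrite can help, the toy below compares retrieval with the raw query against retrieval with a hypothetical answer. Everything here is a simplification: `embed` is a word-count stand-in for a real embedding model, the documents are invented, and the hypothetical answer is hard-coded where an LLM would normally write it.

```python
import math
from collections import Counter
from typing import Dict

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def scores(text: str, docs: Dict[str, str]) -> Dict[str, float]:
    """Similarity of `text` to every document."""
    q = embed(text)
    return {doc_id: cosine(q, embed(body)) for doc_id, body in docs.items()}

docs = {
    "pricing": "the basic plan is free and advanced functions are paid monthly",
    "support": "customers get help through community forums and online chat",
}
query = "how much will it cost"
# Would be written by the LLM in a real HyDE setup:
hypothetical = "the plan is free but advanced functions cost a monthly fee"

if __name__ == "__main__":
    print("raw query:   ", scores(query, docs))
    print("hypothetical:", scores(hypothetical, docs))
```

The raw query shares no words with either document, so both scores are zero; the hypothetical answer is phrased like a document and scores clearly higher on the pricing entry.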

Step 3: Termination Decision. After each iteration, the agent must decide if it can “terminate.” Termination conditions usually include: the confidence score of the retrieval results exceeds a threshold, the number of tool calls hits a limit (to prevent infinite loops), or the number of iteration rounds reaches a limit (to control cost and latency). If a termination condition is met, the agent proceeds to the answer generation phase; otherwise, it loops back to Step 1.
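The three termination conditions translate almost directly into code. The sketch below is one possible shape; `LoopState`, the threshold values, and the reason strings are hypothetical and should be tuned to your own cost and latency budget.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class LoopState:
    confidence: float = 0.0   # evaluator's score for the current evidence, 0..1
    tool_calls: int = 0       # total tool invocations so far
    iterations: int = 0       # completed evaluate-adjust rounds

def should_terminate(state: LoopState,
                     min_confidence: float = 0.8,
                     max_tool_calls: int = 8,
                     max_iterations: int = 5) -> Tuple[bool, str]:
    """Return (stop, reason); conditions are checked in priority order."""
    if state.confidence >= min_confidence:
        return True, "confident"
    if state.tool_calls >= max_tool_calls:
        return True, "tool_call_limit"
    if state.iterations >= max_iterations:
        return True, "iteration_limit"
    return False, "continue"

if __name__ == "__main__":
    print(should_terminate(LoopState(confidence=0.9)))
    print(should_terminate(LoopState(confidence=0.3, tool_calls=8)))
    print(should_terminate(LoopState(confidence=0.3, tool_calls=2, iterations=1)))
```

Returning the reason alongside the boolean makes it easy to tell a confident answer from one that was cut off by a limit, which is worth surfacing to the user.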

The Collaboration Flow of the Three Components

The collaboration between these three components forms the core workflow of Agentic RAG:

  1. Receive Query → The decision agent analyzes the user’s intent and task complexity.
  2. Formulate Plan → Based on the analysis, the agent decides which tools are needed and in what order to call them.
  3. Execute Tools → Call the corresponding tools to obtain intermediate results.
  4. Feedback Evaluation → The agent evaluates the quality of the intermediate results.
  5. Iterative Optimization → If unsatisfied, adjust the strategy (rewrite query, switch tool, decompose sub-problems) and return to step 3.
  6. Generate Answer → When iteration terminates, generate the final answer based on all the collected retrieval results.

This flow looks complex, but when you implement it in code, you’ll find the core logic is only a few dozen lines – the key is not the code complexity, but how to design the agent’s decision rules and how to define the tool interaction interfaces. Next, we’ll demonstrate this process with a hands-on code example.

4. Hands-on 1: Build Your First Agentic RAG System from Scratch (Dynamic Tool Calling Implementation)

Theory must eventually become practice. In this section, we will build a minimal viable Agentic RAG system from scratch. You will see with your own eyes how the agent dynamically selects and calls different tools based on a user query, and how basic iterative optimization is implemented.

Preparation: Environment and Dependencies

We will use the Python standard library for the core logic, avoiding heavy frameworks like LangChain, so you can understand the underlying principles more clearly. The only external dependency is the openai package, used to call the OpenAI API (or any compatible LLM API such as Qwen or DeepSeek) that powers the agent’s decision-making and answer generation.

# Install dependencies
# pip install openai

import json
import openai
from typing import Callable, List, Dict, Any

Implementing the Core: The Agent Decision Engine

The core decision logic of the agent is: Given a user query and a set of available tools, the LLM needs to output “which tool I should call and with what parameters.” This is typically done by providing the LLM with a structured Prompt.

def call_llm(prompt: str, response_format: str = "text"):
    """Call the LLM; return plain text, or a parsed JSON object when response_format == "json"."""
    client = openai.OpenAI()  # Reads OPENAI_API_KEY from the environment
    messages = [
        {"role": "system", "content": "You are an intelligent assistant. Please output according to the requirements."},
        {"role": "user", "content": prompt},
    ]
    response = client.chat.completions.create(
        model="gpt-4o",  # Replace with any compatible model as needed
        messages=messages,
        temperature=0.0,  # Low temperature keeps decision outputs as repeatable as possible
    )
    content = response.choices[0].message.content or ""
    if response_format == "json":
        # Simple extraction: take everything between the first { and the last }
        try:
            start = content.index("{")
            end = content.rindex("}") + 1
            return json.loads(content[start:end])
        except (ValueError, json.JSONDecodeError):
            # No JSON object found, or it failed to parse: hand back the raw text
            return content
    return content

def build_tool_descriptions(tools: List[Dict]) -> str:
    """Format the tool list into description text the LLM can read"""
    descriptions = []
    for tool in tools:
        # .get() keeps this working for parameters registered without a description
        params_desc = "\n".join(
            f"  - {name}: {info.get('description', '')} (type: {info.get('type', 'string')}, default: {info.get('default', 'required')})"
            for name, info in tool["parameters"].items()
        )
        descriptions.append(f"Tool Name: {tool['name']}\nDescription: {tool['description']}\nParameters:\n{params_desc}")
    return "\n\n".join(descriptions)

def agent_decision(query: str, tools: List[Dict], context: str = "") -> Dict:
    """Let the agent decide the next action"""
    tool_desc = build_tool_descriptions(tools)
    decision_prompt = f"""
You are an intelligent retrieval assistant. Current user query: {query}

Current information: {context if context else "None yet"}

You have the following tools available:
{tool_desc}

Based on the needs of the query, decide the next action. Output in JSON format:
{{
"action": "tool name to call or final_answer",
"params": {{"parameter_name": "parameter_value"}},
"reasoning": "Your reasoning for this decision"
}}

Rules:
- If you believe the current information is sufficient to answer, set action to "final_answer"
- Otherwise, choose the most suitable tool (only one), and fill in the correct parameters
- If a tool has already been called and the result was unsatisfactory, try a different tool
"""
    result = call_llm(decision_prompt, response_format="json")
    if not isinstance(result, dict) or "action" not in result:
        # call_llm returns raw text when parsing fails; wrap it so callers always get a dict
        return {"action": "invalid_decision", "params": {},
                "reasoning": f"Could not parse a decision from the model output: {result!r}"}
    return result

Defining the Toolset: Simulating a Real Scenario

To demonstrate dynamic tool calling, we define three typical retrieval tools:

# Simulate a knowledge base
knowledge_base = {
    "Product A": "Product A is a CRM system for SMEs, focusing on customer management and sales process automation. Price: 199 CNY/month. Supports online payment and bank transfer.",
    "Product B": "Product B is an ERP system for large enterprises, including Finance, HR, and Supply Chain modules. Price: 899 CNY/month. Deployment: supports cloud and on-premise.",
    "Product C": "Product C is a free CRM system providing basic customer management functions, suitable for startups. Advanced functions are paid: 99 CNY/month.",
}

# Tool 1: Internal Knowledge Base Retrieval
def vector_search(query: str, top_k: int = 2) -> str:
    """Knowledge base retrieval based on keyword matching (simulating semantic search)"""
    # Simplified implementation: simulate relevance via case-insensitive keyword overlap
    query_words = set(query.lower().split())
    results = []
    for key, value in knowledge_base.items():
        doc_words = set((key + " " + value).lower().split())
        matched_words = len(query_words & doc_words)
        if matched_words > 0:
            # Keep the key so the source can be cited below
            results.append((key, value, matched_words))
    # Sort by match count, best first
    results.sort(key=lambda x: x[2], reverse=True)
    if not results:
        return "❌ No relevant information found in the knowledge base."
    return "\n\n".join(f"[Source: {key}]\nContent: {value}" for key, value, _ in results[:top_k])

# Tool 2: Web Search (simulated)
def web_search(query: str, num_results: int = 3) -> str:
    """Simulate web search, returning static test data"""
    web_data = {
        "Product Comparison": "According to user reviews, Product A has the most comprehensive CRM features, Product B has the best ERP, and Product C offers the best value for money.",
        "After-sales Service": "Product A provides 24/7 online customer service. Product B provides a dedicated account manager. Product C only offers community support.",
        "Industry Report": "In the 2024 CRM market report, Product A leads in market share, while Product B performs prominently in the enterprise market.",
    }
    # Simple matching: a topic hits if it appears in the query or shares a whole word with it
    query_lower = query.lower()
    query_words = set(query_lower.split())
    hits = [
        value for key, value in web_data.items()
        if key.lower() in query_lower or query_words & set(key.lower().split())
    ]
    if not hits:
        return "❌ No relevant information found via web search."
    return "\n\n".join(hits[:num_results])

# Tool 3: Code Execution (for calculations, statistics, etc.)
# A few harmless built-ins so generated arithmetic code can run
SAFE_BUILTINS = {"abs": abs, "len": len, "max": max, "min": min, "round": round, "sum": sum}

def code_executor(code: str) -> str:
    """Execute simple Python code (demo only: this is NOT a secure sandbox)"""
    try:
        # Restricting __builtins__ blocks print and import, but it can be bypassed;
        # a real production environment needs proper isolation (separate process or container)
        local_vars = {}
        exec(code, {"__builtins__": SAFE_BUILTINS}, local_vars)
        return str(local_vars.get("result", "Code execution completed"))
    except Exception as e:
        return f"❌ Code execution failed: {str(e)}"

# Register tools
tools = [
    {"name": "vector_search", "func": vector_search,
     "description": "Retrieve product information, prices, features, etc. from the internal knowledge base",
     "parameters": {"query": {"type": "string", "description": "Search keyword, e.g., product name"}}},
    {"name": "web_search", "func": web_search,
     "description": "Search the internet for the latest information, such as user reviews, industry reports",
     "parameters": {"query": {"type": "string", "description": "Search keyword"}}},
    {"name": "code_executor", "func": code_executor,
     "description": "Execute Python code for data calculation or analysis. Please assign the result to a variable named 'result' in the code. Imports and print are not available.",
     "parameters": {"code": {"type": "string", "description": "Python code to execute"}}}
]
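The code_executor above runs generated code inside the agent's own process, which is acceptable only for a demo. One step toward the stricter isolation a production system needs is to run the code in a separate interpreter with a timeout. The sketch below is an assumption about how that could look, not a complete sandbox; `run_in_subprocess` is a hypothetical name, and unlike the tool above it returns whatever the code prints.

```python
import subprocess
import sys

def run_in_subprocess(code: str, timeout_s: float = 5.0) -> str:
    """Run code in a separate interpreter and return what it printed.

    Isolation here is limited to a separate process, -I (isolated mode:
    ignores environment variables and user site-packages) and a timeout.
    It does NOT restrict file or network access.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"❌ Timed out after {timeout_s} s"
    if proc.returncode != 0:
        return f"❌ Code execution failed: {proc.stderr.strip()}"
    return proc.stdout.strip()

if __name__ == "__main__":
    # Yearly cost of a 199 CNY/month plan
    print(run_in_subprocess("print(199 * 12)"))
```

For untrusted code you would still want operating-system-level limits (a container, a restricted user, no network) on top of this; the subprocess boundary mainly protects the agent from crashes and infinite loops.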

The Main Loop: Implementing Dynamic Tool Calling

Now, let’s write the main loop for Agentic RAG. The agent will continuously decide → execute tools → evaluate results until it decides to generate an answer.

def run_agentic_rag(query: str, max_iterations: int = 10):
    """
    Agentic RAG Main Loop
    Parameters:
        query: User's query
        max_iterations: Maximum iterations to prevent infinite loops
    """
    context = ""  # Accumulated context information
    iteration_count = 0
    tool_call_history = []  # Record tool call history to prevent useless repeated calls

    while iteration_count < max_iterations:
        iteration_count += 1
        print(f"\n--- Iteration {iteration_count} ---")

        # 1. Agent Decision
        decision = agent_decision(query, tools, context)
        if not isinstance(decision, dict) or "action" not in decision:
            # The model output could not be parsed into a decision; note it and retry
            print("⚠️ Could not parse the agent's decision, retrying")
            context += "\n[Note] The previous decision was not valid JSON. Output only the JSON object."
            continue
        print(f"🤔 Agent Decision: {decision.get('reasoning', '(no reasoning given)')}")
        action = decision["action"]

        # 2. Check if termination is requested
        if action == "final_answer":
            print("✅ Agent believes information is sufficient, preparing to generate answer")
            break

        # 3. Execute Tool Call
        # Find the corresponding tool
        tool = None
        for t in tools:
            if t["name"] == action:
                tool = t
                break

        if tool is None:
            print(f"⚠️ Tool {action} does not exist, skipping")
            context += f"\n[Note] Calling {action} failed, tool does not exist."
            continue

        # Get parameters (the model may omit them or return null)
        params = decision.get("params") or {}

        # Prevent repeated calls to the same tool with the same parameters
        call_signature = f"{action}_{json.dumps(params, sort_keys=True)}"
        if call_signature in tool_call_history:
            print(f"⚠️ Detected repeated call to {action}, switching strategy")
            context += f"\n[Note] {action} has been called with the same parameters, information is insufficient. Please try a different approach."
            continue
        tool_call_history.append(call_signature)

        # Execute tool
        print(f"🔧 Calling Tool: {action}({params})")
        try:
            result = tool["func"](**params)
            print(f"📥 Tool Returned: {result[:100]}...")  # Print only first 100 chars
        except Exception as e:
            # Covers wrong or missing parameters as well as failures inside the tool
            result = f"❌ Tool call error: {str(e)}"
            print(result)

        # 4. Update Context
        context += f"\n[Tool Call: {action}]\nResult: {result}"

    # 5. Generate Final Answer
    final_prompt = f"""
User Query: {query}

Based on the following retrieved information, generate a complete and accurate answer. If information is insufficient, clearly state what is missing.

Retrieved Information:
{context}
"""

    final_answer = call_llm(final_prompt, response_format="text")
    print("\n=== Final Answer ===")
    print(final_answer)
    return final_answer

# Test Run
if __name__ == "__main__":
    user_query = "Which is more suitable for a startup, Product A or Product C? Consider price and features."
    run_agentic_rag(user_query)

Analysis of the Run Results

When you run the above code, you’ll see the agent making decisions step by step. The exact trace depends on the model, but a typical run looks like this:

  1. First Iteration: The agent realizes it needs to compare two products and calls vector_search to retrieve information on Product A and Product C.
  2. Second Iteration: After getting price and basic feature information, the agent finds it lacks an evaluation for the “startup” scenario. So, it calls web_search to find relevant reviews.
  3. Third Iteration: Synthesizing all the information, the agent judges it sufficient and returns final_answer; the loop exits and the comparative analysis is generated.

Common Pitfall Tip: During implementation, the most common problem is the tool call falling into an infinite loop. For example, the agent repeatedly calls the same tool but keeps getting empty results due to incorrect parameters. Solutions include: ① Setting a maximum iteration limit (max_iterations in the code); ② Recording the tool call history to prevent repeated calls to the same tool with the same parameters; ③ When a tool returns empty results, instructing the agent to “try another tool or a different keyword.”

This minimal prototype, though simple, already demonstrates the core capability of Agentic RAG – dynamic tool calling. You can build upon this template, replacing the simulated knowledge base and tools with real ones, to quickly set up your own agentic retrieval system.

5. Hands-on 2: Iterative Optimization Strategy – Teaching the Agent to Self-Correct

In the previous section, our agent chose tools, called them, and stopped as soon as it judged the context sufficient; nothing checked systematically whether every part of the question had been covered. A real Agentic RAG system needs one more crucial capability: when the initial retrieval results are unsatisfactory, the agent can self-correct. In this section, we will implement this core “iterative optimization” mechanism.

Problem Scenario: A Classic Multi-Information Query

Let’s assume a user asks: “Help me compare the cost-effectiveness of Product A, B, and C, considering the three dimensions: price, features, and after-sales service.” This is a classic multi-dimensional comprehensive query. A fixed Traditional RAG retrieval would likely only hit documents related to the price dimension, generating a one-sided answer. Iterative optimization in Agentic RAG ensures all dimensions are covered.

Implementing Iterative Optimization: Query Rewriting and Strategy Switching

The core logic of iterative optimization is: The agent evaluates each retrieval result and, if it finds information missing or incomplete, adjusts the strategy and continues searching. We implement this by adding an “Information Completeness Evaluator.”

def evaluate_information_gap(query: str, context: str) -> List[str]:
    """
    Evaluate which dimensions of the current information need supplementation.
    Returns a list describing the missing dimensions that need retrieval.
    """
    eval_prompt = f"""
User Query: {query}

Currently Retrieved Information:
{context}

Analyze: Does the current information completely cover all the required dimensions of the user's query?
If complete, output: []
If there are gaps, output a list of the missing dimensions, each described in one sentence.
For example: ["Missing after-sales service information for Product C", "Missing price comparison for the three products"]

Note: Only output the JSON array, do not output anything else.
"""
    # Request plain text and parse the array here: call_llm's "json" mode
    # only extracts {...} objects, so a [...] answer would never come back as a list
    raw = call_llm(eval_prompt, response_format="text")
    try:
        gaps = json.loads(raw[raw.index("["):raw.rindex("]") + 1])
    except (ValueError, json.JSONDecodeError):
        return []  # Unparseable output is treated as "no gaps" so the loop can terminate
    return [str(gap) for gap in gaps] if isinstance(gaps, list) else []

def generate_missing_query(gap: str) -> str:
    """Generate a new retrieval query based on the missing dimension"""
    gen_prompt = f"""
Based on the needs for comparing the cost-effectiveness of "Product A, B, and C", we now need to supplement the following information: {gap}
Generate a concise retrieval query to search for relevant information in the knowledge base.

Only return the query string, nothing else.
"""
    # Strip whitespace and any stray quotes around the returned query
    return call_llm(gen_prompt, response_format="text").strip().strip('"')

The Complete Iterative Optimization Agentic RAG System

Now, we integrate the evaluator into the main loop, implementing the “Retrieve → Evaluate → Missing → Re-retrieve” iteration chain.

def run_iterative_agentic_rag(query: str, max_iterations: int = 10):
    """
    Agentic RAG with Iterative Optimization
    """
    context = ""
    iteration_count = 0
    
    # Initialization: The agent first obtains an initial retrieval result
    initial_decision = agent_decision(query, tools, "")
    if initial_decision["action"] == "vector_search":
        # Perform an initial retrieval
        result = vector_search(query, top_k=3)
        context = f"[Initial Retrieval]\n{result}"
        print(f"📥 Initial retrieval completed: Obtained {len(result.split(chr(10)))} information fragments.")
    
    while iteration_count < max_iterations:
        iteration_count += 1
        print(f"\n--- Evaluation and Optimization Round {iteration_count} ---")
        
        # 1. Evaluate Information Completeness
        gaps = evaluate_information_gap(query, context)
        
        if not gaps:
            print("✅ Information is complete, no need for further retrieval")
            break
        
        # 2. Generate new queries for missing information
        for gap in gaps:
            print(f"🔍 Missing information found: {gap}")
            new_query = generate_missing_query(gap)
            print(f"📝 Generated new query: {new_query}")
            
            # 3. Use a suitable tool to supplement retrieval
            # Simple handling: prioritize vector_search, fallback to web_search if not found
            result = vector_search(new_query, top_k=2)
            if result.startswith("❌"):  # vector_search prefixes its "No relevant information found" message with ❌
                # Internal knowledge base has no result, switch to web search
                print("⚠️ No relevant result in internal knowledge base, switching to web search")
                result = web_search(new_query)
            
            context += f"\n[Supplementary Retrieval for: {gap}]\n{result}"
            
            # Check if iteration limit is reached
            if iteration_count >= max_iterations:
                break
    
    # 4. Generate answer based on the complete context
    final_prompt = f"""
User Query: {query}

After multiple rounds of retrieval and refinement, here is all the collected information:

{context}

Please generate a comprehensive response, ensuring coverage of the three dimensions: price, features, and after-sales service. Provide a conclusion on cost-effectiveness comparison.
If certain dimensions are still missing information,