1. Introduction

As LLM-driven agents are gradually deployed in enterprise production environments, a common issue has emerged: given the same underlying model, why are some agents effective, controllable, and maintainable, while others produce scattered outputs, fail to complete tasks, or even repeatedly call the wrong tools? The core difference often lies not in the model itself, but in two dimensions of the agent: role persona design and task logic orchestration. The role persona determines the agent’s behavioral boundaries and communication style, while task logic determines how it decomposes goals, executes in sequence, and handles exceptions.

This article explores three core aspects: role instruction design, task decomposition and planning, and tool call optimization, combining practical experience from mainstream frameworks such as OpenHands, CrewAI, and LangChain to explain how to systematically design a production-ready agent. After reading this article, you will master: agent role persona design methods, AI agent role instruction design techniques, task logic decomposition and sub-task planning, practical agent task decomposition cases, OpenHands AgentController action routing, agent tool call and description optimization, CrewAI multi-role collaborative task design, and key points for applying LangChain agent planning and memory mechanisms.

2. Core Methods for Agent Role Persona Design

An agent’s role persona cannot be defined simply by saying “you are an assistant.” In engineering practice, role instructions need to cover four dimensions: identity positioning, capability boundaries, behavioral norms, and output format.

2.1 Identity Positioning

Identity positioning answers “who the agent is.” This is not just a role name; it should also provide contextual background related to that role. For example:

  • “You are a customer support specialist for an e-commerce platform, employee ID CS-0214”
  • “You are a senior data analyst specializing in time series forecasting and anomaly detection”

It is recommended to include the affiliated organization and professional domain in the identity positioning, which helps the LLM more accurately lock in the knowledge scope in the semantic space. Avoid overly broad role descriptions (such as “you are a helpful assistant”), as such descriptions have very weak constraints on model behavior.

2.2 Capability Boundaries

Capability boundaries are the most underestimated part of role persona design. They need to clearly tell the agent “what you can do” and “what you cannot do.” “What you cannot do” is sometimes more important than “what you can do,” because LLMs naturally tend to assume they can do anything, which leads agents to promise results beyond their capabilities or attempt to use non-existent methods.

Recommended format for capability list:

1
2
Your currently available tools include: query order status, query logistics information, submit refund requests.
You are not authorized to: modify order prices, delete user accounts, access sensitive information such as user passwords.

2.3 Behavioral Norms

Behavioral norms cover communication style, decision constraints, and exception handling strategies. For example:

  • Communication style: Use polite language, limit each reply to three sentences
  • Decision constraints: When uncertain, must confirm with the user, do not guess
  • Exception handling: When an API call fails, retry once; if retry still fails, inform the user and provide an alternative

In the OpenHands framework, this part is injected into the system prompt through a fixed “constraints zone” and is repeatedly sent into the context at each conversation turn to prevent the model from “forgetting” the rules in long conversations.

2.4 Common Pitfalls

A common mistake in practice is writing role instructions in an overly literary style. For example, “You are a warm, considerate, and empathetic customer service assistant”—such descriptions have far less impact on model behavior than specific constraints (e.g., “When the user expresses dissatisfaction, apologize first and then provide a solution”). Another mistake is omitting output format constraints. Without specifying the output format, the model may return plain text, Markdown, JSON, or a mix, causing downstream parsing logic to become unstable.

3. Principles of Task Logic Decomposition and Sub-Task Planning

Taking “query this week’s weather and add it to my calendar” as an example, a poorly designed agent might directly call a non-existent “weather + calendar” comprehensive API, or first query the weather and then not know how to proceed. A well-designed agent, on the other hand, uses planning capability to break the task into an executable chain of sub-tasks.

3.1 Trigger Mechanism for Task Decomposition

Task decomposition is usually triggered by the agent’s “brain” (LLM) after receiving user input. The flow is as follows:

  1. Perform coarse-grained classification of the user’s intent to determine whether multi-step operations are needed
  2. If it is a single operation (e.g., “what time is it now”), directly call the corresponding tool and return
  3. If it is a composite task (e.g., “book a meeting room for tomorrow at 10 AM and notify the participants”), decompose it into a list of sub-tasks

In LangChain agents, this is implemented through the ReAct loop: the model outputs a “Thought” to plan the next step, then invokes an Action, observes the result (Observation), and decides the next step. This loop inherently contains the capability for task decomposition.

3.2 Sub-Task Dependencies and Order Design

Sub-tasks often have dependencies. Some sub-tasks must be executed sequentially (e.g., first check the weather, then decide whether to add the event), while others can be executed in parallel (e.g., querying different data sources simultaneously). In engineering implementation, there are two common patterns:

  • Sequential Planning: The agent generates the entire list of sub-tasks at once and then executes them one by one. The advantage is transparent logic and easy traceability; the disadvantage is the inability to handle dynamic changes during execution (e.g., API returns unexpected format).
  • Incremental Planning: The agent only plans the next step when the current step ends. This is more flexible but increases the complexity of maintaining context.

Dynamic adjustment capability is the key advantage of agents over fixed workflow engines. When a sub-task fails (e.g., weather API timeout), the agent can re-plan based on error information—for example, switching to an alternative weather source, or reporting to the user that weather data is unavailable and asking whether to continue adding the calendar event.

3.3 Impact of Memory Mechanisms on Task Planning

Task planning relies on short-term memory and long-term memory. Short-term memory stores tool call results and intermediate conclusions from the current session, ensuring data can be passed between sub-tasks. Long-term memory saves user preferences and historical behavior (e.g., “the user lives in Beijing”), allowing the agent to automatically adjust priorities when planning tasks.

In CrewAI, short-term memory is implemented via the context window, while long-term memory is persisted through a vector database. A typical engineering problem: when short-term memory overflows, the agent loses the intermediate results of previous sub-tasks, causing subsequent steps to fail. To address this, it is recommended in practice to use summary compression (compressing intermediate results into summaries and retaining them) or periodically refresh the context window, keeping only the key information that affects subsequent sub-tasks.

4. Practical Strategies for Tool Call and Description Optimization

Whether an agent can correctly call tools largely depends on the clarity of tool descriptions. This is not just an API documentation issue; it is the “contract” between the agent and external systems.

4.1 Hierarchical Structure of Tool Descriptions

Tool descriptions should include at least four levels:

  • Tool Function Overview: One sentence explaining what the tool does
  • Parameter Description: Type, value range, required/optional, example values for each parameter
  • Return Value Format: The return structure on success, and the error code/message format on failure
  • Usage Constraints: Call frequency limits, dependencies, etc.

Using a weather tool as an example, a poor description: “This tool returns weather data.” A better description:

1
2
3
4
5
6
7
8
9
10
11
12
13
Tool Name: get_weather
Function: Returns weather forecast based on city name and date, including temperature, humidity, and weather conditions (sunny/cloudy/rainy, etc.).
Parameters:
- city (string, required): City name, full Chinese name, e.g., "Beijing"
- date (string, optional): Date, format YYYY-MM-DD, defaults to current day if not provided
Return Value (success):
- temperature (number): Celsius
- condition (string): Weather condition description
- humidity (number): Humidity percentage
Return Value (failure):
- error_code (number)
- message (string): Error description
Constraints: Supports querying up to 7 days of future data; limited to 60 calls per minute.

4.2 Differences in Tool Calls Between Lightweight and Enterprise-Grade Agents

Lightweight agents (e.g., LangChain + a few APIs) typically adopt a “single call strategy”: the agent selects a tool based on user input, sends the request, and waits for the result. Enterprise-grade agents (e.g., OpenHands) add more control logic at the tool call layer:

  • Pre-call validation: Check parameter completeness; if conditions are not met, request supplementary information from the agent/user
  • Call retry: Automatically retry on transient failures and record failure counts
  • Call result caching: For idempotent tools, avoid duplicate calls within the same session
  • Call throttling: Prevent the agent from repeatedly calling the same tool in a loop (e.g., due to LLM hallucination causing infinite polling)

4.3 Impact of Tool Description Optimization on LLM Selection Accuracy

In an internal test, after upgrading descriptions for five tools from a single line to the above structure, the LLM’s accuracy in selecting the correct tool on the first attempt increased from 62% to 91%. The significant improvement came from:

  • Clearly distinguishing the functional boundaries of similar tools (e.g., avoid confusing “add user” and “update user” descriptions)
  • Providing parameter constraints to avoid invalid calls (e.g., “date must be a future date”)
  • Including in the function overview what the tool cannot do (e.g., “This tool does not support querying historical weather”)

5. Hands-On: OpenHands AgentController Action Routing Design

The AgentController in the OpenHands framework is a typical centralized task routing component. Its core logic is to distribute tasks to corresponding handling methods based on the type of input action, thereby decoupling the handling logic for different states.

5.1 Action Types and Routing Strategy

AgentController defines four core action types:

  • Status Change: Triggered when the agent’s internal state changes (e.g., task status changes from “pending” to “executing”)
  • Message: Receives communication content from users or other agents
  • Delegation Start: Assigns a sub-task to another sub-agent for execution
  • Task Complete/Reject: Marks the terminal state of the current task (successfully completed or unable to process)

5.2 Example Routing Implementation

Below is a simplified AgentController implementation showing the action distribution logic:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
class AgentController:
def __init__(self):
self.state_machine = TaskStateMachine()
self.sub_agents = {}

def handle_action(self, action):
action_type = action["type"]
if action_type == "status_change":
self._handle_status_change(action)
elif action_type == "message":
self._handle_message(action)
elif action_type == "delegation":
self._handle_delegation(action)
elif action_type in ["complete", "reject"]:
self._handle_final_state(action)
else:
self.log.warning("Unknown action type: %s", action_type)

def _handle_status_change(self, action):
# Update the task state machine, record current progress
self.state_machine.update(action["task_id"], action["new_status"])
# If the state becomes "executing", check if a new sub-task needs to be started
if action["new_status"] == "executing":
self._try_trigger_next_step(action)

def _handle_delegation(self, action):
# Create a sub-agent instance and assign a task
sub_agent = self._create_sub_agent(action["role"])
self.sub_agents[action["task_id"]] = sub_agent
sub_agent.start(action["sub_task"])

The advantage of this design is that each handler’s logic is independent, making it easy to test and debug. When problems occur, the action type, timestamp, and corresponding handler can be used to quickly locate the failure point. For example, if a sub-task never starts, the check point would be whether the “delegation start” action was correctly triggered, and whether the logic for creating a sub-agent in _handle_delegation was executed successfully.

5.3 Observability of Task Flow

AgentController outputs formatted logs on every routing action, including timestamp, action type, task ID, and upstream caller. This is crucial for troubleshooting in production environments. It is recommended to integrate such logs with distributed tracing systems to reconstruct the full call chain when multiple agents collaborate.

6. Hands-On: CrewAI Multi-Role Collaborative Task Design

CrewAI’s design philosophy is “multi-role agent collaboration to complete tasks.” Compared to single-agent design, multi-role scenarios impose higher requirements on role persona and task logic—not only must each agent be designed independently, but the interaction protocol between them must also be defined.

6.1 Role Definition and Behavioral Constraint Refinement

Suppose we need to build a “market research and report generation” workflow. We would define two agents:

Researcher Agent:

  • Role: Industry researcher, focused on data collection and structured aggregation
  • Capability boundaries: Can call search APIs, database queries, and data extraction tools; prohibited from interpreting data or adding conclusions (this is the report writer’s job)
  • Output specification: Only return structured data (JSON format), no natural language conclusions

Report Writer Agent:

  • Role: Analyst, focused on transforming data into a readable report
  • Capability boundaries: Prohibited from calling external data sources; can only receive output from the researcher agent as input
  • Output specification: Return a complete report in Markdown format, including titles, chart explanations, and conclusions

6.2 Task Dependency Chain Configuration

In CrewAI, task dependencies are configured via the context property of Task objects. Below is a configuration example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
from crewai import Agent, Task, Crew

researcher = Agent(
role="Industry Researcher",
goal="Collect and structure market data",
tools=[search_api, db_query],
verbose=True
)

analyst = Agent(
role="Report Analyst",
goal="Generate analysis report based on raw data",
tools=[],
verbose=True
)

collect_data_task = Task(
description="Query e-commerce market sales data for Q3 2024, grouped by category and region",
expected_output="Structured data in JSON format",
agent=researcher
)

write_report_task = Task(
description="Based on the collected data, write a 3-page market analysis report",
expected_output="Complete report in Markdown format",
agent=analyst,
context=[collect_data_task] # Declares this task depends on the output of the previous task
)

crew = Crew(
agents=[researcher, analyst],
tasks=[collect_data_task, write_report_task],
verbose=True
)

The key point is the context parameter, which tells the CrewAI execution engine that the write_report_task can only be started after the collect_data_task is completed. Meanwhile, the researcher agent’s output is automatically injected into the analyst agent’s task context.

6.3 Common Pitfalls in Multi-Role Collaboration

A common problem is overlapping role responsibilities leading to redundant work. For example, if both the researcher and analyst enable the same data query tool, the analyst might bypass the researcher and directly call the data source, causing context fragmentation. The solution is to clearly assign tool ownership in the role persona—the researcher agent exclusively owns the data query tool, and the analyst agent is prohibited from adding such tools.

Another issue is breakage in the task dependency chain. If the researcher agent times out or returns invalid data, the analyst agent will be stuck in a loop waiting for input. In practice, it is recommended to inject a “fallback path when the task cannot be completed” into each agent—for example, if the researcher agent returns no result within 30 seconds, it returns a preset “data unavailable” placeholder to prevent the workflow from being blocked.

7. Advanced Tips and Lessons Learned

7.1 Controlling Hallucination During LLM Planning

When planning sub-tasks, LLMs tend to “over-decompose”—breaking simple tasks into dozens of micro-steps, many of which are logically unnecessary. This leads to low execution efficiency and increased LLM call costs.

Solutions:

  • Fix planning depth constraints in the system prompt, e.g., “decompose into at most 3 sub-tasks”
  • Set a maximum iteration limit (e.g., 5 times); force-terminate the loop when the limit is exceeded or a timeout occurs
  • Add a “task pruning” step downstream: a lightweight rule engine evaluates the sub-task list and removes redundant steps

7.2 Short-Term Memory Overflow and Context Loss

In long conversations, agents easily lose intermediate results from earlier steps, especially when tool calls return large amounts of data. A typical symptom: the agent retrieved data, but when referencing it later says “no relevant information found.”

Solutions:

  • Use a summary compression strategy: after each step, let the LLM summarize the key information in the current context into 1-2 sentences, replacing the original long text
  • Store key results in a vector database; retrieve via semantic queries when the agent needs them
  • Set a length limit for tool return results; automatically truncate or save to the file system when exceeding the limit

7.3 Tool Description Misinterpreted by the LLM

Even with well-written tool descriptions, the LLM may still select the wrong tool. A typical example: the agent needs to “update a user’s phone number” but incorrectly calls the “add user” API, because both tool descriptions contain the keywords “user” and “phone number.”

Solutions:

  • Add usage example fields at the parameter level to simulate a successful call example
  • For each tool, add negative examples of “not applicable scenarios,” e.g., “This is not the method for updating user information”
  • For high-risk operations (involving modification/deletion), add a “double confirmation” constraint: the agent must first show the user the operation to be performed, obtain confirmation, and then call

7.4 Maintaining Role Persona Consistency

In long conversations, agents may “drift from persona”—using a tone inconsistent with the role or performing operations outside its authority. This happens because system prompts are diluted by subsequent conversation content.

Solutions:

  • Re-inject core role instructions (especially constraints) in the prompt at the beginning of each turn
  • Isolate the role persona as a separate “system block” that cannot be overwritten, and prepend it before each LLM call
  • Periodically conduct “persona audits”: have another agent evaluate whether the current agent’s behavior conforms to the preset persona

7.5 The Golden Rule of State Tracking

State tracking is the most overlooked aspect of task logic design. A reliable practice is that every task endpoint must have a clear state. For example:

  • Sub-task executing → write result to state machine
  • Tool call success/failure → update task state
  • Sub-task complete ��� notify parent task

The state machine design should remain flat: avoid nesting beyond three levels, as it becomes difficult to locate issues during debugging. It is recommended to keep the number of states between 5 and 8, such as: pending, executing, completed, failed, timed out, waiting for input.

8. Summary and Future Directions

8.1 Core Recap

This article systematically explained the key methods of agent design from three dimensions: role persona, task logic, and tool calls. The role persona determines the agent’s behavioral boundaries, task logic determines its execution path, and tool descriptions determine whether it can correctly interact with external systems. The three support each other, and none can be omitted.

Key points can be summarized as follows:

  • Role Design: Identity positioning + capability boundaries + behavioral norms + output format, all indispensable; avoid literary descriptions, use specific constraints instead
  • Task Decomposition: Distinguish between sequential planning and incremental planning; use short-term/long-term memory to maintain data dependencies between sub-tasks; set a planning depth limit to prevent over-decomposition
  • Tool Calls: Descriptions should include at least function overview, parameter description, return value format, and usage constraints; add pre-call validation and retry mechanisms in enterprise-grade agents
  • Framework Practice: OpenHands AgentController decouples handling logic via action type routing; CrewAI implements multi-role collaboration through task dependency chains
  • Avoid Pitfalls: Watch for planning hallucinations, memory overflow, tool misinterpretation, persona drift; every task endpoint must have a clear state

8.2 Future Directions

Current agent design methods are still evolving rapidly. The following directions are worth continued attention:

Lightweight vs. Enterprise-Grade Selection: If the team needs to quickly validate ideas, a lightweight solution using LangChain + a single agent can be deployed fast; if multi-round complex tasks, multi-agent collaboration, and strict audit trails are needed, enterprise-grade frameworks like OpenHands and CrewAI are more suitable. When selecting, focus on evaluating whether the engineering complexity pays off in terms of the controllability and maintainability required by the business.

Agent Observability: As the online call volume of agents grows, logging and distributed tracing systems need to be upgraded. Traditional request-response logs are insufficient to cover the multi-step planning, tool calls, and dynamic adjustments within an agent. It is recommended to add metrics at the agent framework layer: average number of planning steps, tool call hit rate, number of retries, task completion rate, and output a complete “thought-action-observation” retrospective log on anomalies.

Impact of LLM Capability Evolution on Design Patterns: As models’ instruction-following ability improves (e.g., GPT-4o, Claude 3.5), negative constraint examples like “what you cannot do” in the role persona will become more effective. In the future, it may be possible to reduce reliance on engineering foolproof designs, allowing models to follow rules more precisely. However, the planning capability of models will also become stronger, potentially leading to more complex and unpredictable execution paths—this imposes higher demands on the agent’s boundary defense and exception handling capabilities.

Subsequent Recommendations: In the project initiation phase, spend a week designing and iterating the agent’s role persona and task logic before rushing to integrate tools and APIs. If role and logic design are not solid, every subsequent model upgrade or tool change will require significant rework. A reusable persona template and a robust task state machine are the foundation for long-term agent usability.

Conclusion

Through this article, you should now have a deeper understanding of “agent role persona design methods.” It is recommended to practice more in combination with actual projects. If you have any questions, feel free to discuss!