0. Series Loop (Readalong Without Open Source)

End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parsing → Analysis → Matching → Report) → tools/pdf_exporter PDF.
This Article: 8/17 · LLM Layer · Unified Invocation

Phase User Visible Code Entry Article
Create Session Welcome message POST /api/sessions 09
Multi-turn Dialogue SSE streaming chat/stream → run_guide_single_turn 06, 14
Info Sufficient Start Analysis _run_analysis_background 05, 07
Resume Parsing Progress 30% run_resume_parser 12
Profile/RIASEC Progress 50% run_profile_analyzer 03, 13
Career Matching Progress 70% run_career_matcher 02
Report Progress 90% run_reporter 11
Download PDF File GET …/report/pdf 11, 15
Description
Before Reading This Part 07 fault tolerance background
After Reading This Use the call table to decide whether to use chat or light model
Next Loop Part 12: JSON parsing degradation (Part 9)

Full series loop index: SERIES-LOOP.md

1. What Problem to Solve

iCan has 5 Agent nodes + chat API. If each file does its own ChatOpenAI(...):

  • Changing models (OpenAI → DeepSeek → Ollama) requires edits in multiple places;
  • When JSON output fails, each Agent handles try/except differently, inconsistent behavior;
  • Qwen3 on Ollama may output `` blocks, polluting dialogue and JSON parsing.

Therefore, we consolidate model creation, timeout, JSON degradation, Qwen3 handling into llm/providers.py. Agents only care about messages and return values.


2. Implementation Location and Configuration Source

Module Responsibility
config.py Settings.LLM_* defaults & environment variables
llm/providers.py get_chat_model / get_light_model / invoke_llm / invoke_llm_with_json / check_ollama_available
llm/parsers.py parse_json_from_text (JSON degradation final step)

Defaults (in code, not .env): LLM_MODEL_CHAT=gpt-4o, LLM_MODEL_LIGHT=gpt-4o-mini, LLM_BASE_URL=https://api.openai.com/v1.
Common overrides for local or Docker:

1
2
3
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL_CHAT=deepseek-v4-flash
LLM_MODEL_LIGHT=deepseek-v4-flash

LLM Unified Invocation Layer


3. Dual Models: Who Uses Chat, Who Uses Light

The comment in providers.py says light is used for “resume parsing, report formatting,” but refer to the grep of call sites:

Caller Model Factory Description
agents/guide.py (6 places) get_chat_model() Multi-turn guidance needs stable tone
agents/resume_parser.py get_light_model() Structured JSON extraction
agents/profile_analyzer.py (6 nodes) get_chat_model() RIASEC + multi-dimension analysis
agents/career_matcher.py (3 nodes) get_chat_model() Path recommendation + JSON
agents/reporter.py get_chat_model() Section generation uses chat; file imports light but current section nodes don’t use it
api/routes/chat.py get_chat_model() Single-turn completion/streaming

Conclusion: Only resume_parser consistently uses light; don’t write “Reporter uses mini” in docs unless code changes.

Factory implementation (excerpt llm/providers.py):

1
2
3
4
5
6
7
8
9
def get_chat_model() -> ChatOpenAI:
return ChatOpenAI(
api_key=settings.LLM_API_KEY,
base_url=settings.LLM_BASE_URL,
model=settings.LLM_MODEL_CHAT,
temperature=settings.LLM_TEMPERATURE,
max_tokens=settings.LLM_MAX_TOKENS,
request_timeout=90,
)

get_light_model() differs only in LLM_MODEL_LIGHT, otherwise identical.


4. invoke_llm: Timeout and Qwen3

1
2
3
4
async def invoke_llm(model: ChatOpenAI, messages: list, **kwargs) -> str:
processed = _inject_no_think(messages)
response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
return response.content

Two points:

  1. Hard timeout 60s (wait_for), separate from ChatOpenAI(request_timeout=90) – two layers of timeout, the 60s one fires first.
  2. _inject_no_think: When LLM_BASE_URL contains 11434 and model name includes qwen3, prepend /no_think before system message to avoid thought chain pollution.

Guide’s five inner nodes and Reporter section generation all go through this path.


5. invoke_llm_with_json: Two-Phase Parsing

ResumeParser, ProfileAnalyzer, CareerMatcher need dict; entry point is invoke_llm_with_json:

1
2
3
4
5
6
7
8
9
async def invoke_llm_with_json(model, messages, **kwargs) -> dict:
processed = _inject_no_think(messages)
try:
json_model = model.bind(response_format={"type": "json_object"})
response = await asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
return json.loads(response.content)
except Exception:
response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
return parse_json_from_text(response.content) # llm/parsers.py

Order: JSON mode → plain text + regex/bracket extraction. Part 12 will detail parse_json_from_text‘s four-layer degradation.


6. Degradation When LLM Unavailable (Workflow Layer)

workflow.py‘s run_analysis_pipeline calls check_ollama_available() before running the four-stage analysis:

  • Cache probe result every 30s;
  • Send 5-token health check to LLM_BASE_URL/chat/completions;
  • If unavailable: skip LLM, use _regex_quick_profile + _generate_fallback_report to produce rule-based report, and write Session.workflow_data (ollama_unavailable: true).

This is business-layer degradation, not internal logic of providers.py. Should be documented separately to avoid readers thinking invoke_llm auto-fallbacks.


7. Pitfalls

① Comment vs. Call Site Inconsistency
get_light_model docstring says “report formatting,” but reporter.py‘s generate_profile_section uses get_chat_model(). Either fix docs or fix code – don’t have contradictions.

② Default Model Is Not DeepSeek
Article title can mention DeepSeek deployment, but body must clarify: code defaults to OpenAI compatibility + gpt-4o, DeepSeek relies on .env.

③ JSON Mode Not Supported by All Backends
Ollama / some domestic APIs have incomplete support for response_format; bind failure will warn and fallback to text mode – in production, test the full ResumeParser chain on the target API.

④ Timeout and Retry
Reporter section generation has 2 attempts (see agents/reporter.py), but invoke_llm itself has no auto-retry; Guide nodes except returns fixed text, does not re-call the model.


8. Summary

  • LLM layer centralized in llm/providers.py, config from config.Settings.
  • Chat: Guide / Analyzer / Matcher / Reporter / Chat API; Light: currently mainly ResumeParser.
  • Unified invoke_llm (60s + Qwen3 /no_think) and invoke_llm_with_json (JSON mode → parsers).
  • Health check and rule report degradation in workflow.run_analysis_pipeline, separated from providers.
  • Documentation should be based on grep of call sites, not file header comments.

Next article: SSE streaming and WebSocket progress (api/routes/chat.py, api/ws_manager.py).


Appendix: Key Source Code (Line-by-Line Annotations)

The following code is excerpted from the iCan implementation, with Chinese annotations above each line, readable even without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py

get_chat_model

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# ========== get_chat_model ==========
# Source file: llm/providers.py Lines 85-121

# L85: Synchronous function get_chat_model: used for routing decisions or factory method
def get_chat_model() -> ChatOpenAI:
# L87: [Doc] Get the chat model (GPT-4o).
# L89: [Doc] Function description:
# L90: [Doc] Create and return a ChatOpenAI instance based on GPT-4o,
# L91: [Doc] used for core business scenarios such as conversation guidance, personal analysis,
# L92: [Doc] career matching, and planning generation.
# L93: [Doc] Model parameters are read from the global settings configuration.
# L95: [Doc] Input description:
# L96: [Doc] None (reads LLM_API_KEY, LLM_BASE_URL, LLM_MODEL_CHAT,
# L97: [Doc] LLM_TEMPERATURE, LLM_MAX_TOKENS from config)
# L99: [Doc] Output description:
# L100: [Doc] ChatOpenAI: configured chat model instance
# (Lines 86-100 are function/module docstring, converted to comments for readability)
# L101: Begin try block, except handles fallback
try:
# L102: Log for online troubleshooting of node input/output
logger.info(
# L103: Execute statement (details in business description above)
f"[get_chat_model] Starting, input: none, "
# L104: Execute statement (details in business description above)
f"using model: {settings.LLM_MODEL_CHAT}, "
# L105: Execute statement (details in business description above)
f"BASE_URL: {settings.LLM_BASE_URL}"
# L106: Execute statement (details in business description above)
)

# L108: LangChain OpenAI compatible client; change DeepSeek/Ollama by changing base_url/model only
model = ChatOpenAI(
# L109: Assignment: update local variable or state field
api_key=settings.LLM_API_KEY,
# L110: Assignment: update local variable or state field
base_url=settings.LLM_BASE_URL,
# L111: Assignment: update local variable or state field
model=settings.LLM_MODEL_CHAT,
# L112: Assignment: update local variable or state field
temperature=settings.LLM_TEMPERATURE,
# L113: Assignment: update local variable or state field
max_tokens=settings.LLM_MAX_TOKENS,
# L114: Assignment: update local variable or state field
request_timeout=90,
# L115: Execute statement (details in business description above)
)

# L117: Log for online troubleshooting of node input/output
logger.info(
# L118: Execute statement (details in business description above)
f"[get_chat_model] Completed, returning: ChatOpenAI instance, "
# L119: Execute statement (details in business description above)
f"model: {settings.LLM_MODEL_CHAT}"
# L120: Execute statement (details in business description above)
)
# L121: Return field to merge into state (LangGraph will merge)
return model

invoke_llm

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
# ========== invoke_llm ==========
# Source file: llm/providers.py Lines 171-205

# L171: Async function invoke_llm: can be awaited, suitable for IO-bound LLM/DB calls
async def invoke_llm(model: ChatOpenAI, messages: list, **kwargs) -> str:
# L172: Begin try block, except handles fallback
try:
# L173: Log for online troubleshooting of node input/output
logger.info(
# L174: Assignment: update local variable or state field
f"[invoke_llm] Starting, input: model={model.model_name}, "
# L175: Execute statement (details in business description above)
f"messages count: {len(messages)}, kwargs: {kwargs}"
# L176: Execute statement (details in business description above)
)
# L177: Log for online troubleshooting of node input/output
logger.debug(f"[invoke_llm] Message details: {messages}")

# L179: Assignment: update local variable or state field
processed = _inject_no_think(messages)

# L181: Import dependency module
import asyncio
# L182: Begin try block, except handles fallback
try:
# L183: Hard timeout wrapper to prevent LLM hanging
response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
# L184: Catch exception to avoid crashing the whole graph/request
except asyncio.TimeoutError:
# L185: Log for online troubleshooting of node input/output
logger.error("[invoke_llm] LLM call timed out (60s)")
# L186: Re-raise exception, handled by caller or LangGraph
raise TimeoutError("AI model response timed out, please retry later")

# L188: Assignment: update local variable or state field
result = response.content

# L190: Log for online troubleshooting of node input/output
logger.info(
# L191: Execute statement (details in business description above)
f"[invoke_llm] Completed, returned text length: {len(result) if result else 0}"
# L192: Execute statement (details in business description above)
)
# L193: Log for online troubleshooting of node input/output
logger.debug(f"[invoke_llm] Return content preview: {result[:200] if result else 'empty'}")

# L195: Return field to merge into state (LangGraph will merge)
return result

# L197: Catch exception to avoid crashing the whole graph/request
except TimeoutError:
# L198: Re-raise exception, handled by caller or LangGraph
raise
# L199: Catch exception to avoid crashing the whole graph/request
except Exception as e:
# L200: Log for online troubleshooting of node input/output
logger.error(
# L201: Assignment: update local variable or state field
f"[invoke_llm] Exception calling LLM, model={getattr(model, 'model_name', 'unknown')}, "
# L202: Execute statement (details in business description above)
f"exception: {e}",
# L203: Assignment: update local variable or state field
exc_info=True
# L204: Execute statement (details in business description above)
)
# L205: Re-raise exception, handled by caller or LangGraph
raise

invoke_llm_with_json

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
# ========== invoke_llm_with_json ==========
# Source file: llm/providers.py Lines 208-278

# L208: Async function invoke_llm_with_json: can be awaited, suitable for IO-bound LLM/DB calls
async def invoke_llm_with_json(model: ChatOpenAI, messages: list, **kwargs) -> dict:
# L210: [Doc] Call LLM and parse JSON output.
# L212: [Doc] Function description:
# L213: [Doc] Use the specified ChatOpenAI model instance, pass message list to call LLM asynchronously,
# L214: [Doc] request model to reply in JSON format, and automatically parse reply content to Python dict.
# L215: [Doc] Suitable for scenarios requiring structured data output, such as resume parsing, career matching results.
# L216: [Doc] Prefer response_format JSON mode, fallback to text parsing if unsupported.
# L218: [Doc] Input description:
# L219: [Doc] model (ChatOpenAI): configured ChatOpenAI model instance
# L220: [Doc] messages (list): message list, format [{"role": "system/user/assistant", "content": "..."}]
# L221: [Doc] **kwargs: extra parameters, e.g., temperature, max_tokens override defaults
# L223: [Doc] Output description:
# L224: [Doc] dict: parsed JSON dictionary data
# (Lines 209-225 are function/module docstring, converted to comments for readability)
# L226: Import dependency module
import json

# L228: Import dependency module
from ican.llm.parsers import parse_json_from_text

# L230: Begin try block, except handles fallback
try:
# L231: Log for online troubleshooting of node input/output
logger.info(
# L232: Call LLM and parse JSON; internally has JSON mode → text degradation chain
f"[invoke_llm_with_json] Starting, input: model={model.model_name}, "
# L233: Execute statement (details in business description above)
f"messages count: {len(messages)}, kwargs: {kwargs}"
# L234: Execute statement (details in business description above)
)
# L235: Call LLM and parse JSON; internally has JSON mode → text degradation chain
logger.debug(f"[invoke_llm_with_json] Message details: {messages}")

# L237: Assignment: update local variable or state field
processed = _inject_no_think(messages)
# L238: Assignment: update local variable or state field
raw_content = None

# L240: Import dependency module
import asyncio as _asyncio

# L242: Begin try block, except handles fallback
try:
# L243: Attempt OpenAI JSON mode, fallback to except if unsupported
json_model = model.bind(response_format={"type": "json_object"})
# L244: Begin try block, except handles fallback
try:
# L245: Hard timeout wrapper to prevent LLM hanging
response = await _asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
# L246: Catch exception to avoid crashing the whole graph/request
except _asyncio.TimeoutError:
# L247: Re-raise exception, handled by caller or LangGraph
raise TimeoutError("AI model response timed out, please retry later")
# L248: Assignment: update local variable or state field
raw_content = response.content
# L249: Catch exception to avoid crashing the whole graph/request
except TimeoutError:
# L250: Re-raise exception, handled by caller or LangGraph
raise
# L251: Catch exception to avoid crashing the whole graph/request
except Exception as bind_err:
# L252: Log for online troubleshooting of node input/output
logger.warning(
# L253: Call LLM and parse JSON; internally has JSON mode → text degradation chain
f"[invoke_llm_with_json] response_format JSON mode not supported, falling back to text mode: {bind_err}"
# L254: Execute statement (details in business description above)
)
# L255: Begin try block, except handles fallback
try:
# L256: Hard timeout wrapper to prevent LLM hanging
response = await _asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
# L257: Catch exception to avoid crashing the whole graph/request
except _asyncio.TimeoutError:
# L258: Re-raise exception, handled by caller or LangGraph
raise TimeoutError("AI model response timed out, please retry later")
# L259: Assignment: update local variable or state field
raw_content = response.content

# L261: Log for online troubleshooting of node input/output
logger.debug(
# L262: Call LLM and parse JSON; internally has JSON mode → text degradation chain
f"[invoke_llm_with_json] Raw reply length: {len(raw_content) if raw_content else 0}"
# L263: Execute statement (details in business description above)
)

# L265: Begin try block, except handles fallback
try:
# L266: Parse LLM response string to Python dict
result = json.loads(raw_content)
# L267: Catch exception to avoid crashing the whole graph/request
except (json.JSONDecodeError, TypeError):
# L268: Call LLM and parse JSON; internally has JSON mode → text degradation chain
logger.info("[invoke_llm_with_json] Direct JSON parsing failed, trying parse_json_from_text extraction")
# L269: Extract JSON from LLM text (four layers of regex/parsing strategy)
result = parse_json_from_text(raw_content)
# L270: Conditional branch
if not result:
# L271: Re-raise exception, handled by caller or LangGraph
raise ValueError(f"Cannot extract valid JSON from LLM reply, raw content: {raw_content[:300]}")

# L273: Log for online troubleshooting of node input/output
logger.info(
# L274: Call LLM and parse JSON; internally has JSON mode → text degradation chain
f"[invoke_llm_with_json] Completed, returned JSON field count: {len(result)}"
# L275: Execute statement (details in business description above)
)
# L276: Call LLM and parse JSON; internally has JSON mode → text degradation chain
logger.debug(f"[invoke_llm_with_json] Return JSON preview: {str(result)[:300]}")

# L278: Return field to merge into state (LangGraph will merge)
return result

Series Navigation

Article Topic
1 System Overview
2 Five-Agent Collaboration
3 Holland RIASEC
4–7 State · Routing · Nesting · Fault Tolerance
8–11 LLM Layer · SSE/WS · DB Migration · PDF
12–14 JSON Prompt · RIASEC Prompt · Guide Prompt
15–17 Docker · Middleware · Config

← Back to iCan Topic