0. Series Loop (Follow Along Without Public Source Code)

End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parse → Analyze → Match → Report) → tools/pdf_exporter PDF.
This Article: 12/17 · Structure Loop · JSON

Stage | User Visible | Code Entry | Corresponding Article
Create Session | Welcome Message | POST /api/sessions | 09
Multi-turn Dialogue | SSE Streaming | chat/stream → run_guide_single_turn | 06, 14
Information Sufficient | Start Analysis | _run_analysis_background | 05, 07
Resume Parsing | Progress 30% | run_resume_parser | 12
Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13
Career Matching | Progress 70% | run_career_matcher | 02
Report | Progress 90% | run_reporter | 11
Download PDF | File | GET …/report/pdf | 11, 15

Description:
Before reading this | Article 08: invoke_llm_with_json
After reading this | Manually work through the parse_json_from_text four-layer strategy
Next loop | Article 03/13: Business JSON schema (Article 13)

Full Series Loop Index: SERIES-LOOP.md

1. What Problem to Solve

In the iCan main flow, resume_parser_node needs to convert the natural language resume collected during the Guide phase into a structured_profile for profile_analyzer_node to consume. The input is unstructured text, and the output must be a dict with a fixed schema.

Common failure modes during actual integration:

  • The model wraps the JSON in a ```json code block, or mixes it with explanatory text;
  • Some local Ollama models do not support response_format={"type": "json_object"}, so the JSON-mode call raises an error;
  • Minor JSON syntax issues (trailing commas, single quotes) make json.loads fail outright;
  • The LLM returns an empty dict on both attempts, breaking the entire parsing pipeline.

iCan’s strategy is Prompt constraint schema + JSON mode at the call layer + four-layer text extraction + regex fallback, rather than expecting the model to “get it perfect in one shot.”
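
The failure modes above are easy to reproduce with the standard library alone. The sketch below feeds a few hypothetical model replies (invented for illustration, not taken from iCan logs) to json.loads and records which ones it rejects; try_strict_parse is an illustrative helper name.

```python
import json

# Hypothetical model replies, invented for illustration.
SAMPLES = {
    "fenced": '```json\n{"name": "Li Lei"}\n```',
    "chatty": 'Okay, the result is: {"name": "Li Lei"}',
    "trailing_comma": '{"name": "Li Lei",}',
    "single_quotes": "{'name': 'Li Lei'}",
    "clean": '{"name": "Li Lei"}',
}


def try_strict_parse(text: str):
    """Return the parsed object, or None if json.loads rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


results = {label: try_strict_parse(text) for label, text in SAMPLES.items()}
for label, parsed in results.items():
    status = "ok" if parsed is not None else "JSONDecodeError"
    print(f"{label:15s} -> {status}")
```

Only the clean reply survives a strict parse, which is why the later layers exist. Note that the trailing-comma and single-quote cases are not repaired by text extraction either; they can only be avoided through the Prompt or a retry.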


2. Implementation Location

Module | Responsibility
llm/prompts.py | RESUME_PARSER_SYSTEM_PROMPT: Complete JSON example + field rules
llm/providers.py | invoke_llm_with_json: response_format first, fallback on failure
llm/parsers.py | parse_json_from_text four-layer extraction; validate_structured_profile validation
agents/resume_parser.py | Assemble messages, select get_light_model(), retry and _regex_extract_profile fallback

Subgraph order (create_resume_parser_graph): load_input → extract_information → build_profile → validate_profile.


[Figure: JSON four-layer fallback parsing]


3. Prompt Design: ResumeParser’s Schema Contract

The Prompt is defined in RESUME_PARSER_SYSTEM_PROMPT in llm/prompts.py. The core is not a single sentence “please output JSON,” but four things clearly stated at once:

  1. Complete Example: Shows all fields: basic_info, work_experience, skill_set, certifications, career_progression, parsing_confidence;
  2. Missing Strategy: “If not mentioned, use null, do not fabricate”;
  3. Inference Annotation: parsing_confidence.inferred_fields lists inferred fields;
  4. Chinese and Format: Differentiate technical/soft skills, integrate and deduplicate across multi-turn dialogue.

The Prompt embeds a complete example wrapped in ```json—this aligns with the regex r"```json\s*([\s\S]*?)\s*```" in llm/parsers.py strategy 1: if the model follows the Prompt and outputs a code block, the parser hits it on the first layer.
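
A minimal sketch of how the embedded example and Strategy 1 line up. EXAMPLE_PROFILE is a hypothetical stand-in for the example inside RESUME_PARSER_SYSTEM_PROMPT: the field names come from this article, while the values and the simulated model reply are invented.

```python
import json
import re

# Hypothetical stand-in for the schema example embedded in the prompt.
# Field names follow the article; the values are invented for illustration.
EXAMPLE_PROFILE = {
    "basic_info": {"education": "Bachelor", "major": None},
    "work_experience": [],
    "skill_set": {"technical_skills": ["Python"], "soft_skills": []},
    "certifications": [],
    "career_progression": {"total_years": None},
    "parsing_confidence": {"inferred_fields": ["basic_info.education"]},
}

FENCE = "`" * 3  # three backticks, assembled here to keep the sample readable

# A model that imitates the prompt's example replies with a fenced block.
model_reply = (
    "Here is the extracted profile:\n"
    f"{FENCE}json\n{json.dumps(EXAMPLE_PROFILE, indent=2)}\n{FENCE}"
)

# Same pattern as Strategy 1 in llm/parsers.py.
JSON_CODE_BLOCK = r"```json\s*([\s\S]*?)\s*```"

match = re.search(JSON_CODE_BLOCK, model_reply)
parsed = json.loads(match.group(1).strip()) if match else {}
print(sorted(parsed))
```

Fields the user never mentioned come back as null in JSON and None in Python, matching the "use null, do not fabricate" rule.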

extract_information in agents/resume_parser.py combines the system prompt and user’s original text into messages:

    messages = [
        {"role": "system", "content": RESUME_PARSER_SYSTEM_PROMPT},
        {"role": "user", "content": f"Please extract structured personal information from the following text:\n\n{document_content}"},
    ]
    model = get_light_model()
    parsed_data = await invoke_llm_with_json(model, messages)

Model Selection: Resume parsing uses get_light_model() (code defaults to LLM_MODEL_LIGHT=gpt-4o-mini), not the chat model. In .env, it’s common to change to DeepSeek or Ollama qwen3.5:9b inside Docker—switching models does not affect Prompt/schema, but affects JSON mode compatibility (see pitfalls).
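
As a rough illustration of the environment-driven default, the sketch below resolves the light model name from LLM_MODEL_LIGHT. resolve_light_model_name is a hypothetical helper, not an iCan function; the real get_light_model() returns a configured model instance rather than a string.

```python
import os

DEFAULT_LIGHT_MODEL = "gpt-4o-mini"  # default named in the article


def resolve_light_model_name(env=None) -> str:
    """Return LLM_MODEL_LIGHT when set and non-empty, else the default."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL_LIGHT") or DEFAULT_LIGHT_MODEL


print(resolve_light_model_name({}))
print(resolve_light_model_name({"LLM_MODEL_LIGHT": "qwen3.5:9b"}))
```

Passing the environment as a mapping keeps the helper easy to test without touching the process environment.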


4. Call Layer: invoke_llm_with_json Dual Channel

In llm/providers.py, JSON invocation is not a simple ainvoke, but a three-level fallback:

    try:
        # Channel 1: ask the provider for JSON mode.
        json_model = model.bind(response_format={"type": "json_object"})
        response = await asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
        raw_content = response.content
    except Exception as bind_err:
        # Abridged: the full source re-raises TimeoutError before reaching this clause.
        logger.warning("response_format JSON mode not supported, falling back to text mode: %s", bind_err)
        # Channel 2: plain text call with the same messages.
        response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
        raw_content = response.content

    try:
        result = json.loads(raw_content)
    except (json.JSONDecodeError, TypeError):
        result = parse_json_from_text(raw_content)  # llm/parsers.py
        if not result:
            raise ValueError(f"Cannot extract valid JSON from LLM response, original content: {raw_content[:300]}")

The flow can be summarized as:

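    bind(json_object) + ainvoke
        ↓ only if the JSON-mode call raises (timeouts are re-raised, not retried here)
    plain ainvoke (text mode)
        ↓ raw_content from whichever call succeeded
    json.loads(raw_content)
        ↓ only on JSONDecodeError / TypeError
    parse_json_from_text (four layers)
        ↓ only if the result is an empty dict
    ValueError → upstream retry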
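
The channel switch can be simulated without any LLM. In the sketch below, FakeModel and FakeReply are hypothetical stand-ins (not LangChain classes), and call_dual_channel mirrors only the ordering described above: JSON mode first, text mode when that call raises, timeouts propagated rather than retried.

```python
import asyncio
import json


class FakeReply:
    def __init__(self, content: str):
        self.content = content


class FakeModel:
    """Hypothetical stand-in for a chat model; not a LangChain class."""

    def __init__(self, supports_json_mode: bool, json_mode: bool = False):
        self.supports_json_mode = supports_json_mode
        self.json_mode = json_mode

    def bind(self, **kwargs):
        # Binding only records the option; nothing is sent yet.
        return FakeModel(self.supports_json_mode, json_mode="response_format" in kwargs)

    async def ainvoke(self, messages):
        if self.json_mode and not self.supports_json_mode:
            raise RuntimeError("response_format is not supported")
        if self.json_mode:
            return FakeReply('{"name": "Li Lei"}')
        return FakeReply('Okay, the result is: {"name": "Li Lei"}')


async def bounded(awaitable, timeout):
    """Await with a hard timeout and surface it as the builtin TimeoutError."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError("model response timed out") from None


async def call_dual_channel(model, messages, timeout=60):
    """Return (channel, raw_content): JSON mode first, text mode on error."""
    try:
        json_model = model.bind(response_format={"type": "json_object"})
        reply = await bounded(json_model.ainvoke(messages), timeout)
        return "json_mode", reply.content
    except TimeoutError:
        raise  # a timeout is not a reason to try the second channel
    except Exception:
        reply = await bounded(model.ainvoke(messages), timeout)
        return "text_mode", reply.content


def needs_text_extraction(raw_content) -> bool:
    """True when a strict json.loads fails and text extraction is required."""
    try:
        json.loads(raw_content)
        return False
    except (json.JSONDecodeError, TypeError):
        return True


messages = [{"role": "user", "content": "Parse this resume."}]
cloud = asyncio.run(call_dual_channel(FakeModel(supports_json_mode=True), messages))
local = asyncio.run(call_dual_channel(FakeModel(supports_json_mode=False), messages))
print(cloud[0], needs_text_extraction(cloud[1]))
print(local[0], needs_text_extraction(local[1]))
```

In this simulation the error surfaces when the bound model is invoked, not when bind is called, which is why the try block has to cover both steps.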

Additionally, when LLM_BASE_URL contains 11434 and the model name contains qwen3, _inject_no_think prepends /no_think before the system message to avoid Qwen3 thinking blocks contaminating the JSON—an extra layer of JSON stability on local Ollama.
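
A minimal sketch of that rule, under stated assumptions: the real _inject_no_think takes only messages and reads its configuration elsewhere, whereas this hypothetical version receives base_url and model_name explicitly, prefixes the system message content, and uses a newline as the separator.

```python
def inject_no_think_sketch(messages, base_url: str, model_name: str):
    """Return a new message list; prefix /no_think for local Ollama + Qwen3."""
    looks_like_ollama = "11434" in base_url  # default Ollama port
    is_qwen3 = "qwen3" in model_name.lower()
    if not (looks_like_ollama and is_qwen3):
        return list(messages)
    patched = []
    for message in messages:
        if message.get("role") == "system":
            # Copy the dict so the caller's messages are left untouched.
            message = {**message, "content": "/no_think\n" + message["content"]}
        patched.append(message)
    return patched


messages = [
    {"role": "system", "content": "Output JSON only."},
    {"role": "user", "content": "Parse this resume."},
]
local = inject_no_think_sketch(messages, "http://localhost:11434/v1", "qwen3.5:9b")
cloud = inject_no_think_sketch(messages, "https://api.openai.com/v1", "gpt-4o-mini")
print(local[0]["content"].splitlines()[0])
```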


5. Four-Layer Fallback Parser: parse_json_from_text

parse_json_from_text in llm/parsers.py is the last net, trying in order:

Strategy | Regex/Logic | Typical Scenario
1 | r"```json\s*([\s\S]*?)\s*```" | ChatGPT-style output
2 | Plain ``` ... ``` block whose content starts with { or [ | Code block without a json label
3 | r"\{[\s\S]*\}", greedy match of the outermost braces | “Okay, the result is: {…}”
4 | json.loads(text.strip()) | Pure JSON response
Fallback | Return {} | Completely unparseable

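Unlike general tutorials, the implementation never raises to its caller: every failure path returns {}. A layer is skipped only when its pattern does not match (or, for Strategy 2, when the block content does not start with { or [). If a pattern matches but json.loads rejects the captured text in Strategies 1–3, the JSONDecodeError goes straight to the outermost except, which returns {} without trying the remaining layers; only Strategy 4 has its own try/except. Because Strategy 3 matches any text containing a {…} span, Strategy 4 is reached only for brace-free input such as a top-level JSON array of scalars. The caller must therefore check for an empty dict: invoke_llm_with_json raises ValueError in that case, and extract_information then retries or falls back to regex.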

In the source code, each layer has logger.info annotating the strategy number (Strategy 1–4). During debugging, you can check the logs to see which layer was reached.
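
To work through the four layers by hand, the condensed sketch below follows the control flow of the appendix listing but returns the strategy number alongside the result. parse_with_trace and the sample strings are illustrative, not part of iCan, and logging is omitted.

```python
import json
import re


def parse_with_trace(text: str):
    """Return (strategy, result); strategy 0 means nothing could be parsed."""
    if not text or not text.strip():
        return 0, {}
    try:
        m = re.search(r"```json\s*([\s\S]*?)\s*```", text)
        if m:
            return 1, json.loads(m.group(1).strip())
        m = re.search(r"```\s*([\s\S]*?)\s*```", text)
        if m:
            inner = m.group(1).strip()
            if inner.startswith(("{", "[")):
                return 2, json.loads(inner)
        m = re.search(r"\{[\s\S]*\}", text)
        if m:
            return 3, json.loads(m.group(0))
        try:
            return 4, json.loads(text.strip())
        except json.JSONDecodeError:
            return 0, {}
    except json.JSONDecodeError:
        # A candidate that matched but failed to parse ends the whole attempt.
        return 0, {}


FENCE = "`" * 3  # three backticks, assembled here to keep the samples readable
samples = {
    "json_fence": f'{FENCE}json\n{{"a": 1}}\n{FENCE}',
    "plain_fence": f'{FENCE}\n{{"a": 1}}\n{FENCE}',
    "embedded": 'Okay, the result is: {"a": 1}',
    "pure_object": '{"a": 1}',
    "pure_array": "[1, 2, 3]",
    "stray_braces": 'Note {draft}: {"a": 1}',
}
trace = {name: parse_with_trace(text) for name, text in samples.items()}
for name, (strategy, result) in trace.items():
    print(f"{name:13s} strategy={strategy} result={result}")
```

Two details are worth noticing when reading the output: a pure JSON object is already caught by Strategy 3, and the array that reaches Strategy 4 comes back as a list even though the function is annotated as returning dict.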


6. Agent Retry and Regex Fallback

extract_information in agents/resume_parser.py has business retry on top of the LLM layer:

    for attempt in range(2):
        try:
            model = get_light_model()
            parsed_data = await invoke_llm_with_json(model, messages)
            if parsed_data and len(parsed_data) > 0:
                break
            logger.warning("[extract_information] Attempt %d returned empty data, retrying", attempt + 1)
        except TimeoutError as te:
            ...
        except Exception as e:
            ...

    if not parsed_data or len(parsed_data) == 0:
        parsed_data = _regex_extract_profile(document_content)

_regex_extract_profile uses regex to extract name, education, work experience, etc.—field names are not identical to the Prompt schema (e.g., it produces skills instead of skill_set.technical_skills). build_profile fills missing keys with default empty structures—intentionally “something is better than nothing,” but validate_profile will likely still report missing required fields.
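
The mismatch is easier to see in a toy version. Everything below is hypothetical: the patterns, the KNOWN_SKILLS vocabulary, and the rule that only known schema keys are copied into the profile are assumptions made for illustration, since the article shows neither _regex_extract_profile nor build_profile. Only the flat skills key and the schema field names come from the text above.

```python
import copy
import re

# Hypothetical vocabulary; the real fallback's rules are not shown in the article.
KNOWN_SKILLS = ["Python", "SQL", "Docker"]

# Empty skeleton using field names from the prompt schema.
EMPTY_PROFILE = {
    "basic_info": {"education": None, "major": None},
    "work_experience": [],
    "skill_set": {"technical_skills": [], "soft_skills": []},
    "certifications": [],
    "career_progression": {"total_years": None},
}


def regex_extract_profile_sketch(text: str) -> dict:
    """Pull a few flat fields out of free text with regular expressions."""
    found = {}
    degree = re.search(r"\b(Bachelor|Master|PhD)\b", text, re.IGNORECASE)
    if degree:
        found["education"] = degree.group(1)
    skills = [
        skill for skill in KNOWN_SKILLS
        if re.search(rf"\b{re.escape(skill)}\b", text, re.IGNORECASE)
    ]
    if skills:
        found["skills"] = skills  # flat key, unlike skill_set.technical_skills
    return found


def fill_defaults_sketch(parsed: dict) -> dict:
    """Copy known schema keys from parsed; everything else keeps its default."""
    profile = copy.deepcopy(EMPTY_PROFILE)
    for key in profile:
        if key in parsed:
            profile[key] = parsed[key]
    return profile


fallback = regex_extract_profile_sketch("I hold a Bachelor degree and use Python and SQL daily.")
profile = fill_defaults_sketch(fallback)
print(fallback)
print(profile["skill_set"])
```

The regex result does contain skills, yet skill_set.technical_skills stays empty after defaults are filled in, so a validator that checks the schema paths still reports the field as missing.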


7. Quality Closure After Parsing

The LLM’s self-evaluated parsing_confidence is extracted into confidence_scores in build_profile; validate_profile calls validate_structured_profile from llm/parsers.py for code-side validation. Required fields include:

  • basic_info.education, basic_info.major
  • Non-empty work_experience list
  • skill_set.technical_skills, skill_set.soft_skills
  • career_progression.total_years

Missing items are written into parse_errors, and validation_passed is written into confidence_scores. The confidence from the Prompt and the Python validation are complementary: the former reflects the model’s self-assessment, the latter ensures downstream Agents do not receive a “skeleton profile.”
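
A sketch of the code-side check, assuming the required fields listed above. validate_structured_profile_sketch is a hypothetical name; the error wording, the return shape, and the rule that None or an empty value counts as missing (while a total_years of 0 counts as present) are assumptions rather than the behavior of validate_structured_profile.

```python
def validate_structured_profile_sketch(profile: dict) -> list:
    """Return a list of error strings; an empty list means validation passed."""
    errors = []
    basic = profile.get("basic_info") or {}
    for field in ("education", "major"):
        if not basic.get(field):
            errors.append(f"missing basic_info.{field}")
    if not profile.get("work_experience"):
        errors.append("missing work_experience")
    skills = profile.get("skill_set") or {}
    for field in ("technical_skills", "soft_skills"):
        if not skills.get(field):
            errors.append(f"missing skill_set.{field}")
    progression = profile.get("career_progression") or {}
    if progression.get("total_years") is None:
        errors.append("missing career_progression.total_years")
    return errors


complete = {
    "basic_info": {"education": "Bachelor", "major": "Computer Science"},
    "work_experience": [{"title": "Engineer"}],
    "skill_set": {"technical_skills": ["Python"], "soft_skills": ["Communication"]},
    "career_progression": {"total_years": 0},
}
skeleton = {
    "basic_info": {"education": None, "major": None},
    "work_experience": [],
    "skill_set": {"technical_skills": [], "soft_skills": []},
    "career_progression": {"total_years": None},
}

parse_errors = validate_structured_profile_sketch(skeleton)
confidence_scores = {"validation_passed": not parse_errors}
print(len(parse_errors), confidence_scores)
```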


8. Position in the Pipeline

In the top-level workflow.py: when guide_node has enough information, it enters resume_parser_node, which outputs structured_profile written to iCanWorkflowState, then passes it to profile_analyzer_node.

Data flow:

    Guide conversation text (raw_input)
      → run_resume_parser
      → invoke_llm_with_json + parse_json_from_text
      → structured_profile + confidence_scores + parse_errors
      → ProfileAnalyzer

The same invoke_llm_with_json + parse_json_from_text is also reused by other nodes needing JSON, such as ProfileAnalyzer, CareerMatcher, etc. (see Article 8 LLM layer); ResumeParser is the call point with the most complex schema and longest fallback chain.


9. Pitfalls and Boundaries

Pitfall 1: response_format is not a universal capability. Some Ollama models raise when called in JSON mode; the call then falls back to text mode and relies more heavily on the JSON example in the Prompt and on parse_json_from_text. When integrating with the Docker default qwen3.5:9b, check the logs for the “falling back to text mode” warning.

Pitfall 2: Strategy 3 greedy match may cut incorrectly. \{[\s\S]*\} goes from the first { to the last }. If the model embeds other curly braces before or after the JSON, the whole parse may fail and fall into {}. The Prompt requirement “output only JSON” is still necessary; the parser cannot replace Prompt constraints.

Pitfall 3: Regex fallback and schema misalignment. _regex_extract_profile produces fields like skills, which are not automatically mapped to skill_set.technical_skills. Downstream validation failure is expected behavior—guide the user to supplement information or retry the LLM, rather than treating the fallback as a successful parse.

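Pitfall 4: Empty dict and retry. extract_information makes at most 2 attempts; if invoke_llm_with_json returns an empty dict (without throwing an exception), it logs a warning and retries. Each LLM call is bounded by the 60-second asyncio.wait_for timeout, and the resulting TimeoutError is caught separately, so a hung model cannot block the node indefinitely. If both attempts fail, _regex_extract_profile takes over.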
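
Pitfall 2 can be reproduced in a few lines. The sketch contrasts the greedy pattern with one possible alternative: trying json.JSONDecoder.raw_decode from the standard library at each opening brace. The alternative is not what iCan does; greedy_extract, first_json_object, and the sample reply are hypothetical.

```python
import json
import re


def greedy_extract(text: str) -> dict:
    """Strategy 3 in isolation: first '{' to last '}', then a strict parse."""
    m = re.search(r"\{[\s\S]*\}", text)
    if not m:
        return {}
    try:
        return json.loads(m.group(0))
    except json.JSONDecodeError:
        return {}


def first_json_object(text: str) -> dict:
    """Alternative (not used by iCan): try raw_decode at every '{'."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; keep scanning
        if isinstance(obj, dict):
            return obj
    return {}


reply = (
    "Fields marked {inferred} are guesses. "
    '{"name": "Li Lei"} '
    "Tell me if {anything} looks off."
)
print(greedy_extract(reply))
print(first_json_object(reply))
```

Scanning costs one parse attempt per opening brace, so it trades a little time for tolerance of stray braces. It does not remove the need for the "output only JSON" instruction, since a reply containing several objects would still be ambiguous.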


10. Summary

  1. The Prompt locks the schema with a complete JSON example + null/inference rules, defined in llm/prompts.py.
  2. invoke_llm_with_json in llm/providers.py first tries json_object mode, then a normal call, then json.loads → parse_json_from_text.
  3. parse_json_from_text in llm/parsers.py is a four-layer fallback; it returns {} on failure, so callers must handle empty results.
  4. extract_information in agents/resume_parser.py uses get_light_model() with up to 2 attempts, plus _regex_extract_profile as the final fallback.
  5. validate_structured_profile validates required fields with code rules, parallel to parsing_confidence self-evaluation.

Next article: RIASEC assessment Prompt engineering (Article 13).


Appendix: Key Source Code (Line-by-Line Annotations)

The following code is from the iCan implementation. An annotation sits above each line, allowing you to follow along without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py

parse_json_from_text Four-Layer Strategy

# ========== parse_json_from_text Four-Layer Strategy ==========
# Source file: llm/parsers.py Lines 19-92

# L19: Synchronous function parse_json_from_text: extracts a JSON dict from LLM response text
def parse_json_from_text(text: str) -> dict:
    # L21: [Doc] Extract JSON from LLM response text.
    # L23: [Doc] Function description:
    # L24: [Doc] Extracts and parses JSON content from text returned by LLM. Supports the following formats:
    # L25: [Doc] 1. Markdown code block wrapped JSON (```json ... ```)
    # L26: [Doc] 2. Normal code block wrapped JSON (``` ... ```)
    # L27: [Doc] 3. JSON embedded directly in text (starting with {, ending with })
    # L28: [Doc] Returns an empty dictionary on parsing failure.
    # L30: [Doc] Input description:
    # L31: [Doc] text (str): Raw text from LLM response
    # L33: [Doc] Output description:
    # L34: [Doc] dict: Parsed JSON dictionary, returns empty dict {} on failure
    # (Lines L20-35 are function/module docstrings, converted to comments for readability)
    # L36: Start the outer try block; the except clauses at L87 and L90 return {}
    try:
        # L37: Log the start of execution and the input length
        logger.info(f"[parse_json_from_text] Starting execution, input: text length={len(text)}")
        # L38: Log a preview of the first 300 characters
        logger.debug(f"[parse_json_from_text] Text preview: {text[:300]}")

        # L40: Conditional branch: empty or whitespace-only input
        if not text or not text.strip():
            # L41: Log that the input is empty
            logger.warning("[parse_json_from_text] Input text is empty, returning empty dict")
            # L42: Return an empty dict to the caller
            return {}

        # L44: Strategy 1: Try to extract from ```json ... ``` code block
        # L45: Assignment: pattern for a json-labelled code block
        json_code_block_pattern = r"```json\s*([\s\S]*?)\s*```"
        # L46: Assignment: first match, or None
        match = re.search(json_code_block_pattern, text)
        # L47: Conditional branch
        if match:
            # L48: Assignment: content between the fences
            json_str = match.group(1).strip()
            # L49: Log the length of the extracted content
            logger.debug(f"[parse_json_from_text] Extracted content from json code block, length: {len(json_str)}")
            # L50: Parse the string into a Python object; a failure jumps to the except at L87
            result = json.loads(json_str)
            # L51: Log which strategy succeeded
            logger.info(f"[parse_json_from_text] Execution complete (Strategy 1: json code block), returned field count: {len(result)}")
            # L52: Return the parsed result
            return result

        # L54: Strategy 2: Try to extract from normal ``` ... ``` code block
        # L55: Assignment: pattern for an unlabelled code block
        code_block_pattern = r"```\s*([\s\S]*?)\s*```"
        # L56: Assignment: first match, or None
        match = re.search(code_block_pattern, text)
        # L57: Conditional branch
        if match:
            # L58: Assignment: content between the fences
            inner = match.group(1).strip()
            # L59: Try to determine if it is JSON (starts with { or [)
            # L60: Conditional branch
            if inner.startswith("{") or inner.startswith("["):
                # L61: Log the length of the extracted content
                logger.debug(f"[parse_json_from_text] Extracted JSON content from normal code block, length: {len(inner)}")
                # L62: Parse the string into a Python object; a failure jumps to the except at L87
                result = json.loads(inner)
                # L63: Log which strategy succeeded
                logger.info(f"[parse_json_from_text] Execution complete (Strategy 2: normal code block), returned field count: {len(result)}")
                # L64: Return the parsed result
                return result

        # L66: Strategy 3: Try to find JSON directly in the text (find outermost { })
        # L67: Assignment: greedy pattern from the first { to the last }
        brace_pattern = r"\{[\s\S]*\}"
        # L68: Assignment: first match, or None
        match = re.search(brace_pattern, text)
        # L69: Conditional branch
        if match:
            # L70: Assignment: the whole matched span
            json_str = match.group(0)
            # L71: Log the length of the extracted content
            logger.debug(f"[parse_json_from_text] Extracted JSON content directly from text, length: {len(json_str)}")
            # L72: Parse the string into a Python object; a failure jumps to the except at L87
            result = json.loads(json_str)
            # L73: Log which strategy succeeded
            logger.info(f"[parse_json_from_text] Execution complete (Strategy 3: direct extraction), returned field count: {len(result)}")
            # L74: Return the parsed result
            return result

        # L76: Strategy 4: Try to parse the entire text directly
        # L77: Start an inner try block; only this strategy has its own except
        try:
            # L78: Parse the whole stripped text
            result = json.loads(text.strip())
            # L79: Log which strategy succeeded
            logger.info(f"[parse_json_from_text] Execution complete (Strategy 4: direct parse), returned field count: {len(result)}")
            # L80: Return the parsed result
            return result
        # L81: Catch the parse error so execution falls through to L84
        except json.JSONDecodeError:
            # L82: Nothing to do; continue to the final fallback
            pass

        # L84: Log that no strategy produced valid JSON
        logger.warning("[parse_json_from_text] Could not extract valid JSON from text, returning empty dict")
        # L85: Return an empty dict to the caller
        return {}

    # L87: Catch parse errors raised by Strategies 1-3
    except json.JSONDecodeError as e:
        # L88: Log the parse failure with traceback
        logger.error(f"[parse_json_from_text] JSON parse failed, exception: {e}", exc_info=True)
        # L89: Return an empty dict to the caller
        return {}
    # L90: Catch any other exception to avoid crashing the entire graph/request
    except Exception as e:
        # L91: Log the unexpected failure with traceback
        logger.error(f"[parse_json_from_text] Exception during JSON extraction: {e}", exc_info=True)
        # L92: Return an empty dict to the caller
        return {}

invoke_llm_with_json

# ========== invoke_llm_with_json ==========
# Source file: llm/providers.py Lines 208-278

# L208: Asynchronous function invoke_llm_with_json: can be awaited, suitable for IO-type LLM/DB calls
async def invoke_llm_with_json(model: ChatOpenAI, messages: list, **kwargs) -> dict:
    # L210: [Doc] Call LLM and parse JSON output.
    # L212: [Doc] Function description:
    # L213: [Doc] Uses the specified ChatOpenAI model instance, asynchronously calls the LLM with the message list,
    # L214: [Doc] requires the model to reply in JSON format, and automatically parses the response content into a Python dictionary.
    # L215: [Doc] Suitable for scenarios requiring structured data output, such as resume parsing, career matching results, etc.
    # L216: [Doc] Prefers to use response_format JSON mode, falls back to text parsing if not supported.
    # L218: [Doc] Input description:
    # L219: [Doc] model (ChatOpenAI): Configured ChatOpenAI model instance
    # L220: [Doc] messages (list): Message list, format [{"role": "system/user/assistant", "content": "..."}]
    # L221: [Doc] **kwargs: Extra parameters, such as temperature, max_tokens, etc. to override default configuration
    # L223: [Doc] Output description:
    # L224: [Doc] dict: Parsed JSON dictionary data
    # (Lines L209-225 are function/module docstrings, converted to comments for readability)
    # L226: Import dependency module
    import json

    # L228: Import dependency module
    from ican.llm.parsers import parse_json_from_text

    # L230: Start the outer try block (its except clause lies beyond the excerpted lines 208-278)
    try:
        # L231: Log for online debugging of node input/output
        logger.info(
            # L232: First part of the log message: model name
            f"[invoke_llm_with_json] Starting execution, input: model={model.model_name},"
            # L233: Second part of the log message: message count and kwargs
            f"message count: {len(messages)}, kwargs: {kwargs}"
        # L234: Close the logger.info call
        )
        # L235: Log the full message list at debug level
        logger.debug(f"[invoke_llm_with_json] Message details: {messages}")

        # L237: Assignment: messages after the optional /no_think injection
        processed = _inject_no_think(messages)
        # L238: Assignment: placeholder for the raw response text
        raw_content = None

        # L240: Import dependency module
        import asyncio as _asyncio

        # L242: Start try block, subsequent except handles fallback
        try:
            # L243: Try OpenAI JSON mode, if not supported go to except fallback
            json_model = model.bind(response_format={"type": "json_object"})
            # L244: Start try block that converts the asyncio timeout
            try:
                # L245: Hard timeout wrapper, prevent LLM from hanging
                response = await _asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
            # L246: Catch the asyncio timeout
            except _asyncio.TimeoutError:
                # L247: Raise exception up, handled by caller or LangGraph
                raise TimeoutError("AI model response timed out, please retry later")
            # L248: Assignment: raw text from the JSON-mode response
            raw_content = response.content
        # L249: Timeouts skip the text-mode fallback
        except TimeoutError:
            # L250: Raise exception up, handled by caller or LangGraph
            raise
        # L251: Any other failure triggers the text-mode fallback
        except Exception as bind_err:
            # L252: Log for online debugging of node input/output
            logger.warning(
                # L253: Warning text that marks the switch to text mode
                f"[invoke_llm_with_json] response_format JSON mode not supported, falling back to text mode: {bind_err}"
            # L254: Close the logger.warning call
            )
            # L255: Start try block that converts the asyncio timeout
            try:
                # L256: Hard timeout wrapper, prevent LLM from hanging
                response = await _asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
            # L257: Catch the asyncio timeout
            except _asyncio.TimeoutError:
                # L258: Raise exception up, handled by caller or LangGraph
                raise TimeoutError("AI model response timed out, please retry later")
            # L259: Assignment: raw text from the text-mode response
            raw_content = response.content

        # L261: Log for online debugging of node input/output
        logger.debug(
            # L262: Log the raw response length
            f"[invoke_llm_with_json] Raw response length: {len(raw_content) if raw_content else 0}"
        # L263: Close the logger.debug call
        )

        # L265: Start try block, subsequent except handles fallback
        try:
            # L266: Parse LLM returned string into Python dict
            result = json.loads(raw_content)
        # L267: A strict parse failed; switch to text extraction
        except (json.JSONDecodeError, TypeError):
            # L268: Log the switch to parse_json_from_text
            logger.info("[invoke_llm_with_json] Direct JSON parse failed, trying parse_json_from_text extraction")
            # L269: Extract JSON from LLM text (four-layer regex/parse strategy)
            result = parse_json_from_text(raw_content)
            # L270: Conditional branch: extraction returned an empty dict
            if not result:
                # L271: Raise exception up, handled by caller or LangGraph
                raise ValueError(f"Cannot extract valid JSON from LLM response, original content: {raw_content[:300]}")

        # L273: Log for online debugging of node input/output
        logger.info(
            # L274: Log the number of top-level fields
            f"[invoke_llm_with_json] Execution complete, returned JSON field count: {len(result)}"
        # L275: Close the logger.info call
        )
        # L276: Log a preview of the parsed result
        logger.debug(f"[invoke_llm_with_json] Returned JSON preview: {str(result)[:300]}")

        # L278: Return the parsed dict to the caller
        return result

extract_information

# ========== extract_information ==========
# Source file: agents/resume_parser.py Lines 153-225

# L153: Asynchronous function extract_information: can be awaited, suitable for IO-type LLM/DB calls
async def extract_information(state: ResumeParserState) -> dict:
    # L155: [Doc] Use LLM to extract structured information.
    # L157: [Doc] Function description:
    # L158: [Doc] Sends the user's text content to the LLM, following the format requirements defined in
    # L159: [Doc] RESUME_PARSER_SYSTEM_PROMPT, extracting structured personal information including basic info,
    # L160: [Doc] work experience, skill set, certifications, and career development path.
    # L162: [Doc] Input description:
    # L163: [Doc] state (ResumeParserState): Resume parsing state object, must contain document_content.
    # L165: [Doc] Output description:
    # L166: [Doc] dict: State update dictionary, containing parsed_sections (structured data from LLM).
    # (Lines L154-167 are function/module docstrings, converted to comments for readability)
    # L168: Start try block, subsequent except handles fallback
    try:
        # L169: Log for online debugging of node input/output
        logger.info("[extract_information] Starting execution, input: state=%s", {k: str(v)[:100] for k, v in state.items()})
        # L170: Assignment: update local variable or state field
        document_content = state.get("document_content", "")

        # L172: Conditional branch
        if not document_content or not document_content.strip():
            # L173: Log for online debugging of node input/output
            logger.warning("[extract_information] Document content is empty, skipping extraction")
            # L174: Return fields to be merged into state (LangGraph will merge)
            return {
                # L175: Execute statement (details in business description above)
                "parsed_sections": {},
                # L176: Execute statement (details in business description above)
                "parse_errors": ["Document content is empty, cannot extract information"],
            # L177: Execute statement (details in business description above)
            }

        # L179: Assignment: update local variable or state field
        messages = [
            # L180: Execute statement (details in business description above)
            {"role": "system", "content": RESUME_PARSER_SYSTEM_PROMPT},
            # L181: Execute statement (details in business description above)
            {"role": "user", "content": f"Please extract structured personal information from the following text:\n\n{document_content}"},
        # L182: Execute statement (details in business description above)
        ]

        # L184: Log for online debugging of node input/output
        logger.info("[extract_information] Calling LLM to extract structured information, document length: %d", len(document_content))

        # L186: Assignment: update local variable or state field
        parsed_data = {}
        # L187: Assignment: update local variable or state field
        last_err = None
        # L188: Loop
        for attempt in range(2):
            # L189: Start try block, subsequent except handles fallback
            try:
                # L190: Get light model instance (mainly used for resume_parser structured JSON)
                model = get_light_model()
                # L191: Call LLM and parse JSON; internal JSON mode → text fallback chain
                parsed_data = await invoke_llm_with_json(model, messages)
                # L192: Conditional branch
                if parsed_data and len(parsed_data) > 0:
                    # L193: Execute statement (details in business description above)
                    break
                # L194: Log for online debugging of node input/output
                logger.warning("[extract_information] Attempt %d returned empty data, retrying", attempt + 1)
            # L195: Catch exception to avoid crashing the entire graph/request
            except TimeoutError as te:
                # L196: Assignment: update local variable or state field
                last_err = te
                # L197: Log for online debugging of node input/output
                logger.warning("[extract_information] LLM call timeout on attempt %d: %s", attempt + 1, te)
            # L198: Catch exception to avoid crashing the entire graph/request
            except Exception as e:
                # L199: Assignment: update local variable or state field
                last_err = e
                # L200: Log for online debugging of node input/output
                logger.warning("[extract_information] LLM call exception on attempt %d: %s", attempt + 1, e)

        # L202: Conditional branch
        if not parsed_data or len(parsed_data) == 0:
            # L203: Log for online debugging of node input/output
            logger.warning("[extract_information] LLM extraction failed, using regex fallback")
            # L204: Assignment: update local variable or state field
            parsed_data = _regex_extract_profile(document_content)

        # L206: Log for online debugging of node input/output
        logger.info("[extract_information] Structured data field count: %d", len(parsed_data))
        # L207: Log for online debugging of node input/output
        logger.debug("[extract_information] Structured data preview: %s", json.dumps(parsed_data, ensure_ascii=False)[:500])

        # L209: Assignment: update local variable or state field
        result = {
            # L210: Execute statement (details in business description above)
            "parsed_sections": parsed_data,
        # L211: Execute statement (details in business description above)
        }
        # L212: Log for online debugging of node input/output
        logger.info("[extract_information] Execution complete, output: parsed_sections field count=%d", len(parsed_data))
        # L213: Return fields to be merged into state (LangGraph will merge)
        return result

    # L215: Catch exception to avoid crashing the entire graph/request
    except Exception as e:
        # L216: Log for online debugging of node input/output
        logger.error("[extract_information] Exception extracting structured information with LLM: %s", e, exc_info=True)
        # L217: Assignment: update local variable or state field
        fallback = _regex_extract_profile(state.get("document_content", ""))
        # L218: Conditional branch
        if fallback:
            # L219: Log for online debugging of node input/output
            logger.info("[extract_information] Extracted %d fields using regex fallback", len(fallback))
            # L220: Return fields to be merged into state (LangGraph will merge)
            return {"parsed_sections": fallback}
        # L221: Return fields to be merged into state (LangGraph will merge)
        return {
            # L222: Execute statement (details in business description above)
            "parsed_sections": {},
            # L223: Execute statement (details in business description above)
            "parse_errors": [f"LLM extraction exception: {str(e)}"],
        # L224: Execute statement (details in business description above)
        }

Series Navigation

Article | Topic
1 | System Overview
2 | Five Agent Collaboration
3 | Holland RIASEC
4–7 | State · Routing · Nesting · Fault Tolerance
8–11 | LLM Layer · SSE/WS · DB Migration · PDF
12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt
15–17 | Docker · Middleware · Configuration
