0. Series Loop (Follow Along Without Open Source Code)
End-to-End Chain: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parse→analyze→match→report) → tools/pdf_exporter PDF.
This article: 7/17 · Error Handling Loop · No Crash
| Stage | User Visible | Code Entry | Article |
|---|---|---|---|
| Create session | Welcome message | POST /api/sessions | 09 |
| Multi-turn conversation | SSE streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Info sufficient | Start analysis | _run_analysis_background | 05, 07 |
| Resume parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download PDF | File | GET …/report/pdf | 11, 15 |
| Description | |
|---|---|
| Before reading | Article 05 routing, Article 08 LLM calls |
| After reading | List fallback chains when Ollama is unavailable |
| Next loop | Article 09: Background task _run_analysis_background (Article 8) |
Full series loop index: SERIES-LOOP.md
1. What Problem to Solve
The iCan top-level workflow chains five LLM-dependent nodes (Guide → ResumeParser → ProfileAnalyzer → CareerMatcher → Reporter). If any step times out, returns invalid JSON, or the Ollama/cloud API goes down, then without isolation the entire analysis returns a 500 and the conversation the user has already entered is wasted.
The project implements error handling at three levels:
- Before invocation: check_ollama_available in llm/providers.py probes LLM reachability;
- During invocation: invoke_llm / invoke_llm_with_json with a 60s timeout, plus multi-strategy JSON extraction in llm/parsers.py;
- After invocation: try/except on each node in workflow.py, plus the phased catch in run_analysis_pipeline and the _generate_fallback_report rule-engine fallback.
2. Strategy 1: Health Check + 30-Second Cache
Before running the four analysis agents, run_analysis_pipeline first calls check_ollama_available() (function name is legacy; it actually probes the OpenAI-compatible /chat/completions at settings.LLM_BASE_URL, not limited to Ollama).
Design highlights:
- 30-second cache: Avoids a probe request per session, reducing latency and quota overhead;
- max_tokens=5: Minimizes probe cost;
- Failure writes False to the cache: For the following 30 seconds, calls fall back quickly without repeated timeouts.
When the LLM is unavailable, run_analysis_pipeline in workflow.py skips the four LLM agents, switches to _regex_quick_profile + _generate_fallback_report, and marks ollama_unavailable: True in the DB.
3. Strategy 2: asyncio.wait_for Hard Timeout
invoke_llm in llm/providers.py wraps all Chat calls with a 60-second upper limit:
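A minimal sketch of such a wrapper, under stated assumptions: call_model is a hypothetical zero-argument coroutine function standing in for the real chat-model invocation, and the signature is illustrative rather than the project's own. The 60-second limit and the error message are the ones quoted in this section.

```python
import asyncio
from typing import Awaitable, Callable

LLM_TIMEOUT_SECONDS = 60.0  # application-layer cutoff from this article


async def invoke_llm(
    call_model: Callable[[], Awaitable[str]],
    timeout: float = LLM_TIMEOUT_SECONDS,
) -> str:
    """Await one model call, turning a slow response into a readable error.

    `call_model` is a hypothetical stand-in for the real chat-model call.
    """
    try:
        return await asyncio.wait_for(call_model(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        # wait_for cancels the pending call before raising.
        raise TimeoutError("AI model response timeout, please retry later") from exc
```

Catching asyncio.TimeoutError and re-raising the built-in TimeoutError behaves the same on Python versions where the two are distinct classes and on versions where they are aliases.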
On timeout, it raises TimeoutError("AI model response timeout, please retry later"). get_chat_model() also sets request_timeout=90 at the HTTP layer; the 60s limit is the earlier cutoff, applied at the application layer.
The API layer in api/routes/chat.py wraps another 90-second wait_for around run_guide_chat, offering users a friendlier “please retry later” message instead of a bare 500.
Empirical ranges (not hard rules): normal replies 2–5s, ProfileAnalyzer 10–30s, Reporter chapter generation may take 30–50s; over 60s is treated as abnormal.
4. Strategy 3: JSON Four-Level Degradation Parsing
Structured agents (ResumeParser, CareerMatcher, etc.) go through invoke_llm_with_json: it first tries response_format=json_object, falls back to plain text if that is unsupported, then uses parse_json_from_text from llm/parsers.py:
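A minimal sketch of multi-strategy extraction. The four strategies used here (whole reply, fenced block, outermost braces, trailing-comma repair) are assumptions chosen for illustration; this article does not list the project's actual four levels. Only the function name and the "always return a dict" contract come from the text.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the listing clean
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def _try_load(candidate: str) -> Dict[str, Any]:
    """Return the parsed object if it is a JSON dict, else an empty dict."""
    try:
        value = json.loads(candidate)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


def parse_json_from_text(text: str) -> Dict[str, Any]:
    """Try progressively looser strategies; never raise, always return a dict."""
    candidates = [text.strip()]                     # 1. the whole reply is JSON
    fenced = _FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())  # 2. a fenced block
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        span = text[start:end + 1]
        candidates.append(span)                     # 3. outermost braces
        # 4. light repair: drop trailing commas (naive, may touch string content)
        candidates.append(_TRAILING_COMMA_RE.sub(r"\1", span))
    for candidate in candidates:
        parsed = _try_load(candidate)
        if parsed:
            return parsed
    return {}
```

A caller that requires JSON can then treat an empty dict as a failure and raise, which keeps "try to extract" and "must have JSON" as separate responsibilities.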
parse_json_from_text catches any JSONDecodeError and returns {}, so the caller always gets a dict. invoke_llm_with_json still raises ValueError on {}: that covers “business must have JSON” scenarios, a different responsibility from the parser’s “try to extract”.
5. Strategy 4: Node-Level Exception Isolation
In workflow.py, the five top-level nodes each have try/except. On failure, they don’t raise but write safe defaults, allowing LangGraph to continue (or at least return a displayable state):
| Node | Returns on exception |
|---|---|
| guide_node | Keep original conversation_history, needs_more_info=True |
| resume_parser_node | structured_profile={} |
| profile_analyzer_node | personal_profile={} |
| career_matcher_node | career_matches=[] |
| reporter_node | Fixed Markdown failure text |
Example of reporter_node fallback:
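A minimal sketch of this isolation pattern. The state key final_report, the failure wording, and the run_reporter stand-in are all assumptions for illustration; the only behavior taken from this article is that the node catches the exception and returns fixed Markdown instead of raising.

```python
import asyncio
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Placeholder wording; the project's real failure text is not shown here.
FAILURE_REPORT = (
    "# Report generation failed\n\n"
    "The report could not be generated. Please retry later."
)


async def run_reporter(state: Dict[str, Any]) -> str:
    """Hypothetical stand-in for the Reporter agent."""
    if not state.get("personal_profile"):
        raise ValueError("no profile to report on")
    return "# Career Report\n\n(generated content)"


async def reporter_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a state update; an agent failure becomes fixed Markdown, not a raise."""
    try:
        return {"final_report": await run_reporter(state)}
    except Exception:
        logger.exception("reporter_node failed, returning fallback text")
        return {"final_report": FAILURE_REPORT}
```

Because the node always returns a dict, the graph can finish and the caller can display whatever state it ends up with.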
Comparison: Without isolation, Reporter error → whole graph ainvoke fails → CLI/API 500; with isolation, users at least see failure description or partial sections.
When route_after_guide fails, it returns resume_parser_node. This is a “fail-open advance” at the routing layer, in contrast to the guide node’s fail-closed behavior (keep asking for info): the routing layer fears dead loops more.
6. Strategy 5: run_analysis_pipeline Phased Error Handling
Online report generation mainly goes through run_analysis_pipeline (called by api/routes/chat.py, upload.py, report_gen.py), not through the top-level LangGraph guide loop. Its error handling is “each phase independent try, continue with empty data on failure”:
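A minimal sketch of phase-by-phase handling. The signature (phases passed in as coroutine functions), the ctx dictionary, and the run_phase helper are hypothetical and exist only to make the example self-contained. The behavior it illustrates is the one described in this section: each phase has its own try, a failure yields empty data, and a Reporter failure produces a summary with reporter_err appended.

```python
import asyncio
import json
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger(__name__)

Phase = Callable[[Dict[str, Any]], Awaitable[Any]]


async def run_phase(name: str, phase: Phase, ctx: Dict[str, Any], default: Any) -> Any:
    """Run one phase; on any exception, record it and hand back the empty default."""
    try:
        return await phase(ctx)
    except Exception as exc:
        logger.warning("%s failed, continuing with empty data: %s", name, exc)
        ctx.setdefault("errors", {})[name] = str(exc)
        return default


async def run_analysis_pipeline(
    ctx: Dict[str, Any], parse: Phase, analyze: Phase, match: Phase, report: Phase
) -> Dict[str, Any]:
    ctx["structured_profile"] = await run_phase("parse", parse, ctx, {})
    ctx["personal_profile"] = await run_phase("analyze", analyze, ctx, {})
    ctx["career_matches"] = await run_phase("match", match, ctx, [])
    report_md = await run_phase("report", report, ctx, None)
    if report_md is None:
        # Reporter failed: summarize what exists and append the error for log correlation.
        summary = json.dumps(ctx["personal_profile"], ensure_ascii=False, indent=2)
        reporter_err = ctx.get("errors", {}).get("report", "unknown error")
        report_md = f"# Report (summary version)\n\n{summary}\n\nreporter_err: {reporter_err}"
    ctx["report"] = report_md
    return ctx
```

Later phases still run after an earlier failure; they simply receive empty inputs, which is why a thin report is possible but a crash is not.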
When the Reporter phase fails, it doesn’t return an empty string; instead, it constructs Markdown containing a summary of personal_profile JSON and appends reporter_err at the end for easier OPS log correlation.
When LLM is completely unavailable, the entire LLM chain is skipped, and _generate_fallback_report outputs a rule-engine report with a ⚠️ note:
An outer catch still exists: log error, ws_manager.send_error to frontend, then raise — that’s for DB/session-level disasters, not single agent failures.
7. Interaction with Loop Limits (Article 5)
Error handling also includes anti-infinite loops (see Article 5 for details):
- agents/guide.py should_continue: loop_count >= 8;
- workflow.py route_after_guide: user_msg_count >= 3;
- recursion_limit: subgraph 15, full workflow 50.
Exceeding a loop limit is essentially “forced advancement”: it prevents error + retry from forming a logical dead loop in the graph.
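A minimal sketch of how these two limits could be expressed as routing functions. The thresholds and function names come from the list above, and the fail-open return value comes from Strategy 4. The state keys, the message format, and the return labels "end", "continue", and "guide_node" are assumptions.

```python
from typing import Any, Dict

MAX_GUIDE_LOOPS = 8    # agents/guide.py: loop_count >= 8
MAX_USER_MESSAGES = 3  # workflow.py: user_msg_count >= 3


def should_continue(state: Dict[str, Any]) -> str:
    """Guide subgraph: stop once info is sufficient or the loop cap is reached."""
    if state.get("is_info_sufficient") or state.get("loop_count", 0) >= MAX_GUIDE_LOOPS:
        return "end"
    return "continue"


def route_after_guide(state: Dict[str, Any]) -> str:
    """Top-level routing: advance to analysis when ready, capped, or on error."""
    try:
        history = state["conversation_history"]
        user_msg_count = sum(1 for m in history if m["role"] == "user")
        if not state.get("needs_more_info", True) or user_msg_count >= MAX_USER_MESSAGES:
            return "resume_parser_node"
        return "guide_node"
    except Exception:
        # Fail open: a routing error advances instead of looping back.
        return "resume_parser_node"
```

The recursion_limit values are a separate safety net and would be supplied when the graph is invoked, for example through a config such as {"recursion_limit": 50} in LangGraph.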
8. Error Handling Layer Overview
9. Pitfalls and Edge Cases
- Misleading name check_ollama_available: It probes the current LLM_BASE_URL (could be DeepSeek, OpenAI, or Ollama), not only Ollama. After switching to a cloud provider in .env, the cached True/False reflects the cloud endpoint: if Ollama is down but the cloud is up, the check still reports available.
- Health check default _ollama_cache["available"] = True: On process startup, before the first probe, the first pipeline assumes the LLM is available; if it is actually unavailable, False is cached only after the first POST fails. For high-availability scenarios, consider a warm-up probe at startup.
- Node isolation “empty dict continue” yields thin reports: When profile_analyzer fails, personal_profile has many empty fields, but Reporter still runs. Users see “a report, but thin on content”, which is better than a 500, but the frontend should distinguish this case via workflow_messages or progress prompts.
- run_guide_chat exceptions have a separate fallback: It returns the fixed message “Sorry, something went wrong. Could you say that again?” with is_info_sufficient=False, so it won’t accidentally trigger run_analysis_pipeline.
- Reporter chapter generation uses get_chat_model(): This is different from get_light_model(); do not assume from old comments that Reporter has switched to the mini model (see the call table in Article 8).

On the error-handling path, Reporter may still be the slowest and most timeout-prone stage; rule-engine degradation only covers “the entire LLM is unavailable”, not “only Reporter timed out”.
10. Summary
- Before invocation: cached health check in llm/providers.py; rule-engine report from workflow.py when unavailable.
- During invocation: 60s timeout + multi-strategy JSON extraction in llm/parsers.py.
- After invocation: five workflow nodes isolated individually; phased catch in run_analysis_pipeline; a Reporter failure still produces a summary version.
- The goal is not “never fail”, but failure that is perceptible, degradable, and does not crash the whole graph.
- Next article (Article 8) expands on get_chat_model/get_light_model and the unified LLM calling interface.
Appendix: Key Source Code (Line-by-Line Annotations)
The following code is excerpted from the iCan implementation. Each line has a Chinese comment above so you can follow along without the repository.
Generation command: python3 bin/build-ican-annotated-snippets.py
guide_node exception return
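A minimal sketch of the guide node's fail-closed return, not an excerpt from the repository. The injected run_turn parameter is a hypothetical stand-in for the real single-turn guide call; the two returned keys are the ones listed in the Strategy 4 table.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger(__name__)

GuideTurn = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def guide_node(state: Dict[str, Any], run_turn: GuideTurn) -> Dict[str, Any]:
    """Run one guide turn; on failure keep the history and keep asking."""
    try:
        return await run_turn(state)
    except Exception:
        logger.exception("guide_node failed, keeping conversation as-is")
        return {
            # Preserve what the user has already typed.
            "conversation_history": state.get("conversation_history", []),
            # Fail closed: an error never advances to analysis.
            "needs_more_info": True,
        }
```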
Ollama unavailable → rule-based report
Pipeline phased try/except
Series Navigation
| Article | Topic |
|---|---|
| 1 | System Overview |
| 2 | Five-Agent Collaboration |
| 3 | RIASEC Holland Codes |
| 4–7 | State · Routing · Nesting · 7 Error Handling (This Article) |
| 8–11 | LLM Layer · SSE/WS · DB Migration · PDF |
| 12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt |
| 15–17 | Docker · Middleware · Configuration |