0. Series Loop (Readalong Without Open Source)
End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parsing → Analysis → Matching → Report) → tools/pdf_exporter PDF.
This Article: 8/17 · LLM Layer · Unified Invocation
| Phase | User Visible | Code Entry | Article |
|---|---|---|---|
| Create Session | Welcome message | POST /api/sessions | 09 |
| Multi-turn Dialogue | SSE streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Info Sufficient | Start Analysis | _run_analysis_background | 05, 07 |
| Resume Parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career Matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download PDF | File | GET …/report/pdf | 11, 15 |
| Description | |
|---|---|
| Before Reading This | Part 07 fault tolerance background |
| After Reading This | Use the call table to decide whether to use chat or light model |
| Next Loop | Part 12: JSON parsing degradation (Part 9) |
Full series loop index: SERIES-LOOP.md
1. What Problem to Solve
iCan has 5 Agent nodes + chat API. If each file does its own ChatOpenAI(...):
- Changing models (OpenAI → DeepSeek → Ollama) requires edits in multiple places;
- When JSON output fails, each Agent handles try/except differently, inconsistent behavior;
- Qwen3 on Ollama may output `` blocks, polluting dialogue and JSON parsing.
Therefore, we consolidate model creation, timeout, JSON degradation, Qwen3 handling into llm/providers.py. Agents only care about messages and return values.
2. Implementation Location and Configuration Source
| Module | Responsibility |
|---|---|
config.py |
Settings.LLM_* defaults & environment variables |
llm/providers.py |
get_chat_model / get_light_model / invoke_llm / invoke_llm_with_json / check_ollama_available |
llm/parsers.py |
parse_json_from_text (JSON degradation final step) |
Defaults (in code, not .env): LLM_MODEL_CHAT=gpt-4o, LLM_MODEL_LIGHT=gpt-4o-mini, LLM_BASE_URL=https://api.openai.com/v1.
Common overrides for local or Docker:
1 | |
3. Dual Models: Who Uses Chat, Who Uses Light
The comment in providers.py says light is used for “resume parsing, report formatting,” but refer to the grep of call sites:
| Caller | Model Factory | Description |
|---|---|---|
agents/guide.py (6 places) |
get_chat_model() |
Multi-turn guidance needs stable tone |
agents/resume_parser.py |
get_light_model() |
Structured JSON extraction |
agents/profile_analyzer.py (6 nodes) |
get_chat_model() |
RIASEC + multi-dimension analysis |
agents/career_matcher.py (3 nodes) |
get_chat_model() |
Path recommendation + JSON |
agents/reporter.py |
get_chat_model() |
Section generation uses chat; file imports light but current section nodes don’t use it |
api/routes/chat.py |
get_chat_model() |
Single-turn completion/streaming |
Conclusion: Only resume_parser consistently uses light; don’t write “Reporter uses mini” in docs unless code changes.
Factory implementation (excerpt llm/providers.py):
1 | |
get_light_model() differs only in LLM_MODEL_LIGHT, otherwise identical.
4. invoke_llm: Timeout and Qwen3
1 | |
Two points:
- Hard timeout 60s (
wait_for), separate fromChatOpenAI(request_timeout=90)– two layers of timeout, the 60s one fires first. _inject_no_think: WhenLLM_BASE_URLcontains11434and model name includesqwen3, prepend/no_thinkbefore system message to avoid thought chain pollution.
Guide’s five inner nodes and Reporter section generation all go through this path.
5. invoke_llm_with_json: Two-Phase Parsing
ResumeParser, ProfileAnalyzer, CareerMatcher need dict; entry point is invoke_llm_with_json:
1 | |
Order: JSON mode → plain text + regex/bracket extraction. Part 12 will detail parse_json_from_text‘s four-layer degradation.
6. Degradation When LLM Unavailable (Workflow Layer)
workflow.py‘s run_analysis_pipeline calls check_ollama_available() before running the four-stage analysis:
- Cache probe result every 30s;
- Send 5-token health check to
LLM_BASE_URL/chat/completions; - If unavailable: skip LLM, use
_regex_quick_profile+_generate_fallback_reportto produce rule-based report, and writeSession.workflow_data(ollama_unavailable: true).
This is business-layer degradation, not internal logic of providers.py. Should be documented separately to avoid readers thinking invoke_llm auto-fallbacks.
7. Pitfalls
① Comment vs. Call Site Inconsistencyget_light_model docstring says “report formatting,” but reporter.py‘s generate_profile_section uses get_chat_model(). Either fix docs or fix code – don’t have contradictions.
② Default Model Is Not DeepSeek
Article title can mention DeepSeek deployment, but body must clarify: code defaults to OpenAI compatibility + gpt-4o, DeepSeek relies on .env.
③ JSON Mode Not Supported by All Backends
Ollama / some domestic APIs have incomplete support for response_format; bind failure will warn and fallback to text mode – in production, test the full ResumeParser chain on the target API.
④ Timeout and Retry
Reporter section generation has 2 attempts (see agents/reporter.py), but invoke_llm itself has no auto-retry; Guide nodes except returns fixed text, does not re-call the model.
8. Summary
- LLM layer centralized in
llm/providers.py, config fromconfig.Settings. - Chat: Guide / Analyzer / Matcher / Reporter / Chat API; Light: currently mainly ResumeParser.
- Unified
invoke_llm(60s + Qwen3/no_think) andinvoke_llm_with_json(JSON mode → parsers). - Health check and rule report degradation in
workflow.run_analysis_pipeline, separated from providers. - Documentation should be based on grep of call sites, not file header comments.
Next article: SSE streaming and WebSocket progress (
api/routes/chat.py,api/ws_manager.py).
Appendix: Key Source Code (Line-by-Line Annotations)
The following code is excerpted from the iCan implementation, with Chinese annotations above each line, readable even without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py
get_chat_model
1 | |
invoke_llm
1 | |
invoke_llm_with_json
1 | |
Series Navigation
| Article | Topic |
|---|---|
| 1 | System Overview |
| 2 | Five-Agent Collaboration |
| 3 | Holland RIASEC |
| 4–7 | State · Routing · Nesting · Fault Tolerance |
| 8–11 | LLM Layer · SSE/WS · DB Migration · PDF |
| 12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt |
| 15–17 | Docker · Middleware · Config |