0. Series Loop (Read Along Without Public Source Code)

End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parsing → Analysis → Matching → Report) → tools/pdf_exporter PDF.
This Post: Post 1/17 · Overview · 17-Post Map

| Phase | User Sees | Code Entry | Corresponding Post |
| --- | --- | --- | --- |
| Create Session | Welcome Message | POST /api/sessions | 09 |
| Multi-turn Chat | SSE Streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Info Sufficient | Start Analysis | _run_analysis_background | 05, 07 |
| Resume Parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career Matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download | PDF File | GET …/report/pdf | 11, 15 |

| | Description |
| --- | --- |
| Before reading this post | None (recommended to start here) |
| After reading this post | Able to describe the FastAPI interface layer, the five LangGraph nodes, and the two execution paths (HTTP vs CLI) |
| Next step | Post 02: How the five node functions are chained |

Series full loop index: SERIES-LOOP.md

1. What Problem Does It Solve

Common career planning products fall into two extremes:

  • Pure Questionnaire: a 120-question Holland test that returns a code once completed but knows nothing about your resume or your concerns.
  • Pure Chat: one big prompt handles everything, packing parsing, assessment, matching, and report writing into a single call, which makes the output unstable.

iCan’s approach: multi-turn dialogue to collect context → structured profile → agent-based analysis and matching → downloadable PDF report.
The backend uses FastAPI (main.py mounts routes such as api/routes/chat.py) to expose the HTTP API and SSE/WebSocket streams, and a LangGraph StateGraph (create_workflow() in workflow.py) to orchestrate the five-node pipeline.

The sections below describe the architecture and technology choices; key logic is accompanied by code snippets.


2. Technology Choices (Why Not “One Super Prompt”)

| Layer | Choice | Reason in This Project |
| --- | --- | --- |
| Web | FastAPI | LLM calls are IO-intensive; async routes + SSE streaming provide a better response experience |
| Orchestration | LangGraph | Guide needs cyclic questioning, the next four stages need linear execution; a graph structure is easier to express than LCEL chains |
| LLM | ChatOpenAI Compatible API | Same invoke_llm; switching to DeepSeek / Ollama only changes base_url and model in .env |
| Report | ReportLab + matplotlib | Controllable Chinese PDF, radar/bar charts, no browser printing required |
| Frontend | Vue 3 + Vite | Chat page + report generation progress + PDF download |

Why not CrewAI / AutoGen: The current flow is a fixed graph: a conditional loop on Guide, then four sequential stages (the loop means it is not strictly a DAG). LangGraph’s add_conditional_edges is sufficient for that, and the TypedDict state keeps every field explicit.
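To make the "async routes + SSE streaming" row concrete, here is a minimal, standard-library-only sketch of how token chunks can be framed as Server-Sent Events. The event names (token, done), the payload keys, and the helper names sse_event and stream_reply are assumptions for illustration, not iCan's actual wire format.

```python
import asyncio
import json
from typing import AsyncIterator


def sse_event(data: dict, event: str = "message") -> str:
    """Frame one payload as a Server-Sent Events message (blank line ends it)."""
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


async def stream_reply(tokens: list[str]) -> AsyncIterator[str]:
    """Yield one SSE frame per token, then a final 'done' frame."""
    for token in tokens:
        await asyncio.sleep(0)  # stand-in for awaiting the next LLM chunk
        yield sse_event({"delta": token}, event="token")
    yield sse_event({"finished": True}, event="done")


async def collect(tokens: list[str]) -> list[str]:
    """Drain the stream into a list so the frames can be inspected."""
    return [frame async for frame in stream_reply(tokens)]


frames = asyncio.run(collect(["Hello", ", ", "world"]))
```

In a FastAPI route, a generator like this would typically be returned through StreamingResponse with media_type set to text/event-stream; how iCan's chat/stream route does it is not shown in this post.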


3. System Architecture

3.1 Overview (draw.io)

iCan system architecture: FastAPI interface layer, LangGraph five Agent pipeline, LLM/DB/PDF tool layer

3.2 Data Flow (Mermaid)

flowchart TB
U[User / Vue Frontend] --> API[FastAPI routes]
API --> G[guide_node]
G --> R{route_after_guide}
R -->|Insufficient Info| G
R -->|Sufficient Info| P[resume_parser_node]
P --> A[profile_analyzer_node]
A --> M[career_matcher_node]
M --> T[reporter_node]
T --> PDF[ReportLab PDF]
API -.SSE/WS.-> U

Top-level state type: iCanWorkflowState (core/state.py). Guide has its own inner GuideState and subgraph create_guide_graph(), detailed in Post 6.
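As a rough illustration of what a top-level state type with an accumulating field looks like, the sketch below declares a TypedDict whose workflow_messages field carries an operator.add reducer, then merges node updates by hand. The class name WorkflowStateSketch, the current_stage field, and the apply_update helper are hypothetical; in a real graph LangGraph applies the reducer itself.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class WorkflowStateSketch(TypedDict, total=False):
    # Accumulating log field: updates are concatenated with operator.add.
    workflow_messages: Annotated[list[str], operator.add]
    # Hypothetical plain field: a later write replaces the earlier value.
    current_stage: str


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update, honoring any reducer in the annotation."""
    hints = get_type_hints(WorkflowStateSketch, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducers = getattr(hints.get(key), "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)  # e.g. list + list
        else:
            merged[key] = value  # no reducer: overwrite
    return merged


state = {"workflow_messages": ["guide: done"], "current_stage": "guide"}
state = apply_update(
    state,
    {"workflow_messages": ["parser: done"], "current_stage": "parser"},
)
```

The point of the reducer is that each node returns only its own new messages, and the framework appends them instead of overwriting the list.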


4. Top-Level Workflow Code (Core 30 Lines)

Implementation location: workflow.py → create_workflow().

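from langgraph.graph import END, StateGraph

# iCanWorkflowState (core/state.py), the five node functions (agents/*), and
# route_after_guide are defined elsewhere in the project; their imports are omitted.


def create_workflow():
    """Build and compile the top-level graph: Guide loop, then four linear stages."""
    graph = StateGraph(iCanWorkflowState)
    graph.add_node("guide_node", guide_node)
    graph.add_node("resume_parser_node", resume_parser_node)
    graph.add_node("profile_analyzer_node", profile_analyzer_node)
    graph.add_node("career_matcher_node", career_matcher_node)
    graph.add_node("reporter_node", reporter_node)

    graph.set_entry_point("guide_node")
    # route_after_guide returns a key of this mapping: loop back or move on.
    graph.add_conditional_edges(
        "guide_node",
        route_after_guide,
        {"guide_node": "guide_node", "resume_parser_node": "resume_parser_node"},
    )
    # The remaining four stages run strictly in sequence.
    graph.add_edge("resume_parser_node", "profile_analyzer_node")
    graph.add_edge("profile_analyzer_node", "career_matcher_node")
    graph.add_edge("career_matcher_node", "reporter_node")
    graph.add_edge("reporter_node", END)
    return graph.compile()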

route_after_guide reads fields like needs_more_info returned by Guide to decide whether to continue the conversation or proceed to the parsing pipeline (with a loop limit, see Post 07).
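A routing function of this kind can be sketched as a plain function that returns the name of the next node. The version below is an assumption-laden illustration, not the project's code: needs_more_info comes from the text above, while the guide_turns counter and the MAX_GUIDE_TURNS value are hypothetical stand-ins for the loop limit.

```python
from typing import TypedDict

MAX_GUIDE_TURNS = 8  # hypothetical cap; the real limit is not stated in this post


class GuideResultSketch(TypedDict, total=False):
    needs_more_info: bool
    guide_turns: int  # hypothetical loop counter


def route_after_guide_sketch(state: GuideResultSketch) -> str:
    """Return the key that add_conditional_edges maps to the next node."""
    # Loop limit first: never ask forever, even if info is still thin.
    if state.get("guide_turns", 0) >= MAX_GUIDE_TURNS:
        return "resume_parser_node"
    # Missing flag is treated as "still needs info" to stay on the safe side.
    if state.get("needs_more_info", True):
        return "guide_node"
    return "resume_parser_node"
```

Checking the limit before the flag is a deliberate choice in this sketch: it guarantees the graph leaves the Guide loop after a bounded number of turns.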


5. Five Agent Responsibilities

| Node | Module | Input → Output | Description |
| --- | --- | --- | --- |
| guide_node | agents/guide.py | Dialogue history → collected_info, sufficiency flag | Inner 5 steps: welcome → … → check_sufficiency |
| resume_parser_node | agents/resume_parser.py | Raw text → Structured JSON profile | Uses get_light_model() + JSON fallback parsing |
| profile_analyzer_node | agents/profile_analyzer.py | Profile → Abilities/Personality/Values + RIASEC | Multi-node subgraph, Post 3 covers Holland scoring |
| career_matcher_node | agents/career_matcher.py | Profile → Three-level path recommendations | Vertical / Horizontal / Transition three tiers |
| reporter_node | agents/reporter.py | Analysis results → Markdown report | Then goes through tools/pdf_exporter.py to produce PDF |

Note: In implementation, these are async functions + sub-StateGraphs, not five Python classes.
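The "JSON fallback parsing" mentioned for resume_parser_node can be pictured as a chain of progressively looser attempts. The sketch below is one plausible shape under that assumption; the function name parse_json_with_fallback and the exact order of attempts are hypothetical, and the project's own parser may differ.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this block readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_with_fallback(text: str) -> Optional[dict[str, Any]]:
    """Try strict JSON, then a fenced block, then the outermost {...} span."""
    candidates = [text.strip()]
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # fall through to the next, looser candidate
        if isinstance(parsed, dict):
            return parsed
    return None  # caller decides how to degrade (retry, default profile, ...)
```

Returning None instead of raising keeps the decision about retries or defaults in the calling node, which is where the workflow state lives.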


6. LLM Integration (Don’t Hardcode “Only DeepSeek”)

llm/providers.py provides a unified wrapper:

  • get_chat_model() / get_light_model() → ChatOpenAI(api_key, base_url, model=...)
  • invoke_llm / invoke_llm_with_json → Normal text and JSON mode

Configuration from environment variables (config.py + pydantic-settings):

| Scenario | Typical Config |
| --- | --- |
| Local dev .env | LLM_MODEL_CHAT=deepseek-v4-flash, LLM_BASE_URL=https://api.deepseek.com/v1 |
| Docker + Ollama | qwen3.5:9b + http://host.docker.internal:11434/v1 (Qwen3 requires /no_think injection) |
| Code defaults | Falls back to gpt-4o when env not set (deployment must explicitly write .env) |
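The fallback behavior in the last row can be illustrated without pydantic-settings. The sketch below reads the two variables named above from a mapping and falls back to defaults when they are absent; the LLMSettings dataclass, the helper names, and the literal default URL https://api.openai.com/v1 are assumptions for illustration, since the post only says the default points at OpenAI's official endpoint.

```python
import os
from dataclasses import dataclass
from typing import Mapping

OVERRIDABLE = ("LLM_MODEL_CHAT", "LLM_BASE_URL")


@dataclass(frozen=True)
class LLMSettings:
    model_chat: str
    base_url: str


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM settings from the environment, with code-level defaults."""
    return LLMSettings(
        model_chat=env.get("LLM_MODEL_CHAT", "gpt-4o"),
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),  # assumed URL
    )


def missing_overrides(env: Mapping[str, str] = os.environ) -> list[str]:
    """List variables that silently fall back to defaults, for a startup warning."""
    return [name for name in OVERRIDABLE if name not in env]


local_dev = load_llm_settings(
    {
        "LLM_MODEL_CHAT": "deepseek-v4-flash",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
    }
)
bare = load_llm_settings({})
```

Logging the result of a check like missing_overrides at startup is one way to catch a forgotten .env before requests go to the wrong endpoint.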

Post 8 covers dual-model strategy; currently ResumeParser uses light model, Reporter still uses chat model (inconsistent with some earlier docs; code is the authority).


7. Deployment Highlights (Docker)

Multi-stage Dockerfile: Node builds frontend static assets → Python image installs dependencies → uvicorn ican.main:app.

Pitfalls are mostly in Post 15; here are three key points:

  1. Install CPU PyTorch first, to avoid sentence-transformers pulling in a huge CUDA package.
  2. Install fonts-noto-cjk in the image, otherwise PDF Chinese characters appear as boxes.
  3. COPY only the frontend's built dist into the image; do not let the host's node_modules end up in it.

8. Directory Structure (Easy to Navigate by Post)

ican/
├── main.py # FastAPI entry, lifespan, route mounting
├── workflow.py # Top-level LangGraph (focus of this post)
├── config.py # pydantic-settings
├── agents/ # Five Agent subgraphs
├── llm/ # providers / parsers / prompts
├── api/routes/ # chat, report, upload, ws
├── tools/ # pdf_exporter, doc_reader
└── db/ # SQLAlchemy + auto-migration (Post 10)
frontend/ # Vue 3
Dockerfile / docker-compose.yml

9. Pitfall Records (Compare with Source Code)

  1. Default model in code is not DeepSeek
    In config.py, the default LLM_MODEL_CHAT is gpt-4o, and LLM_BASE_URL defaults to OpenAI’s official URL. Using deepseek-v4-flash locally is a .env deployment configuration, not hardcoded; missing the env var will cause connection to the wrong endpoint.

  2. Online conversations and create_workflow() are not the same path
    SSE multi-turn chat goes through run_guide_chat() in workflow.py (which internally calls run_guide_single_turn in agents/guide.py); once information is sufficient, api/routes/chat.py triggers run_analysis_pipeline(), which directly chains resume_parser → profile_analyzer → career_matcher → reporter rather than calling ainvoke on the full top-level graph each time. The full run_workflow() + create_workflow() path is only used for CLI or one-shot runs.

  3. PlannerState is defined but not integrated
    PlannerState exists in core/state.py, but the top-level workflow.py has no Planner node; action planning content is currently merged into the report section of agents/reporter.py. Do not write “six Agents” in articles.

  4. Top-level state uses workflow_messages for accumulation, not messages
    The Reducer field in iCanWorkflowState is workflow_messages: Annotated[list[str], operator.add]; messages only appears in the inner GuideState. Mixing them up leads to wrong log fields during debugging.

  5. Dual-model division is code-authoritative
    Comments in llm/providers.py once said Reporter could use a mini model, but agents/reporter.py currently still calls get_chat_model() for chapter generation; only agents/resume_parser.py definitely uses get_light_model().

  6. Fallback path when Ollama is unavailable
    run_analysis_pipeline() first calls check_ollama_available(); on failure, it falls back to _regex_quick_profile + _generate_fallback_report in workflow.py. Report quality drops significantly — LLM availability must be ensured at the ops level.
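Pitfalls 2 and 6 together describe a sequential pipeline with an availability check in front. The sketch below shows that control flow with stand-in stages; run_pipeline_sketch, make_stage, the availability callables, and the placeholder report text are all hypothetical. The 30/50/70/90 progress values come from the phase table at the top of the post, and reporting them before each stage runs is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

Stage = Callable[[dict], Awaitable[dict]]


async def run_pipeline_sketch(
    state: dict,
    stages: list[tuple[int, str, Stage]],
    on_progress: Callable[[int, str], None],
    llm_available: Callable[[], Awaitable[bool]],
) -> dict:
    """Run stages in order, or return a degraded result if the LLM is down."""
    if not await llm_available():
        on_progress(100, "fallback")
        return {**state, "report": "rule-based fallback report", "degraded": True}
    for percent, name, stage in stages:
        on_progress(percent, name)  # progress is reported as each stage starts
        update = await stage(state)
        state = {**state, **update}  # each stage adds its own keys
    return {**state, "degraded": False}


def make_stage(key: str) -> Stage:
    """Build a stand-in async stage that writes one key into the state."""
    async def stage(state: dict) -> dict:
        return {key: f"{key} built from {len(state)} prior fields"}
    return stage


STAGES = [
    (30, "resume_parser", make_stage("profile")),
    (50, "profile_analyzer", make_stage("analysis")),
    (70, "career_matcher", make_stage("matches")),
    (90, "reporter", make_stage("report")),
]


async def always_up() -> bool:
    return True


async def always_down() -> bool:
    return False


events: list[tuple[int, str]] = []
result = asyncio.run(
    run_pipeline_sketch({"resume_text": "..."}, STAGES,
                        lambda p, n: events.append((p, n)), always_up)
)
degraded = asyncio.run(
    run_pipeline_sketch({"resume_text": "..."}, STAGES,
                        lambda p, n: None, always_down)
)
```

Marking the result with a degraded flag, as this sketch does, lets the frontend or the PDF tell users that the report came from the fallback path.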


10. Series Guide

| Post | Topic |
| --- | --- |
| 1 | System Overview (This post) |
| 2 | Five Agent Collaboration and iCanWorkflowState |
| 3 | Holland RIASEC + OpenAI Compatible API Deployment Example |
| 4–7 | LangGraph State, Routing, Nesting, Fault Tolerance |
| 8–11 | FastAPI Integration, SSE/WS, DB, PDF |
| 12–14 | Prompt and Stable JSON Output |
| 15–17 | Docker, Middleware, Configuration Management |

11. Summary

iCan’s core is not “switching to a larger model”, but rather using LangGraph to split uncertain LLM steps into testable nodes, using FastAPI to handle streaming interactions, and ReportLab to deliver persistent PDFs.

Next post: LangGraph Multi-Agent Orchestration — How the five subgraphs connect, how route_after_guide works with the Guide inner loop.

