Building an AI Agent System from Scratch: FastAPI + LangGraph Smart Career Planning Platform

0. Series Loop (Read Along Without Public Source Code)

End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parsing → Analysis → Matching → Report) → tools/pdf_exporter PDF.
This Post: Post 1/17 · Overview · 17-Post Map

Phase	User Sees	Code Entry	Corresponding Post
Create Session	Welcome Message	POST /api/sessions	09
Multi-turn Chat	SSE Streaming	chat/stream → run_guide_single_turn	06, 14
Info Sufficient	Start Analysis	_run_analysis_background	05, 07
Resume Parsing	Progress 30%	run_resume_parser	12
Profile/RIASEC	Progress 50%	run_profile_analyzer	03, 13
Career Matching	Progress 70%	run_career_matcher	02
Report	Progress 90%	run_reporter	11
Download PDF	File	GET …/report/pdf	11, 15

	Description
Before reading this post	None (recommended to start here)
After reading this post	Able to describe the FastAPI interface layer, the five LangGraph nodes, and the two execution paths (HTTP vs CLI)
Next step	Post 02: How the five node functions are chained

Series full loop index: SERIES-LOOP.md

1. What Problem Does It Solve

Common career planning products fall into two extremes:

Pure Questionnaire: a 120-question Holland test that returns a code once completed, with no context about your resume or concerns.
Pure Chat: one big prompt handles everything, packing parsing, assessment, matching, and report writing into a single call, which makes the output unstable.

iCan’s approach is: Multi-turn dialogue to collect context → Structured profile → Agent-based analysis and matching → Generate downloadable PDF report.
The backend uses FastAPI (main.py mounts routes such as api/routes/chat.py) to serve HTTP APIs and SSE/WebSocket streams, and uses a LangGraph StateGraph (create_workflow() in workflow.py) to orchestrate the 5-stage pipeline.

The sections below describe the architecture and technology choices; key logic is accompanied by code snippets.

2. Technology Choices (Why Not “One Super Prompt”)

Layer	Choice	Reason in This Project
Web	FastAPI	LLM calls are I/O-bound; async routes plus SSE streaming keep the interface responsive
Orchestration	LangGraph	Guide needs cyclic questioning while the next four stages run linearly; a graph expresses this more easily than LCEL chains
LLM	ChatOpenAI-compatible API	A single `invoke_llm` wrapper; switching to DeepSeek / Ollama only changes `base_url` and `model` in `.env`
Report	ReportLab + matplotlib	Controllable Chinese PDF, radar/bar charts, no browser printing required
Frontend	Vue 3 + Vite	Chat page + report generation progress + PDF download
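The Web row above relies on SSE streaming. As a rough illustration of what an SSE frame looks like on the wire, here is a minimal stdlib-only sketch. The helper names `format_sse` and `stream_reply`, the `delta` / `finished` payload keys, and the `done` event name are all assumptions made for this example, not the project's actual protocol.

```python
import asyncio
import json
from typing import AsyncIterator, Iterable, List, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame (a frame ends with a blank line)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    return "\n".join(lines) + "\n\n"


async def stream_reply(chunks: Iterable[str]) -> AsyncIterator[str]:
    """Yield one SSE frame per text chunk, then a final 'done' frame."""
    for chunk in chunks:
        yield format_sse({"delta": chunk})
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
    yield format_sse({"finished": True}, event="done")


async def collect(chunks: Iterable[str]) -> List[str]:
    """Drain the generator into a list (only needed for this demo)."""
    return [frame async for frame in stream_reply(chunks)]


frames = asyncio.run(collect(["Hel", "lo"]))
```

In a FastAPI route, a generator like `stream_reply` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.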

Why not CrewAI / AutoGen: The current flow has a fixed topology (a conditional Guide loop followed by four sequential stages). LangGraph’s add_conditional_edges is sufficient for that, and the TypedDict state keeps the data contract clear.

3. System Architecture

3.1 Overview (draw.io)

Figure: iCan system architecture (FastAPI interface layer, five-Agent LangGraph pipeline, LLM/DB/PDF tool layer)

3.2 Data Flow (Mermaid)

flowchart TB
  U[User / Vue Frontend] --> API[FastAPI routes]
  API --> G[guide_node]
  G --> R{route_after_guide}
  R -->|Insufficient Info| G
  R -->|Sufficient Info| P[resume_parser_node]
  P --> A[profile_analyzer_node]
  A --> M[career_matcher_node]
  M --> T[reporter_node]
  T --> PDF[ReportLab PDF]
  API -.SSE/WS.-> U

Top-level state type: iCanWorkflowState (core/state.py). Guide has its own inner GuideState and subgraph create_guide_graph(), detailed in Post 6.
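To make the idea of a typed top-level state concrete, here is a minimal sketch of a TypedDict with one reducer-annotated field. The `workflow_messages` annotation follows the pitfall notes later in this post; placing `collected_info` and `needs_more_info` on the top-level state, the class name `WorkflowStateSketch`, and the `merge_update` helper are assumptions for illustration. The helper only imitates how a reducer-aware graph merges a node's partial update.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class WorkflowStateSketch(TypedDict, total=False):
    collected_info: dict      # assumed top-level field
    needs_more_info: bool     # assumed top-level field
    # Reducer field: updates are appended instead of overwriting.
    workflow_messages: Annotated[list[str], operator.add]


def merge_update(state: dict, update: dict) -> dict:
    """Merge a partial update: annotated fields use their reducer, others overwrite."""
    hints = get_type_hints(WorkflowStateSketch, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)  # e.g. list + list
        else:
            merged[key] = value
    return merged


state = {"workflow_messages": ["guide: done"], "needs_more_info": True}
state = merge_update(
    state, {"workflow_messages": ["parser: ok"], "needs_more_info": False}
)
```

After the update, `workflow_messages` holds both entries in order, while `needs_more_info` is simply replaced.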

4. Top-Level Workflow Code (Core 30 Lines)

Implementation location: workflow.py → create_workflow().

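from langgraph.graph import END, StateGraph

# iCanWorkflowState, route_after_guide, and the five *_node functions are
# project imports, omitted here for brevity.


def create_workflow():
    """Build and compile the top-level graph."""
    graph = StateGraph(iCanWorkflowState)

    # Register the five stages as nodes.
    graph.add_node("guide_node", guide_node)
    graph.add_node("resume_parser_node", resume_parser_node)
    graph.add_node("profile_analyzer_node", profile_analyzer_node)
    graph.add_node("career_matcher_node", career_matcher_node)
    graph.add_node("reporter_node", reporter_node)

    # Guide loops on itself until route_after_guide sends the flow onward.
    graph.set_entry_point("guide_node")
    graph.add_conditional_edges(
        "guide_node",
        route_after_guide,
        {"guide_node": "guide_node", "resume_parser_node": "resume_parser_node"},
    )

    # The remaining four stages run strictly in sequence.
    graph.add_edge("resume_parser_node", "profile_analyzer_node")
    graph.add_edge("profile_analyzer_node", "career_matcher_node")
    graph.add_edge("career_matcher_node", "reporter_node")
    graph.add_edge("reporter_node", END)
    return graph.compile()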

route_after_guide reads fields like needs_more_info returned by Guide to decide whether to continue the conversation or proceed to the parsing pipeline (with a loop limit, see Post 07).
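A routing function of this kind might look roughly like the sketch below. Only `needs_more_info` and the two returned node names come from this post; the `guide_turns` counter, the `MAX_GUIDE_TURNS` cap, and the function name `route_after_guide_sketch` are hypothetical stand-ins, since the real loop limit is covered in Post 07.

```python
from typing import TypedDict

MAX_GUIDE_TURNS = 8  # hypothetical cap, not the project's actual value


class GuideRoutingState(TypedDict, total=False):
    needs_more_info: bool  # field named in the post
    guide_turns: int       # hypothetical loop counter


def route_after_guide_sketch(state: GuideRoutingState) -> str:
    """Return the next node name; it must match a key in the edge mapping."""
    turns = state.get("guide_turns", 0)
    # Keep asking only while info is missing AND the loop budget is not spent.
    if state.get("needs_more_info", True) and turns < MAX_GUIDE_TURNS:
        return "guide_node"
    return "resume_parser_node"
```

Returning plain strings keeps the function trivially unit-testable without building a graph, and the turn cap guarantees the Guide loop cannot spin forever when the sufficiency check never passes.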

5. Five Agent Responsibilities

Node	Module	Input → Output	Description
`guide_node`	`agents/guide.py`	Dialogue history → `collected_info`, sufficiency flag	Inner 5 steps: welcome → … → check_sufficiency
`resume_parser_node`	`agents/resume_parser.py`	Raw text → Structured JSON profile	Uses `get_light_model()` + JSON fallback parsing
`profile_analyzer_node`	`agents/profile_analyzer.py`	Profile → Abilities/Personality/Values + RIASEC	Multi-node subgraph, Post 3 covers Holland scoring
`career_matcher_node`	`agents/career_matcher.py`	Profile → Three-level path recommendations	Three tiers: Vertical / Horizontal / Transition
`reporter_node`	`agents/reporter.py`	Analysis results → Markdown report	Then goes through `tools/pdf_exporter.py` to produce PDF

Note: In implementation, these are async functions + sub-StateGraphs, not five Python classes.
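The resume parser row mentions JSON fallback parsing. One common way to do this, shown here as a minimal sketch and not as the project's implementation, is to try the raw reply first and then retry on the outermost `{...}` span. The function name `parse_json_with_fallback` is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def parse_json_with_fallback(raw: str) -> Optional[Dict[str, Any]]:
    """Try progressively looser strategies to get a JSON object out of LLM text."""
    # Candidate 1: the whole reply is already valid JSON.
    candidates = [raw.strip()]

    # Candidate 2: the reply has chatter or a code fence around the object,
    # so keep only the span from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start : end + 1])

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None  # caller decides whether to retry the LLM or use defaults
```

Returning `None` instead of raising lets the calling node choose between a retry and a degraded default profile.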

6. LLM Integration (Don’t Hardcode “Only DeepSeek”)

llm/providers.py provides a unified wrapper:

get_chat_model() / get_light_model() → ChatOpenAI(api_key, base_url, model=...)
invoke_llm / invoke_llm_with_json → Normal text and JSON mode

Configuration from environment variables (config.py + pydantic-settings):

Scenario	Typical Config
Local dev `.env`	`LLM_MODEL_CHAT=deepseek-v4-flash`, `LLM_BASE_URL=https://api.deepseek.com/v1`
Docker + Ollama	`qwen3.5:9b` + `http://host.docker.internal:11434/v1` (Qwen3 requires `/no_think` injection)
Code defaults	Falls back to `gpt-4o` when the env vars are not set (deployments must set them explicitly in `.env`)
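The default-versus-override behavior in the table can be sketched with the standard library alone. This is a simplified stand-in for the pydantic-settings class in config.py: the `LLMSettings` dataclass and `load_llm_settings` helper are hypothetical names, and the exact default base URL string is an assumption (the post only says it is OpenAI's official URL).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMSettings:
    model_chat: str
    base_url: str


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM settings from the environment, falling back to code defaults."""
    return LLMSettings(
        model_chat=env.get("LLM_MODEL_CHAT", "gpt-4o"),
        # Assumed default; the post only states "OpenAI's official URL".
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    )


defaults = load_llm_settings({})
local_dev = load_llm_settings(
    {
        "LLM_MODEL_CHAT": "deepseek-v4-flash",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
    }
)
```

The resulting values are what a wrapper such as get_chat_model() would pass on as `model` and `base_url`, which is why a missing `.env` silently points the app at the default provider.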

Post 8 covers the dual-model strategy. Currently ResumeParser uses the light model while Reporter still uses the chat model (this contradicts some earlier docs; the code is authoritative).

7. Deployment Highlights (Docker)

Multi-stage Dockerfile: Node builds frontend static assets → Python image installs dependencies → uvicorn ican.main:app.

Most pitfalls are covered in Post 15; here are three key points:

Install CPU-only PyTorch first, so that sentence-transformers does not pull in the much larger CUDA build.
Install fonts-noto-cjk in the image; otherwise Chinese characters in the PDF render as boxes.
COPY only the built dist output from the frontend; do not copy the host node_modules into the image.

8. Directory Structure (Easy to Navigate by Post)

ican/
├── main.py              # FastAPI entry, lifespan, route mounting
├── workflow.py          # Top-level LangGraph (focus of this post)
├── config.py            # pydantic-settings
├── agents/              # Five Agent subgraphs
├── llm/                 # providers / parsers / prompts
├── api/routes/          # chat, report, upload, ws
├── tools/               # pdf_exporter, doc_reader
└── db/                  # SQLAlchemy + auto-migration (Post 10)
frontend/                # Vue 3
Dockerfile / docker-compose.yml

9. Pitfall Records (Compare with Source Code)

Default model in code is not DeepSeek
In config.py, LLM_MODEL_CHAT defaults to gpt-4o and LLM_BASE_URL defaults to OpenAI’s official URL. Using deepseek-v4-flash locally is a .env deployment setting, not a hardcoded value; if the env vars are missing, requests go to the wrong endpoint.
Online conversations and create_workflow() are not the same path
SSE multi-turn chat goes through run_guide_chat() in workflow.py (which internally calls run_guide_single_turn in agents/guide.py). Once information is sufficient, api/routes/chat.py triggers run_analysis_pipeline(), which chains resume_parser → profile_analyzer → career_matcher → reporter directly instead of calling ainvoke on the full top-level graph each time. The full run_workflow() + create_workflow() path is used only for CLI or one-shot runs.
PlannerState is defined but not integrated
PlannerState exists in core/state.py, but the top-level workflow.py has no Planner node; action planning content is currently merged into the report section of agents/reporter.py. Do not write “six Agents” in articles.
Top-level state uses workflow_messages for accumulation, not messages
The reducer field in iCanWorkflowState is workflow_messages: Annotated[list[str], operator.add]; messages only appears in the inner GuideState. Mixing them up means inspecting the wrong field when debugging logs.
For the dual-model split, the code is authoritative
Comments in llm/providers.py once said Reporter could use a mini model, but agents/reporter.py currently still calls get_chat_model() for chapter generation; only agents/resume_parser.py definitely uses get_light_model().
Fallback path when Ollama is unavailable
run_analysis_pipeline() first calls check_ollama_available(); on failure, it falls back to _regex_quick_profile + _generate_fallback_report in workflow.py. Report quality drops significantly, so LLM availability must be ensured at the ops level.
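The direct chaining and the fallback path described in these notes can be illustrated with a small runnable sketch. Everything here is a stand-in: `run_pipeline_sketch`, `make_stub`, and `demo` are hypothetical names, the stages are stubs that make no LLM calls, and the progress values simply reuse the 30/50/70/90 figures from the phase table at the top of this post.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Stage = Callable[[Dict], Awaitable[Dict]]


async def run_pipeline_sketch(
    state: Dict,
    stages: List[Tuple[str, int, Stage]],
    llm_available: Callable[[], Awaitable[bool]],
    fallback: Stage,
    on_progress: Callable[[str, int], None],
) -> Dict:
    """Chain the stages in order, or take the fallback path if the LLM is down."""
    if not await llm_available():
        # Degraded path: no LLM calls, so the result is far less detailed.
        return {**state, **(await fallback(state))}
    for name, percent, stage in stages:
        on_progress(name, percent)
        # Each stage returns a partial update that is merged into the state.
        state = {**state, **(await stage(state))}
    return state


def make_stub(key: str) -> Stage:
    """Build a stand-in stage that only records that it ran."""
    async def stage(state: Dict) -> Dict:
        return {key: f"{key} ready"}
    return stage


async def demo(available: bool) -> Tuple[Dict, List[Tuple[str, int]]]:
    seen: List[Tuple[str, int]] = []

    async def llm_available() -> bool:
        return available

    async def fallback(state: Dict) -> Dict:
        return {"report": "fallback report"}

    stages = [
        ("resume_parser", 30, make_stub("profile")),
        ("profile_analyzer", 50, make_stub("analysis")),
        ("career_matcher", 70, make_stub("matches")),
        ("reporter", 90, make_stub("report")),
    ]
    result = await run_pipeline_sketch(
        {"resume_text": "..."}, stages, llm_available, fallback,
        lambda name, percent: seen.append((name, percent)),
    )
    return result, seen
```

Calling `asyncio.run(demo(True))` walks all four stages in order, while `asyncio.run(demo(False))` skips them and returns only the fallback report, which mirrors why the degraded path produces a much thinner result.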

10. Series Guide

Post	Topic
1	System Overview (This post)
2	Five Agent Collaboration and `iCanWorkflowState`
3	Holland RIASEC + OpenAI Compatible API Deployment Example
4–7	LangGraph State, Routing, Nesting, Fault Tolerance
8–11	FastAPI Integration, SSE/WS, DB, PDF
12–14	Prompt and Stable JSON Output
15–17	Docker, Middleware, Configuration Management

11. Summary

iCan’s core is not “switching to a larger model”, but rather using LangGraph to split uncertain LLM steps into testable nodes, using FastAPI to handle streaming interactions, and using ReportLab to deliver persistent PDFs.

Next post: LangGraph Multi-Agent Orchestration — How the five subgraphs connect, how route_after_guide works with the Guide inner loop.
