FastAPI + LangChain Integration Best Practices: Unified LLM Call Interface Design

0. Series Loop (Readalong Without Open Source)

End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parsing → Analysis → Matching → Report) → tools/pdf_exporter PDF.
This Article: 8/17 · LLM Layer · Unified Invocation

Phase	User Visible	Code Entry	Article
Create Session	Welcome message	POST /api/sessions	09
Multi-turn Dialogue	SSE streaming	chat/stream → run_guide_single_turn	06, 14
Info Sufficient	Start Analysis	_run_analysis_background	05, 07
Resume Parsing	Progress 30%	run_resume_parser	12
Profile/RIASEC	Progress 50%	run_profile_analyzer	03, 13
Career Matching	Progress 70%	run_career_matcher	02
Report	Progress 90%	run_reporter	11
Download PDF	File	GET …/report/pdf	11, 15

	Description
Before Reading This	Part 07 fault tolerance background
After Reading This	Use the call table to decide whether to use chat or light model
Next Loop	Part 12: JSON parsing degradation (Part 9)

Full series loop index: SERIES-LOOP.md

1. What Problem to Solve

iCan has 5 Agent nodes + chat API. If each file does its own ChatOpenAI(...):

Changing models (OpenAI → DeepSeek → Ollama) requires edits in multiple places;
When JSON output fails, each Agent handles try/except differently, inconsistent behavior;
Qwen3 on Ollama may output `` blocks, polluting dialogue and JSON parsing.

Therefore, we consolidate model creation, timeout, JSON degradation, Qwen3 handling into llm/providers.py. Agents only care about messages and return values.

2. Implementation Location and Configuration Source

Module	Responsibility
`config.py`	`Settings.LLM_*` defaults & environment variables
`llm/providers.py`	`get_chat_model` / `get_light_model` / `invoke_llm` / `invoke_llm_with_json` / `check_ollama_available`
`llm/parsers.py`	`parse_json_from_text` (JSON degradation final step)

Defaults (in code, not .env): LLM_MODEL_CHAT=gpt-4o, LLM_MODEL_LIGHT=gpt-4o-mini, LLM_BASE_URL=https://api.openai.com/v1.
Common overrides for local or Docker:

1
2
3

LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL_CHAT=deepseek-v4-flash
LLM_MODEL_LIGHT=deepseek-v4-flash

LLM Unified Invocation Layer

3. Dual Models: Who Uses Chat, Who Uses Light

The comment in providers.py says light is used for “resume parsing, report formatting,” but refer to the grep of call sites:

Caller	Model Factory	Description
`agents/guide.py` (6 places)	`get_chat_model()`	Multi-turn guidance needs stable tone
`agents/resume_parser.py`	`get_light_model()`	Structured JSON extraction
`agents/profile_analyzer.py` (6 nodes)	`get_chat_model()`	RIASEC + multi-dimension analysis
`agents/career_matcher.py` (3 nodes)	`get_chat_model()`	Path recommendation + JSON
`agents/reporter.py`	`get_chat_model()`	Section generation uses chat; file imports light but current section nodes don’t use it
`api/routes/chat.py`	`get_chat_model()`	Single-turn completion/streaming

Conclusion: Only resume_parser consistently uses light; don’t write “Reporter uses mini” in docs unless code changes.

Factory implementation (excerpt llm/providers.py):

def get_chat_model() -> ChatOpenAI:
    return ChatOpenAI(
        api_key=settings.LLM_API_KEY,
        base_url=settings.LLM_BASE_URL,
        model=settings.LLM_MODEL_CHAT,
        temperature=settings.LLM_TEMPERATURE,
        max_tokens=settings.LLM_MAX_TOKENS,
        request_timeout=90,
    )

get_light_model() differs only in LLM_MODEL_LIGHT, otherwise identical.

4. invoke_llm: Timeout and Qwen3

async def invoke_llm(model: ChatOpenAI, messages: list, **kwargs) -> str:
    processed = _inject_no_think(messages)
    response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
    return response.content

Two points:

Hard timeout 60s (wait_for), separate from ChatOpenAI(request_timeout=90) – two layers of timeout, the 60s one fires first.
_inject_no_think: When LLM_BASE_URL contains 11434 and model name includes qwen3, prepend /no_think before system message to avoid thought chain pollution.

Guide’s five inner nodes and Reporter section generation all go through this path.

5. invoke_llm_with_json: Two-Phase Parsing

ResumeParser, ProfileAnalyzer, CareerMatcher need dict; entry point is invoke_llm_with_json:

async def invoke_llm_with_json(model, messages, **kwargs) -> dict:
    processed = _inject_no_think(messages)
    try:
        json_model = model.bind(response_format={"type": "json_object"})
        response = await asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
        return json.loads(response.content)
    except Exception:
        response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
        return parse_json_from_text(response.content)  # llm/parsers.py

Order: JSON mode → plain text + regex/bracket extraction. Part 12 will detail parse_json_from_text‘s four-layer degradation.

6. Degradation When LLM Unavailable (Workflow Layer)

workflow.py‘s run_analysis_pipeline calls check_ollama_available() before running the four-stage analysis:

Cache probe result every 30s;
Send 5-token health check to LLM_BASE_URL/chat/completions;
If unavailable: skip LLM, use _regex_quick_profile + _generate_fallback_report to produce rule-based report, and write Session.workflow_data (ollama_unavailable: true).

This is business-layer degradation, not internal logic of providers.py. Should be documented separately to avoid readers thinking invoke_llm auto-fallbacks.

7. Pitfalls

① Comment vs. Call Site Inconsistency
get_light_model docstring says “report formatting,” but reporter.py‘s generate_profile_section uses get_chat_model(). Either fix docs or fix code – don’t have contradictions.

② Default Model Is Not DeepSeek
Article title can mention DeepSeek deployment, but body must clarify: code defaults to OpenAI compatibility + gpt-4o, DeepSeek relies on .env.

③ JSON Mode Not Supported by All Backends
Ollama / some domestic APIs have incomplete support for response_format; bind failure will warn and fallback to text mode – in production, test the full ResumeParser chain on the target API.

④ Timeout and Retry
Reporter section generation has 2 attempts (see agents/reporter.py), but invoke_llm itself has no auto-retry; Guide nodes except returns fixed text, does not re-call the model.

8. Summary

LLM layer centralized in llm/providers.py, config from config.Settings.
Chat: Guide / Analyzer / Matcher / Reporter / Chat API; Light: currently mainly ResumeParser.
Unified invoke_llm (60s + Qwen3 /no_think) and invoke_llm_with_json (JSON mode → parsers).
Health check and rule report degradation in workflow.run_analysis_pipeline, separated from providers.
Documentation should be based on grep of call sites, not file header comments.

Next article: SSE streaming and WebSocket progress (api/routes/chat.py, api/ws_manager.py).

Appendix: Key Source Code (Line-by-Line Annotations)

The following code is excerpted from the iCan implementation, with Chinese annotations above each line, readable even without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py

get_chat_model

# ========== get_chat_model ==========
# Source file: llm/providers.py   Lines 85-121

# L85: Synchronous function get_chat_model: used for routing decisions or factory method
def get_chat_model() -> ChatOpenAI:
# L87: [Doc] Get the chat model (GPT-4o).
# L89: [Doc] Function description:
# L90: [Doc] Create and return a ChatOpenAI instance based on GPT-4o,
# L91: [Doc] used for core business scenarios such as conversation guidance, personal analysis,
# L92: [Doc] career matching, and planning generation.
# L93: [Doc] Model parameters are read from the global settings configuration.
# L95: [Doc] Input description:
# L96: [Doc] None (reads LLM_API_KEY, LLM_BASE_URL, LLM_MODEL_CHAT,
# L97: [Doc] LLM_TEMPERATURE, LLM_MAX_TOKENS from config)
# L99: [Doc] Output description:
# L100: [Doc] ChatOpenAI: configured chat model instance
# (Lines 86-100 are function/module docstring, converted to comments for readability)
# L101: Begin try block, except handles fallback
    try:
# L102: Log for online troubleshooting of node input/output
        logger.info(
# L103: Execute statement (details in business description above)
            f"[get_chat_model] Starting, input: none, "
# L104: Execute statement (details in business description above)
            f"using model: {settings.LLM_MODEL_CHAT}, "
# L105: Execute statement (details in business description above)
            f"BASE_URL: {settings.LLM_BASE_URL}"
# L106: Execute statement (details in business description above)
        )

# L108: LangChain OpenAI compatible client; change DeepSeek/Ollama by changing base_url/model only
        model = ChatOpenAI(
# L109: Assignment: update local variable or state field
            api_key=settings.LLM_API_KEY,
# L110: Assignment: update local variable or state field
            base_url=settings.LLM_BASE_URL,
# L111: Assignment: update local variable or state field
            model=settings.LLM_MODEL_CHAT,
# L112: Assignment: update local variable or state field
            temperature=settings.LLM_TEMPERATURE,
# L113: Assignment: update local variable or state field
            max_tokens=settings.LLM_MAX_TOKENS,
# L114: Assignment: update local variable or state field
            request_timeout=90,
# L115: Execute statement (details in business description above)
        )

# L117: Log for online troubleshooting of node input/output
        logger.info(
# L118: Execute statement (details in business description above)
            f"[get_chat_model] Completed, returning: ChatOpenAI instance, "
# L119: Execute statement (details in business description above)
            f"model: {settings.LLM_MODEL_CHAT}"
# L120: Execute statement (details in business description above)
        )
# L121: Return field to merge into state (LangGraph will merge)
        return model

invoke_llm

# ========== invoke_llm ==========
# Source file: llm/providers.py   Lines 171-205

# L171: Async function invoke_llm: can be awaited, suitable for IO-bound LLM/DB calls
async def invoke_llm(model: ChatOpenAI, messages: list, **kwargs) -> str:
# L172: Begin try block, except handles fallback
    try:
# L173: Log for online troubleshooting of node input/output
        logger.info(
# L174: Assignment: update local variable or state field
            f"[invoke_llm] Starting, input: model={model.model_name}, "
# L175: Execute statement (details in business description above)
            f"messages count: {len(messages)}, kwargs: {kwargs}"
# L176: Execute statement (details in business description above)
        )
# L177: Log for online troubleshooting of node input/output
        logger.debug(f"[invoke_llm] Message details: {messages}")

# L179: Assignment: update local variable or state field
        processed = _inject_no_think(messages)

# L181: Import dependency module
        import asyncio
# L182: Begin try block, except handles fallback
        try:
# L183: Hard timeout wrapper to prevent LLM hanging
            response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
# L184: Catch exception to avoid crashing the whole graph/request
        except asyncio.TimeoutError:
# L185: Log for online troubleshooting of node input/output
            logger.error("[invoke_llm] LLM call timed out (60s)")
# L186: Re-raise exception, handled by caller or LangGraph
            raise TimeoutError("AI model response timed out, please retry later")

# L188: Assignment: update local variable or state field
        result = response.content

# L190: Log for online troubleshooting of node input/output
        logger.info(
# L191: Execute statement (details in business description above)
            f"[invoke_llm] Completed, returned text length: {len(result) if result else 0}"
# L192: Execute statement (details in business description above)
        )
# L193: Log for online troubleshooting of node input/output
        logger.debug(f"[invoke_llm] Return content preview: {result[:200] if result else 'empty'}")

# L195: Return field to merge into state (LangGraph will merge)
        return result

# L197: Catch exception to avoid crashing the whole graph/request
    except TimeoutError:
# L198: Re-raise exception, handled by caller or LangGraph
        raise
# L199: Catch exception to avoid crashing the whole graph/request
    except Exception as e:
# L200: Log for online troubleshooting of node input/output
        logger.error(
# L201: Assignment: update local variable or state field
            f"[invoke_llm] Exception calling LLM, model={getattr(model, 'model_name', 'unknown')}, "
# L202: Execute statement (details in business description above)
            f"exception: {e}",
# L203: Assignment: update local variable or state field
            exc_info=True
# L204: Execute statement (details in business description above)
        )
# L205: Re-raise exception, handled by caller or LangGraph
        raise

invoke_llm_with_json

# ========== invoke_llm_with_json ==========
# Source file: llm/providers.py   Lines 208-278

# L208: Async function invoke_llm_with_json: can be awaited, suitable for IO-bound LLM/DB calls
async def invoke_llm_with_json(model: ChatOpenAI, messages: list, **kwargs) -> dict:
# L210: [Doc] Call LLM and parse JSON output.
# L212: [Doc] Function description:
# L213: [Doc] Use the specified ChatOpenAI model instance, pass message list to call LLM asynchronously,
# L214: [Doc] request model to reply in JSON format, and automatically parse reply content to Python dict.
# L215: [Doc] Suitable for scenarios requiring structured data output, such as resume parsing, career matching results.
# L216: [Doc] Prefer response_format JSON mode, fallback to text parsing if unsupported.
# L218: [Doc] Input description:
# L219: [Doc] model (ChatOpenAI): configured ChatOpenAI model instance
# L220: [Doc] messages (list): message list, format [{"role": "system/user/assistant", "content": "..."}]
# L221: [Doc] **kwargs: extra parameters, e.g., temperature, max_tokens override defaults
# L223: [Doc] Output description:
# L224: [Doc] dict: parsed JSON dictionary data
# (Lines 209-225 are function/module docstring, converted to comments for readability)
# L226: Import dependency module
    import json

# L228: Import dependency module
    from ican.llm.parsers import parse_json_from_text

# L230: Begin try block, except handles fallback
    try:
# L231: Log for online troubleshooting of node input/output
        logger.info(
# L232: Call LLM and parse JSON; internally has JSON mode → text degradation chain
            f"[invoke_llm_with_json] Starting, input: model={model.model_name}, "
# L233: Execute statement (details in business description above)
            f"messages count: {len(messages)}, kwargs: {kwargs}"
# L234: Execute statement (details in business description above)
        )
# L235: Call LLM and parse JSON; internally has JSON mode → text degradation chain
        logger.debug(f"[invoke_llm_with_json] Message details: {messages}")

# L237: Assignment: update local variable or state field
        processed = _inject_no_think(messages)
# L238: Assignment: update local variable or state field
        raw_content = None

# L240: Import dependency module
        import asyncio as _asyncio

# L242: Begin try block, except handles fallback
        try:
# L243: Attempt OpenAI JSON mode, fallback to except if unsupported
            json_model = model.bind(response_format={"type": "json_object"})
# L244: Begin try block, except handles fallback
            try:
# L245: Hard timeout wrapper to prevent LLM hanging
                response = await _asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
# L246: Catch exception to avoid crashing the whole graph/request
            except _asyncio.TimeoutError:
# L247: Re-raise exception, handled by caller or LangGraph
                raise TimeoutError("AI model response timed out, please retry later")
# L248: Assignment: update local variable or state field
            raw_content = response.content
# L249: Catch exception to avoid crashing the whole graph/request
        except TimeoutError:
# L250: Re-raise exception, handled by caller or LangGraph
            raise
# L251: Catch exception to avoid crashing the whole graph/request
        except Exception as bind_err:
# L252: Log for online troubleshooting of node input/output
            logger.warning(
# L253: Call LLM and parse JSON; internally has JSON mode → text degradation chain
                f"[invoke_llm_with_json] response_format JSON mode not supported, falling back to text mode: {bind_err}"
# L254: Execute statement (details in business description above)
            )
# L255: Begin try block, except handles fallback
            try:
# L256: Hard timeout wrapper to prevent LLM hanging
                response = await _asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
# L257: Catch exception to avoid crashing the whole graph/request
            except _asyncio.TimeoutError:
# L258: Re-raise exception, handled by caller or LangGraph
                raise TimeoutError("AI model response timed out, please retry later")
# L259: Assignment: update local variable or state field
            raw_content = response.content

# L261: Log for online troubleshooting of node input/output
        logger.debug(
# L262: Call LLM and parse JSON; internally has JSON mode → text degradation chain
            f"[invoke_llm_with_json] Raw reply length: {len(raw_content) if raw_content else 0}"
# L263: Execute statement (details in business description above)
        )

# L265: Begin try block, except handles fallback
        try:
# L266: Parse LLM response string to Python dict
            result = json.loads(raw_content)
# L267: Catch exception to avoid crashing the whole graph/request
        except (json.JSONDecodeError, TypeError):
# L268: Call LLM and parse JSON; internally has JSON mode → text degradation chain
            logger.info("[invoke_llm_with_json] Direct JSON parsing failed, trying parse_json_from_text extraction")
# L269: Extract JSON from LLM text (four layers of regex/parsing strategy)
            result = parse_json_from_text(raw_content)
# L270: Conditional branch
            if not result:
# L271: Re-raise exception, handled by caller or LangGraph
                raise ValueError(f"Cannot extract valid JSON from LLM reply, raw content: {raw_content[:300]}")

# L273: Log for online troubleshooting of node input/output
        logger.info(
# L274: Call LLM and parse JSON; internally has JSON mode → text degradation chain
            f"[invoke_llm_with_json] Completed, returned JSON field count: {len(result)}"
# L275: Execute statement (details in business description above)
        )
# L276: Call LLM and parse JSON; internally has JSON mode → text degradation chain
        logger.debug(f"[invoke_llm_with_json] Return JSON preview: {str(result)[:300]}")

# L278: Return field to merge into state (LangGraph will merge)
        return result

Article	Topic
1	System Overview
2	Five-Agent Collaboration
3	Holland RIASEC
4–7	State · Routing · Nesting · Fault Tolerance
8–11	LLM Layer · SSE/WS · DB Migration · PDF
12–14	JSON Prompt · RIASEC Prompt · Guide Prompt
15–17	Docker · Middleware · Config

← Back to iCan Topic