0. Series Loop (Follow along without open-source code)

End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parse→analyze→match→report) → tools/pdf_exporter PDF.
This article: 11/17 · Delivery Loop · PDF

| Phase | What the user sees | Code entry | Article |
|---|---|---|---|
| Create session | Welcome message | POST /api/sessions | 09 |
| Multi-turn dialogue | SSE streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Sufficient info | Start analysis | _run_analysis_background | 05, 07 |
| Resume parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download PDF | File | GET …/report/pdf | 11, 15 |

| | Description |
|---|---|
| Before reading | Article 10: Report stored in DB |
| After reading | You can trace how generate_pdf handles Chinese fonts and Markdown parsing |
| Next in the loop | Article 15: Docker font dependencies (Article 12) |

Full series loop index: SERIES-LOOP.md

1. The Problem to Solve

agents/reporter.py produces a Markdown string (final_report). The frontend can render it directly, but users also need a downloadable PDF. Requirements:

  • Chinese text, tables, and heading levels must render without garbling;
  • Ideally, include charts for the RIASEC / ability dimensions;
  • Avoid an HTML→PDF intermediate layer (WeasyPrint and wkhtmltopdf carry heavy dependencies, and the HTML layer previously caused escaping bugs).

Implementation is concentrated in tools/pdf_exporter.py; the HTTP entry point is GET /api/sessions/{session_id}/report/pdf in api/routes/report.py.


2. Implementation Location and Call Chain

| Module | Responsibility |
|---|---|
| agents/reporter.py | LangGraph sub‑graph that generates the Markdown final_report |
| workflow.py / api/routes/chat.py | After analysis completes, writes final_report into workflow_data |
| tools/pdf_exporter.py | generate_pdf → _build_pdf: fonts, charts, Markdown parsing, ReportLab layout |
| api/routes/report.py | download_report_pdf reads the report from the DB and returns application/pdf |

The Reporter's section nodes use get_chat_model() + invoke_llm to write Markdown (same as Article 8; get_light_model() is not currently used). The PDF layer makes no LLM calls; it only consumes the existing Markdown.

Download route (excerpt from api/routes/report.py):

```python
# api/routes/report.py (excerpt); router, repository and HTTPException are module-level names
@router.get("/{session_id}/report/pdf")
async def download_report_pdf(session_id: str):
    from fastapi.responses import Response
    from ican.tools.pdf_exporter import generate_pdf

    session_data = repository.get_session(session_id)
    if not session_data:
        raise HTTPException(status_code=404, detail="Session not found")

    workflow_data = session_data.get("workflow_data") or {}
    report_md = workflow_data.get("final_report", "")
    if not report_md:
        raise HTTPException(status_code=404, detail="Report not yet generated")

    profile_data = workflow_data.get("personal_profile") or workflow_data.get("structured_profile") or {}
    career_matches = workflow_data.get("career_matches") or []

    pdf_bytes = await generate_pdf(
        report_md,
        title="iCan Career Planning Report",
        profile_data=profile_data,
        career_matches=career_matches,
    )
    return Response(content=pdf_bytes, media_type="application/pdf")  # ... (remaining arguments elided)
```

In the same file, GET .../report/download?format=txt|md only writes plain-text/Markdown temporary files and does not accept a pdf format; PDFs must go through /report/pdf.


[Figure: PDF generation pipeline]


3. Entry Function: generate_pdf

```python
# tools/pdf_exporter.py (excerpt)
async def generate_pdf(report_md: str, title: str = "iCan Career Planning Report",
                       profile_data: dict = None, career_matches: list = None) -> bytes:
    # Keywords: 能力 (ability), 雷达 (radar), 技能 (skill), 评估 (assessment)
    show_charts = bool(report_md and re.search(r"能力|雷达|技能|评估", report_md))
    # profile_data and career_matches are accepted but not forwarded (see pitfall ①)
    result = _build_pdf(report_md, title, show_charts)
    return result
```

Key points:

  • profile_data / career_matches are passed but not used for chart data (see pitfall ①);
  • show_charts depends on whether the report body contains keywords such as “能力” (ability), “雷达” (radar), “技能” (skill), or “评估” (assessment), not on whether profile fields exist.
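To make the toggle rule concrete, here is a standalone sketch that reproduces the show_charts expression from generate_pdf. The helper name chart_toggle is hypothetical; only the regex is taken from the excerpt above.

```python
import re

# Same keywords as generate_pdf: ability | radar | skill | assessment
CHART_KEYWORDS = re.compile(r"能力|雷达|技能|评估")


def chart_toggle(report_md: str) -> bool:
    """Hypothetical helper mirroring generate_pdf's show_charts expression."""
    # Empty or missing Markdown short-circuits to False before the regex runs.
    return bool(report_md and CHART_KEYWORDS.search(report_md))


# The profile plays no part: only the report text decides.
print(chart_toggle("## 能力评估"))          # True
print(chart_toggle("## Career matches"))   # False
print(chart_toggle(""))                    # False
```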

4. Chinese Fonts: Multi‑path Detection + Silent Fallback

_build_pdf starts with cn_font = "Helvetica" and tries to register TTF/TTC/OTF fonts in order, keeping the first one that succeeds:

```python
# tools/pdf_exporter.py: font paths (excerpt from _build_pdf)
import os

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

cn_font = "Helvetica"  # built-in default, has no CJK glyphs
for fp in [
    # Linux: Noto CJK, WenQuanYi, Droid fallback
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/opentype/noto/NotoSansCJKsc-Regular.otf",
    "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
    "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
    # macOS
    "/System/Library/Fonts/PingFang.ttc",
    "/System/Library/Fonts/STHeiti Light.ttc",
    "/System/Library/Fonts/Hiragino Sans GB.ttc",
    "/Library/Fonts/Arial Unicode.ttf",
]:
    if os.path.exists(fp):
        try:
            pdfmetrics.registerFont(TTFont("CNFont", fp))
            cn_font = "CNFont"
            break  # first font that registers wins
        except Exception:
            pass  # silently try the next path (see pitfall ②)
```

All ParagraphStyle objects (st_title, st_body, st_cell, etc.) use fontName=cn_font. Docker images must install fonts-noto-cjk or WenQuanYi (Article 15); otherwise Chinese text falls back to Helvetica and renders as tofu boxes.
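The loop above keeps no record of why a candidate was skipped. A hedged variant is sketched below: same first-success order, but every failure is logged. pick_cjk_font and its injected register/exists callables are hypothetical, introduced so the selection logic can be exercised without ReportLab or real font files.

```python
import logging
import os
from typing import Callable, Iterable, Optional, Tuple

logger = logging.getLogger(__name__)


def pick_cjk_font(
    candidates: Iterable[str],
    register: Callable[[str], None],
    exists: Callable[[str], bool] = os.path.exists,
    font_name: str = "CNFont",
    fallback: str = "Helvetica",
) -> Tuple[str, Optional[str]]:
    """Return (font_name, path) for the first candidate that exists and registers.

    `register(path)` must raise on failure; with ReportLab it would wrap
    pdfmetrics.registerFont(TTFont(font_name, path)).
    """
    for path in candidates:
        if not exists(path):
            logger.debug("font not found: %s", path)
            continue
        try:
            register(path)
        except Exception as exc:
            # Unlike the original loop, record why this candidate was skipped.
            logger.warning("could not register %s: %s", path, exc)
            continue
        logger.info("using CJK font %s", path)
        return font_name, path
    logger.error("no CJK font registered; Chinese text will not render (using %s)", fallback)
    return fallback, None
```

In _build_pdf this could replace the loop with a call such as pick_cjk_font(FONT_PATHS, lambda p: pdfmetrics.registerFont(TTFont("CNFont", p))), where FONT_PATHS is a hypothetical name for the path list.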


5. Markdown Parsing: Direct to ReportLab, No HTML

_parse_markdown splits Markdown into (type, data) blocks:

  • Headings h1–h3 (#{1,4})
  • Paragraphs p, lists li, blockquotes quote, horizontal rules hr
  • Table rows tr (| col | col |, skip separator rows |---|)
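The real _parse_markdown is not shown in this article, so the following is only a simplified, hypothetical sketch of a (type, data) block splitter along the lines described above. Assumptions: #### headings are folded into h3, each non-empty line becomes its own block (multi-line paragraphs are not merged), and nested lists are not handled.

```python
import re
from typing import List, Tuple

Block = Tuple[str, object]

HEADING = re.compile(r"^(#{1,4})\s+(.*)$")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*)$")
TABLE_SEP = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")
HR = re.compile(r"^(-{3,}|\*{3,}|_{3,})$")


def parse_markdown_blocks(md: str) -> List[Block]:
    """Hypothetical sketch: split Markdown into (type, data) blocks."""
    blocks: List[Block] = []
    for raw in md.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := HEADING.match(line):
            level = min(len(m.group(1)), 3)  # assumption: #### is rendered as h3
            blocks.append((f"h{level}", m.group(2)))
        elif HR.match(line):
            blocks.append(("hr", None))
        elif line.startswith("|"):
            if not TABLE_SEP.match(line):  # skip |---|---| separator rows
                cells = [c.strip() for c in line.strip("|").split("|")]
                blocks.append(("tr", cells))
        elif line.startswith(">"):
            blocks.append(("quote", line.lstrip(">").strip()))
        elif m := LIST_ITEM.match(line):
            blocks.append(("li", m.group(1)))
        else:
            blocks.append(("p", line))
    return blocks
```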

Inline formatting is converted by _parse_inline to ReportLab XML: **bold** → green bold, `code` → gray background Courier.
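A minimal sketch of that inline conversion, assuming ReportLab's paragraph markup accepts <b> and a <font> tag with face, color, and backColor attributes (check against your ReportLab version). The function name, the gray #f3f4f6, and the use of the #0d9488 accent for "green bold" are assumptions, not the iCan code. The key design point is that &, <, and > are escaped exactly once, before any tags are inserted.

```python
import re
from html import escape

ACCENT = "#0d9488"  # assumption: bold text uses the brand accent

BOLD = re.compile(r"\*\*(.+?)\*\*")
CODE = re.compile(r"`([^`]+)`")


def parse_inline(text: str) -> str:
    """Hypothetical sketch: Markdown inline markup -> ReportLab paragraph markup."""
    # Escape &, <, > exactly once, before any markup tags are introduced.
    out = escape(text, quote=False)
    out = CODE.sub(r'<font face="Courier" backColor="#f3f4f6">\1</font>', out)
    out = BOLD.sub(rf'<font color="{ACCENT}"><b>\1</b></font>', out)
    return out
```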

Table cells are wrapped in Paragraph(..., st_cell) to support line breaks; column widths are allocated by _calc_col_widths proportionally to text length, with a minimum of 1.5 cm.
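The article does not show _calc_col_widths, so this is a hypothetical variant of "proportional to text length, with a 1.5 cm minimum": every column first receives the minimum, and the remaining width is shared in proportion to each column's longest cell. It assumes the table may use the full 17 cm text width (A4 minus the 2 cm side margins, i.e. doc.width); adjust total if your frame adds padding.

```python
from typing import List, Sequence

CM = 72 / 2.54                   # points per centimetre (same value as reportlab.lib.units.cm)
FRAME_WIDTH = (21.0 - 4.0) * CM  # A4 width minus the 2 cm left/right margins


def calc_col_widths(rows: Sequence[Sequence[str]], total: float = FRAME_WIDTH,
                    min_width: float = 1.5 * CM) -> List[float]:
    """Hypothetical sketch: widths proportional to the longest cell, with a floor."""
    if not rows:
        return []
    n = max(len(r) for r in rows)
    if n == 0:
        return []
    # Longest text per column; empty columns count as length 1 so they still get space.
    lengths = [max((len(r[i]) for r in rows if i < len(r)), default=0) or 1 for i in range(n)]
    spare = total - n * min_width
    if spare <= 0:  # too many columns for the floor: split evenly
        return [total / n] * n
    return [min_width + spare * length / sum(lengths) for length in lengths]
```

Granting the minimum first and then distributing the remainder guarantees both the floor and an exact total, which clamping after a purely proportional split does not.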

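```python
# Table row flush (excerpt); table_data, col_ws, border_c, header_bg and elements come from _build_pdf
from reportlab.lib import colors
from reportlab.platypus import KeepTogether, Table, TableStyle

t = Table(table_data, colWidths=col_ws, repeatRows=1)  # repeat the header row after page breaks
t.setStyle(TableStyle([
    ("GRID", (0, 0), (-1, -1), 0.5, border_c),   # 0.5 pt grid around every cell
    ("BACKGROUND", (0, 0), (-1, 0), header_bg),  # header row fill
    ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#f9fafb")]),  # zebra stripes
]))
elements.append(KeepTogether(t))
```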

This structure dates from the switch from “Markdown → HTML → parse” to direct parsing, which removed the double‑escaping of HTML entities that the HTML round trip introduced.
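A minimal stdlib illustration of the double-escaping problem (not code from iCan): ReportLab's paragraph markup decodes entities once, so text that was escaped twice reaches the PDF as literal &amp; sequences.

```python
from html import escape

cell = "R&D <Python>"

once = escape(cell, quote=False)   # correct input for ReportLab paragraph markup
twice = escape(once, quote=False)  # what an extra HTML pass can produce

print(once)   # R&amp;D &lt;Python&gt;
print(twice)  # R&amp;amp;D &amp;lt;Python&amp;gt;
```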


6. Embedding matplotlib Charts

When show_charts=True, two charts are inserted before the body:

| Function | Chart | Data source |
|---|---|---|
| _generate_radar_chart | Ability radar chart | Parameter ability, default DEFAULT_RADAR_DATA |
| _generate_bar_chart | Holland bar chart | Parameter holland_data, default DEFAULT_HOLLAND_DATA |

matplotlib renders with the Agg backend into a 160 dpi PNG held in a BytesIO buffer, which is then base64‑encoded. At embedding time the string is decoded back into a temporary file that ReportLab's Image reads:

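```python
# tools/pdf_exporter.py: radar chart (excerpt from _generate_radar_chart; angles and values_plot are computed earlier)
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt

plt.rcParams["font.sans-serif"] = [
    "Noto Sans CJK SC", "WenQuanYi Zen Hei", "PingFang SC", "SimHei",  # ... (list continues)
]
fig, ax = plt.subplots(figsize=(5.5, 4.8), subplot_kw=dict(polar=True))
ax.plot(angles, values_plot, "o-", color="#0d9488")
buf = io.BytesIO()
plt.savefig(buf, format="png", dpi=160, bbox_inches="tight")
plt.close(fig)  # release the figure; pyplot otherwise keeps it alive across requests
buf.seek(0)     # rewind: after savefig the position is at the end, so read() would return b""
return base64.b64encode(buf.read()).decode()
```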

Chart titles and brand color #0d9488 match the PDF body accent.
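The excerpt passes angles and values_plot to ax.plot without showing how they are built. A common approach, sketched here as a hypothetical helper (radar_series is not from the iCan code), spaces the axes evenly around the circle and repeats the first point so the polygon closes.

```python
import math
from typing import Dict, List, Tuple


def radar_series(scores: Dict[str, float]) -> Tuple[List[str], List[float], List[float]]:
    """Hypothetical helper: labels plus closed angle and value lists for a polar plot."""
    if not scores:
        raise ValueError("radar chart needs at least one dimension")
    labels = list(scores)
    values = [float(scores[k]) for k in labels]
    step = 2 * math.pi / len(labels)
    angles = [i * step for i in range(len(labels))]
    # Repeat the first point so ax.plot draws a closed polygon.
    return labels, angles + angles[:1], values + values[:1]
```

With matplotlib this would feed ax.plot(angles, values, "o-", color="#0d9488"), followed by ax.set_xticks(angles[:-1]) and ax.set_xticklabels(labels) to label each axis once.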


7. Relationship with Reporter

The inner sub‑graph in agents/reporter.py runs load_all_results → the generate_*_section nodes → compile_final_report, and finally writes ReporterState.final_report (Markdown).

PDF export is not part of the Reporter LangGraph; it runs only when the user clicks Download:

```text
workflow_data.final_report (Markdown)
  → generate_pdf()
  → _build_pdf() + optional charts
  → bytes → HTTP Response
```

If run_analysis_pipeline falls back to the rule engine (_generate_fallback_report) because Ollama is unavailable, the PDF still renders, but its body is rule‑template Markdown.


8. Pitfalls

① profile_data / career_matches do not drive charts
download_report_pdf passes the personal profile (including RIASEC scores) and career matches to generate_pdf, but generate_pdf does not forward them to _build_pdf, and _build_pdf calls _generate_radar_chart() / _generate_bar_chart() without arguments. Both therefore use the hardcoded DEFAULT dictionaries at the top of the module instead of reading profile_data["riasec_scores"] or ability_model. The signature promises data the implementation ignores; switching the PDF to real data means changing which arguments generate_pdf passes to _build_pdf and which _build_pdf passes to the chart functions.
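One way to close that gap, sketched under loud assumptions: riasec_scores is a dict keyed by RIASEC letter or English type name ("I" or "Investigative") with numeric values, and DEFAULT_HOLLAND_DATA below is a placeholder because the real module-level default is not shown in this article. holland_data_from_profile is hypothetical; generate_pdf would pass profile_data to _build_pdf, which would call _generate_bar_chart(holland_data_from_profile(profile_data)).

```python
from typing import Dict, Mapping, Optional

RIASEC_CODES = ("R", "I", "A", "S", "E", "C")

# Placeholder only: the real DEFAULT_HOLLAND_DATA in tools/pdf_exporter.py is not shown here.
DEFAULT_HOLLAND_DATA: Dict[str, float] = {code: 50.0 for code in RIASEC_CODES}


def holland_data_from_profile(profile_data: Optional[Mapping]) -> Dict[str, float]:
    """Hypothetical: use profile_data["riasec_scores"] when complete, else the defaults."""
    raw = (profile_data or {}).get("riasec_scores") or {}
    scores: Dict[str, float] = {}
    for key, value in raw.items():
        code = str(key).strip()[:1].upper()  # "Investigative" -> "I"
        if code in RIASEC_CODES and code not in scores:
            try:
                scores[code] = float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric values
    if len(scores) < len(RIASEC_CODES):
        # Incomplete data: keep today's behaviour rather than draw a half-real chart.
        return dict(DEFAULT_HOLLAND_DATA)
    return {code: scores[code] for code in RIASEC_CODES}
```

Falling back only when a dimension is missing keeps the chart either fully real or fully default, so the PDF never mixes the two silently.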

② Font registration failure is silent
The TTFont("CNFont", fp) call is wrapped in try/except. Some ReportLab versions cannot register .ttc font collections; the exception is swallowed and the loop moves on to the next path. If every path fails, cn_font stays Helvetica and Chinese text is unreadable, with no explicit error.
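A cheap safeguard, sketched as a hypothetical check_font_covers_text called in _build_pdf right after the font loop: if the report contains CJK characters but cn_font is still a built-in font, fail (or at least log) instead of emitting a PDF full of empty boxes. The character ranges cover CJK Unified Ideographs, CJK punctuation, and full-width forms; whether to raise or merely log is a product decision.

```python
import re

# CJK symbols/punctuation, CJK Unified Ideographs, half/full-width forms.
CJK = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")
BUILTIN_FONTS = {"Helvetica", "Times-Roman", "Courier"}


def check_font_covers_text(font_name: str, text: str) -> None:
    """Hypothetical guard: refuse to render CJK text with a built-in Latin font."""
    if font_name in BUILTIN_FONTS and CJK.search(text):
        raise RuntimeError(
            f"report contains CJK text but no CJK font was registered (using {font_name}); "
            "install fonts-noto-cjk or WenQuanYi in the image"
        )
```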

③ Chart toggle relies on body keywords
If the report does not contain “能力/雷达/技能/评估” (ability/radar/skill/assessment), no charts are inserted even when the user profile is complete. Conversely, if the body contains these words, the default radar/bar charts are inserted, which may contradict the report narrative.
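A hedged alternative toggle, sketched as a hypothetical should_show_charts: draw charts when there is data to draw rather than when certain words appear. It assumes riasec_scores and ability_model are keys of profile_data; the article only names them, so the real location of ability_model may differ.

```python
from typing import Mapping, Optional


def should_show_charts(profile_data: Optional[Mapping]) -> bool:
    """Hypothetical: charts follow the presence of chart data, not report keywords."""
    profile = profile_data or {}
    has_scores = bool(profile.get("riasec_scores"))
    has_abilities = bool(profile.get("ability_model"))
    return has_scores or has_abilities
```

This only makes sense together with pitfall ①: once the charts read real data, the toggle should depend on that same data.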

④ txt/md and pdf entry points are separate
GET .../report/download?format=pdf returns 400; PDFs must be fetched from GET .../report/pdf, so the frontend has to route the two download types separately.
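The branching the frontend needs is small. The frontend is Vue, but to keep one language across examples it is sketched here in Python; report_download_url is hypothetical, and it assumes the txt/md route sits under the same /api/sessions/{session_id} prefix as the PDF route (the article abbreviates it as .../report/download).

```python
from urllib.parse import quote, urlencode

API_PREFIX = "/api/sessions"  # prefix of the PDF route, GET /api/sessions/{session_id}/report/pdf


def report_download_url(session_id: str, fmt: str) -> str:
    """Hypothetical client-side helper that routes each format to its endpoint."""
    fmt = fmt.lower()
    sid = quote(session_id, safe="")
    if fmt == "pdf":
        return f"{API_PREFIX}/{sid}/report/pdf"
    if fmt in ("txt", "md"):
        # Assumption: same prefix as the PDF route.
        query = urlencode({"format": fmt})
        return f"{API_PREFIX}/{sid}/report/download?{query}"
    raise ValueError(f"unsupported report format: {fmt!r}")
```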


9. Summary

  • Markdown report is generated by agents/reporter.py; PDF is converted on‑demand by tools/pdf_exporter.generate_pdf at GET /api/sessions/{session_id}/report/pdf.
  • Chinese text depends on multi‑path TTFont registration in _build_pdf; all styles bind to cn_font.
  • Markdown is directly converted to ReportLab Paragraph / Table via _parse_markdown, with inline formatting handled by _parse_inline.
  • matplotlib charts go base64 → temporary PNG → Image; the chart data is currently the module defaults, not the user's profile.
  • In production, test PDF Chinese and table wrapping on the target OS/Docker image, and consider wiring riasec_scores into _generate_bar_chart.

Next article: Stable Prompt and JSON output (llm/parsers.py).


Appendix: Key Source Code (Line‑by‑line comments)

The following code is extracted from the iCan implementation, with comments keyed to the original source line numbers, so you can follow along even without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py

generate_pdf entry

```python
# ========== generate_pdf entry ==========
# Source: tools/pdf_exporter.py, lines 267-281

# L267-268: async entry point called by the download route; returns raw PDF bytes
async def generate_pdf(report_md: str, title: str = "iCan 职业规划报告",  # "iCan Career Planning Report"
                       profile_data: dict = None, career_matches: list = None) -> bytes:
    # L269-270: normalise None to empty containers (neither is used further below)
    profile_data = profile_data or {}
    career_matches = career_matches or []
    # L271: charts only when the body mentions ability / radar / skill / assessment
    show_charts = bool(report_md and re.search(r"能力|雷达|技能|评估", report_md))

    # L273: synchronous ReportLab build; profile data is not passed through
    result = _build_pdf(report_md, title, show_charts)

    # L275-279: log output size and chart flag ("完成 | 大小 | 图表" = "done | size | charts")
    logger.info(
        "[generate_pdf] 完成 | 大小=%.1fKB | 图表=%s",
        len(result) / 1024,
        show_charts,
    )
    # L280: hand the PDF bytes back to the HTTP route (this is not a LangGraph node)
    return result
```

_build_pdf start

```python
# ========== _build_pdf start ==========
# Source: tools/pdf_exporter.py, lines 283-340

# L283: synchronous builder: Markdown string in, PDF bytes out
def _build_pdf(report_md: str, title: str, show_charts: bool) -> bytes:
    # L284-294: ReportLab is imported lazily inside the function
    from reportlab.lib.pagesizes import A4
    from reportlab.platypus import (
        SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle,
        KeepTogether,
    )
    from reportlab.lib.styles import ParagraphStyle
    from reportlab.lib.units import cm
    from reportlab.lib import colors
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.lib.enums import TA_CENTER, TA_JUSTIFY, TA_LEFT

    # L296-304: render into an in-memory buffer; A4 with 2 cm margins on every side
    buf = io.BytesIO()
    doc = SimpleDocTemplate(
        buf,
        pagesize=A4,
        leftMargin=2 * cm,
        rightMargin=2 * cm,
        topMargin=2 * cm,
        bottomMargin=2 * cm,
    )

    # L306-318: default to Helvetica, then try CJK fonts in order (Linux first, then macOS)
    cn_font = "Helvetica"
    for fp in [
        "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
        "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc",
        "/usr/share/fonts/opentype/noto/NotoSansCJKsc-Regular.otf",
        "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
        "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
        "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
        "/System/Library/Fonts/PingFang.ttc",
        "/System/Library/Fonts/STHeiti Light.ttc",
        "/System/Library/Fonts/Hiragino Sans GB.ttc",
        "/Library/Fonts/Arial Unicode.ttf",
    ]:
        # L319-325: skip missing files; the first font that registers wins,
        # registration errors are swallowed silently (pitfall ②)
        if os.path.exists(fp):
            try:
                pdfmetrics.registerFont(TTFont("CNFont", fp))
                cn_font = "CNFont"
                break
            except Exception:
                pass

    # L327-331: palette: teal accent, dark text, gray secondary text, light border and header fill
    accent_c = colors.HexColor("#0d9488")
    text_c = colors.HexColor("#1f2937")
    gray_c = colors.HexColor("#6b7280")
    border_c = colors.HexColor("#e5e7eb")
    header_bg = colors.HexColor("#f1f5f9")

    # L333-340: paragraph styles; every style is bound to cn_font
    st_title = ParagraphStyle("Title", fontName=cn_font, fontSize=22,
                              textColor=accent_c, alignment=TA_CENTER,
                              spaceAfter=6, leading=28)
    st_date = ParagraphStyle("Date", fontName=cn_font, fontSize=10,
                             textColor=gray_c, alignment=TA_CENTER, spaceAfter=4)
    st_sub = ParagraphStyle("Sub", fontName=cn_font, fontSize=10,
                            textColor=gray_c, alignment=TA_CENTER, spaceAfter=20)
    st_h1 = ParagraphStyle("H1", fontName=cn_font, fontSize=17,
    # (excerpt ends at source line 340, mid-statement)
```

GET .../report/pdf download

```python
# ========== GET .../report/pdf download ==========
# Source: api/routes/report.py, lines 195-220

# L195-196: FastAPI route; the handler is async so it can await generate_pdf
@router.get("/{session_id}/report/pdf")
async def download_report_pdf(session_id: str):
    # L197-198: local imports
    from fastapi.responses import Response
    from ican.tools.pdf_exporter import generate_pdf

    # L200-202: load the session; 404 if it does not exist ("会话不存在" = "Session not found")
    session_data = repository.get_session(session_id)
    if not session_data:
        raise HTTPException(status_code=404, detail="会话不存在")

    # L204-207: workflow_data is the JSON field holding chat history, intermediate results
    # and final_report; 404 until the report exists ("报告尚未生成" = "Report not yet generated")
    workflow_data = session_data.get("workflow_data") or {}
    report_md = workflow_data.get("final_report", "")
    if not report_md:
        raise HTTPException(status_code=404, detail="报告尚未生成")

    # L209-210: profile and career matches are read and passed on (not used for charts, pitfall ①)
    profile_data = workflow_data.get("personal_profile") or workflow_data.get("structured_profile") or {}
    career_matches = workflow_data.get("career_matches") or []

    # L212-217: build the PDF ("iCan 职业规划报告" = "iCan Career Planning Report")
    pdf_bytes = await generate_pdf(
        report_md,
        title="iCan 职业规划报告",
        profile_data=profile_data,
        career_matches=career_matches,
    )

    # L219-220: return the bytes as the HTTP response body
    return Response(
        content=pdf_bytes,
    # (excerpt ends at source line 220, mid-statement)
```

Series Navigation

| Article | Topic |
|---|---|
| 1 | System Overview |
| 2 | Five‑Agent Collaboration |
| 3 | Holland RIASEC |
| 4–7 | State · Routing · Nesting · Fault Tolerance |
| 8–11 | LLM Layer · SSE/WS · DB Migration · PDF |
| 12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt |
| 15–17 | Docker · Middleware · Configuration |
