Python PDF Generation in Practice: ReportLab + matplotlib for Professional Chinese Reports

0. Series Loop (Follow along without open-source code)

End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parse→analyze→match→report) → tools/pdf_exporter PDF.
This article: 11/17 · Delivery Loop · PDF

Phase	User visible	Code entry	Corresponding article
Create session	Welcome message	POST /api/sessions	09
Multi-turn dialogue	SSE streaming	chat/stream → run_guide_single_turn	06, 14
Sufficient info	Start analysis	_run_analysis_background	05, 07
Resume parsing	Progress 30%	run_resume_parser	12
Profile/RIASEC	Progress 50%	run_profile_analyzer	03, 13
Career matching	Progress 70%	run_career_matcher	02
Report	Progress 90%	run_reporter	11
Download PDF	File	GET …/report/pdf	11, 15

	Description
Before reading	Article 10: Report stored in DB
After reading	Follow along `generate_pdf` for Chinese fonts and Markdown parsing
Next loop	Article 15: Docker font dependencies (Article 12)

Full series loop index: SERIES-LOOP.md

1. What Problem to Solve

agents/reporter.py produces a Markdown string (final_report). The frontend can render it directly, but users also need a downloadable PDF. Requirements:

Chinese text, tables, heading levels must not be garbled;
Ideally include charts for RIASEC / ability dimensions;
Avoid an HTML→PDF intermediate layer (WeasyPrint/wkhtmltopdf have heavy dependencies and previously had escaping bugs).

Implementation is concentrated in tools/pdf_exporter.py; the HTTP entry point is GET /api/sessions/{session_id}/report/pdf in api/routes/report.py.

2. Implementation Location and Call Chain

Module	Responsibility
`agents/reporter.py`	LangGraph sub‑graph generates Markdown `final_report`
`workflow.py` / `api/routes/chat.py`	After analysis completes, write `final_report` into `workflow_data`
`tools/pdf_exporter.py`	`generate_pdf` → `_build_pdf`: fonts, charts, Markdown parsing, ReportLab layout
`api/routes/report.py`	`download_report_pdf` reads from DB and returns `application/pdf`

The Reporter chapter nodes write Markdown with get_chat_model() + invoke_llm (the same as in Article 8; get_light_model() is currently not used). The PDF layer no longer calls the LLM; it only consumes the existing Markdown.

Download route (excerpt from api/routes/report.py):

@router.get("/{session_id}/report/pdf")
async def download_report_pdf(session_id: str):
    from ican.tools.pdf_exporter import generate_pdf

    # Load the session first; workflow_data holds the pipeline results.
    session_data = repository.get_session(session_id)
    if not session_data:
        raise HTTPException(status_code=404, detail="Session not found")

    workflow_data = session_data.get("workflow_data") or {}
    report_md = workflow_data.get("final_report", "")
    if not report_md:
        raise HTTPException(status_code=404, detail="Report not yet generated")

    # Passed through to generate_pdf, although charts do not read them yet (pitfall ①).
    profile_data = workflow_data.get("personal_profile") or workflow_data.get("structured_profile") or {}
    career_matches = workflow_data.get("career_matches") or []

    pdf_bytes = await generate_pdf(
        report_md,
        title="iCan Career Planning Report",
        profile_data=profile_data,
        career_matches=career_matches,
    )
    # Remaining Response arguments are elided in this excerpt.
    return Response(content=pdf_bytes, media_type="application/pdf", ...)

In the same file, GET .../report/download?format=txt|md only writes plain-text or Markdown temporary files and does not accept format=pdf; PDF downloads must go through /report/pdf.

Figure: PDF generation pipeline

3. Entry Function: `generate_pdf`

# tools/pdf_exporter.py
async def generate_pdf(report_md: str, title: str = "iCan Career Planning Report",
                       profile_data: dict = None, career_matches: list = None) -> bytes:
    show_charts = bool(report_md and re.search(r"能力|雷达|技能|评估", report_md))
    result = _build_pdf(report_md, title, show_charts)
    return result

Key points:

profile_data / career_matches are passed but not used for chart data (see pitfall ①);
show_charts is determined by whether the report body contains keywords like “能力”, “雷达”, “技能”, “评估”, not by the existence of profile fields.
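The keyword switch can be tried in isolation. The sketch below reuses the regular expression from the excerpt; the helper name wants_charts is hypothetical and only wraps the same check.

```python
import re

# Same pattern as in generate_pdf: ability, radar, skill, assessment.
CHART_KEYWORDS = re.compile(r"能力|雷达|技能|评估")


def wants_charts(report_md: str) -> bool:
    """Return True when the Markdown body mentions any chart-related keyword."""
    return bool(report_md and CHART_KEYWORDS.search(report_md))


print(wants_charts("## 能力评估\n正文"))    # True
print(wants_charts("## Career summary"))  # False
print(wants_charts(""))                   # False
```

Note that the decision depends only on the text: a report that never mentions these words gets no charts, whatever the profile contains.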

4. Chinese Fonts: Multi‑path Detection + Silent Fallback

_build_pdf initializes cn_font to "Helvetica" and then tries the font files below in order, keeping the first one that exists and registers successfully:

# tools/pdf_exporter.py — font path (excerpt)
for fp in [
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/opentype/noto/NotoSansCJKsc-Regular.otf",
    "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
    "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
    "/System/Library/Fonts/PingFang.ttc",
    "/System/Library/Fonts/STHeiti Light.ttc",
    "/System/Library/Fonts/Hiragino Sans GB.ttc",
    "/Library/Fonts/Arial Unicode.ttf",
]:
    if os.path.exists(fp):
        try:
            pdfmetrics.registerFont(TTFont("CNFont", fp))
            cn_font = "CNFont"
            break
        except Exception:
            pass

All ParagraphStyle objects (st_title, st_body, st_cell, etc.) use fontName=cn_font. Docker deployments need to install fonts-noto-cjk or WenQuanYi (see Article 15); otherwise cn_font stays Helvetica, which has no CJK glyphs, and Chinese text renders as tofu blocks.
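The same probing loop can be written so that failures leave a trace. This is a minimal sketch, not the project's code: pick_cn_font and its register callback are hypothetical names, and the callback is injected so the logic can be exercised without ReportLab installed.

```python
import logging
import os
from typing import Callable, Iterable

logger = logging.getLogger("pdf_fonts")


def pick_cn_font(
    candidates: Iterable[str],
    register: Callable[[str, str], None],
    font_name: str = "CNFont",
    fallback: str = "Helvetica",
) -> str:
    """Register the first usable font file; return its name or the fallback."""
    for path in candidates:
        if not os.path.exists(path):
            continue
        try:
            register(font_name, path)
        except Exception as exc:
            # Keep trying other paths, but record why this one failed.
            logger.warning("font registration failed for %s: %s", path, exc)
            continue
        logger.info("registered %s from %s", font_name, path)
        return font_name
    logger.error("no CJK font registered; falling back to %s", fallback)
    return fallback
```

In the real module the callback would presumably wrap the call shown in the excerpt, for example lambda name, path: pdfmetrics.registerFont(TTFont(name, path)).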

5. Markdown Parsing: Direct to ReportLab, No HTML

_parse_markdown splits Markdown into (type, data) blocks:

Headings h1–h3 (#{1,4})
Paragraphs p, lists li, blockquotes quote, horizontal rules hr
Table rows tr (| col | col |, skip separator rows |---|)
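A line-based parser of this shape can be sketched as follows. It is an illustration under assumptions, not the real _parse_markdown: the function name parse_blocks, the list-marker and rule patterns, and the choice to fold "####" into h3 are all guesses made to match the block types listed above.

```python
import re

HEADING = re.compile(r"^(#{1,4})\s+(.*)$")
TABLE_SEP = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")
LIST_ITEM = re.compile(r"^(?:[-*+]|\d+\.)\s+(.*)$")
RULE = re.compile(r"^(-{3,}|\*{3,}|_{3,})$")


def parse_blocks(md: str) -> list:
    """Split Markdown into (type, data) tuples, one per non-empty line."""
    blocks = []
    for raw in md.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            level = min(len(heading.group(1)), 3)  # assumption: "####" becomes h3
            blocks.append((f"h{level}", heading.group(2).strip()))
        elif RULE.match(line):
            blocks.append(("hr", None))
        elif line.startswith("|"):
            if TABLE_SEP.match(line):
                continue  # skip the |---|---| separator row
            blocks.append(("tr", [c.strip() for c in line.strip("|").split("|")]))
        elif line.startswith(">"):
            blocks.append(("quote", line.lstrip(">").strip()))
        else:
            item = LIST_ITEM.match(line)
            blocks.append(("li", item.group(1)) if item else ("p", line))
    return blocks


sample = "# Title\nSome text\n- first\n| A | B |\n|---|---|\n| 1 | 2 |"
for block in parse_blocks(sample):
    print(block)
```

Consecutive tr blocks would then be collected into one table before being flushed to the layout.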

Inline formatting is converted by _parse_inline to ReportLab XML: **bold** → green bold, `code` → gray background Courier.
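The inline step amounts to escaping the text once and then wrapping the matched spans in ReportLab paragraph markup. The sketch below is an assumption about how such a converter might look: the function name and the gray CODE_BG value are hypothetical, while the accent color is the one quoted in this article.

```python
import re
from xml.sax.saxutils import escape

ACCENT = "#0d9488"   # brand color used elsewhere in the article
CODE_BG = "#f3f4f6"  # hypothetical gray; the real value is not shown


def to_reportlab_inline(text: str) -> str:
    """Convert **bold** and `code` spans into ReportLab paragraph markup."""
    out = escape(text)  # escape &, <, > exactly once, before adding our own tags
    out = re.sub(
        r"`([^`]+)`",
        rf'<font face="Courier" backColor="{CODE_BG}">\1</font>',
        out,
    )
    out = re.sub(
        r"\*\*(.+?)\*\*",
        rf'<b><font color="{ACCENT}">\1</font></b>',
        out,
    )
    return out


print(to_reportlab_inline("**A & B** uses `x<y`"))
```

Escaping before tagging matters: done in the other order, the generated tags themselves would be escaped and show up as literal text in the PDF.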

Table cells are wrapped in Paragraph(..., st_cell) to support line breaks; column widths are allocated by _calc_col_widths proportionally to text length, with a minimum of 1.5 cm.
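One way to honor both the proportional rule and the 1.5 cm floor is to reserve the minimum for every column and share the remaining width by text length. The sketch below takes that approach; it is not the real _calc_col_widths, and the 17 cm default assumes an A4 page with 2 cm side margins.

```python
def calc_col_widths(rows, total_cm=17.0, min_cm=1.5):
    """Return one width per column, in cm, summing to total_cm when it fits."""
    n_cols = max(len(row) for row in rows)
    longest = [1] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            longest[i] = max(longest[i], len(str(cell)))
    # Width left over after every column has received its minimum.
    spare = max(total_cm - n_cols * min_cm, 0.0)
    total_len = sum(longest)
    return [min_cm + spare * n / total_len for n in longest]


widths = calc_col_widths([["ID", "Description"], ["1", "A longer cell of text"]])
print([round(w, 2) for w in widths])
```

Multiplying each value by reportlab.lib.units.cm gives the colWidths list expected by Table.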

# Table row flush (excerpt)
t = Table(table_data, colWidths=col_ws, repeatRows=1)
t.setStyle(TableStyle([
    ("GRID", (0, 0), (-1, -1), 0.5, border_c),
    ("BACKGROUND", (0, 0), (-1, 0), header_bg),
    ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#f9fafb")]),
]))
elements.append(KeepTogether(t))

This structure replaced an earlier "Markdown → HTML → parse" pipeline. Parsing Markdown directly avoids double-escaping of HTML entities.
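The double-escaping problem is easy to reproduce with the standard library. The snippet below only illustrates the general failure mode, on the assumption that two layers each escaped the text; it does not reproduce the original bug.

```python
from html import escape, unescape

raw = "R&D < 5 years"
once = escape(raw, quote=False)    # what a single layer should produce
twice = escape(once, quote=False)  # what happens when a second layer escapes again

print(once)   # R&amp;D &lt; 5 years
print(twice)  # R&amp;amp;D &amp;lt; 5 years
print(unescape(twice) == once)  # True: one unescape undoes only one layer
```

With a single parsing step there is only one place where escaping happens, so the text reaches ReportLab in the `once` form.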

6. Embedding matplotlib Charts

When show_charts=True, two charts are inserted before the body:

Function	Chart	Data source
`_generate_radar_chart`	Ability radar chart	Parameter `ability`, default `DEFAULT_RADAR_DATA`
`_generate_bar_chart`	Holland bar chart	Parameter `holland_data`, default `DEFAULT_HOLLAND_DATA`

matplotlib runs on the Agg backend. Each chart is saved as a 160 dpi PNG into a BytesIO buffer and returned as a base64 string; at embedding time the string is decoded into a temporary file that ReportLab's Image reads:

# tools/pdf_exporter.py — radar chart (excerpt)
plt.rcParams["font.sans-serif"] = [
    "Noto Sans CJK SC", "WenQuanYi Zen Hei", "PingFang SC", "SimHei", ...
]
fig, ax = plt.subplots(figsize=(5.5, 4.8), subplot_kw=dict(polar=True))
ax.plot(angles, values_plot, "o-", color="#0d9488")
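buf = io.BytesIO()
plt.savefig(buf, format="png", dpi=160, bbox_inches="tight")
plt.close(fig)  # release the figure so repeated exports do not accumulate memory
buf.seek(0)     # rewind; read() on a just-written buffer would return b""
return base64.b64encode(buf.read()).decode()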

Chart titles and brand color #0d9488 match the PDF body accent.
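The embedding half of this round trip (base64 string to temporary PNG) needs only the standard library. The helper name below is hypothetical, and the sample bytes are just the PNG signature plus filler, not a real image.

```python
import base64
import os
import tempfile


def b64_png_to_tempfile(png_b64: str) -> str:
    """Decode a base64 PNG string into a temporary .png file and return its path."""
    data = base64.b64decode(png_b64)
    fd, path = tempfile.mkstemp(suffix=".png")
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


fake_png = b"\x89PNG\r\n\x1a\n" + b"demo"
tmp_path = b64_png_to_tempfile(base64.b64encode(fake_png).decode())
print(os.path.getsize(tmp_path))  # 12
os.remove(tmp_path)  # in a real export, remove the file only after the PDF is built
```

The path would then be handed to ReportLab's Image flowable. Because the image file may be read only when the document is built, cleanup is safest after the build call returns.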

7. Relationship with Reporter

The inner sub‑graph in agents/reporter.py: load_all_results → each generate_*_section → compile_final_report, eventually writing to ReporterState.final_report (Markdown).

PDF export does not participate in the Reporter LangGraph; it only runs when the user clicks download:

workflow_data.final_report (Markdown)
    → generate_pdf()
    → _build_pdf() + optional charts
    → bytes → HTTP Response

If run_analysis_pipeline uses the Ollama‑unavailable fallback (rule engine _generate_fallback_report), PDF can still render, but the body will be rule‑template Markdown.

8. Pitfalls

① profile_data / career_matches do not drive charts
download_report_pdf passes the RIASEC scores and personal profile to generate_pdf, but _generate_radar_chart() and _generate_bar_chart() are called without arguments, so both fall back to the hardcoded DEFAULT dictionaries at the top of the module instead of reading profile_data["riasec_scores"] or ability_model. The interface signature and the implementation are inconsistent; making the PDF use real data requires changing how _build_pdf passes arguments to the chart functions.
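A possible fix is to derive the bar-chart input from the profile and fall back to defaults only for missing values. The sketch below rests on assumptions: riasec_scores is taken to be a dict keyed by the letters R, I, A, S, E, C, and the default values are placeholders, since the real DEFAULT_HOLLAND_DATA is not shown in this article.

```python
RIASEC_KEYS = ("R", "I", "A", "S", "E", "C")

# Placeholder defaults; the real DEFAULT_HOLLAND_DATA may look different.
DEFAULT_HOLLAND_DATA = {key: 50.0 for key in RIASEC_KEYS}


def holland_from_profile(profile_data) -> dict:
    """Merge numeric RIASEC scores from the profile over the defaults."""
    scores = (profile_data or {}).get("riasec_scores")
    merged = dict(DEFAULT_HOLLAND_DATA)
    if not isinstance(scores, dict):
        return merged
    for key in RIASEC_KEYS:
        value = scores.get(key)
        # Accept real numbers only; bool is excluded because it subclasses int.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            merged[key] = float(value)
    return merged


print(holland_from_profile({"riasec_scores": {"R": 80, "I": "n/a"}}))
```

The result could then be passed as the holland_data parameter of _generate_bar_chart, once _build_pdf accepts and forwards the profile.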

② Font registration failure is silent
The TTFont("CNFont", fp) call is wrapped in try/except with a bare pass. Some ReportLab versions fail to register .ttc font collections, and the loop silently moves on to the next path; if every path fails, cn_font stays Helvetica and Chinese text is unreadable, with no explicit error.

③ Chart toggle relies on body keywords
If the report contains none of “能力/雷达/技能/评估”, no charts are inserted even when the user profile is complete. Conversely, if the body contains any of these words, the default radar and bar charts are inserted, and they may contradict the report narrative.
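An alternative toggle would look at the data rather than the wording. This is only a sketch of the idea: the function names are hypothetical, and it assumes that riasec_scores and ability_model, when present, are dicts of numeric values.

```python
def _has_numbers(value) -> bool:
    """True when value is a dict holding at least one real number."""
    if not isinstance(value, dict):
        return False
    return any(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in value.values()
    )


def should_show_charts(profile_data) -> bool:
    """Decide on charts from the profile contents instead of body keywords."""
    profile = profile_data or {}
    return _has_numbers(profile.get("riasec_scores")) or _has_numbers(
        profile.get("ability_model")
    )


print(should_show_charts({"riasec_scores": {"R": 72}}))  # True
print(should_show_charts({}))                            # False
```

With such a check, charts appear exactly when there is something real to plot, independent of how the report text is phrased.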

④ txt/md and pdf entry points are separate
download_report?format=pdf returns 400; PDF downloads must use GET .../report/pdf, so the frontend has to handle the two entry points separately.

9. Summary

Markdown report is generated by agents/reporter.py; PDF is converted on‑demand by tools/pdf_exporter.generate_pdf at GET /api/sessions/{session_id}/report/pdf.
Chinese text depends on multi‑path TTFont registration in _build_pdf; all styles bind to cn_font.
Markdown is directly converted to ReportLab Paragraph / Table via _parse_markdown, with inline formatting handled by _parse_inline.
matplotlib charts are base64 → temporary PNG → Image; current chart data is default values, not user profile.
In production, test PDF Chinese and table wrapping on the target OS/Docker image, and consider wiring riasec_scores into _generate_bar_chart.

Next article: Stable Prompt and JSON output (llm/parsers.py).

Appendix: Key Source Code (Line‑by‑line comments)

The following code is extracted from the iCan implementation, with a comment above each statement, so you can follow along even without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py

`generate_pdf` entry

# ========== generate_pdf entry ==========
# Source file: tools/pdf_exporter.py, lines 267-281

# Async entry point so the route handler can await it.
# The default title means "iCan Career Planning Report".
async def generate_pdf(report_md: str, title: str = "iCan 职业规划报告",
                       profile_data: dict = None, career_matches: list = None) -> bytes:
    # Normalize the optional arguments to empty containers (accepted, but not yet used for charts).
    profile_data = profile_data or {}
    career_matches = career_matches or []
    # Enable charts only when the body mentions ability / radar / skill / assessment.
    show_charts = bool(report_md and re.search(r"能力|雷达|技能|评估", report_md))

    # Synchronous layout step: fonts, charts, Markdown parsing, ReportLab build.
    result = _build_pdf(report_md, title, show_charts)

    # Log the output size in KB and whether charts were included ("done | size | charts").
    logger.info(
        "[generate_pdf] 完成 | 大小=%.1fKB | 图表=%s",
        len(result) / 1024,
        show_charts,
    )
    # Return the finished PDF as bytes.
    return result

`_build_pdf` start

# ========== Start of _build_pdf ==========
# Source file: tools/pdf_exporter.py, lines 283-340

# Synchronous builder: lays out the whole document and returns the PDF bytes.
def _build_pdf(report_md: str, title: str, show_charts: bool) -> bytes:
    # ReportLab is imported inside the function, not at module import time.
    from reportlab.lib.pagesizes import A4
    from reportlab.platypus import (
        SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle,
        KeepTogether,
    )
    from reportlab.lib.styles import ParagraphStyle
    from reportlab.lib.units import cm
    from reportlab.lib import colors
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.lib.enums import TA_CENTER, TA_JUSTIFY, TA_LEFT

    # Build the PDF in memory: A4 page with 2 cm margins on every side.
    buf = io.BytesIO()
    doc = SimpleDocTemplate(
        buf,
        pagesize=A4,
        leftMargin=2 * cm,
        rightMargin=2 * cm,
        topMargin=2 * cm,
        bottomMargin=2 * cm,
    )

    # Start with Helvetica, then register the first font file that exists and loads.
    cn_font = "Helvetica"
    for fp in [
        # Linux candidates: Noto CJK, WenQuanYi, Droid fallback.
        "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
        "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc",
        "/usr/share/fonts/opentype/noto/NotoSansCJKsc-Regular.otf",
        "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
        "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
        "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
        # macOS candidates: PingFang, STHeiti, Hiragino, Arial Unicode.
        "/System/Library/Fonts/PingFang.ttc",
        "/System/Library/Fonts/STHeiti Light.ttc",
        "/System/Library/Fonts/Hiragino Sans GB.ttc",
        "/Library/Fonts/Arial Unicode.ttf",
    ]:
        # Skip paths that do not exist on this machine.
        if os.path.exists(fp):
            try:
                # Register the file under the alias "CNFont" and stop at the first success.
                pdfmetrics.registerFont(TTFont("CNFont", fp))
                cn_font = "CNFont"
                break
            except Exception:
                # Registration errors are swallowed; the loop tries the next path (pitfall ②).
                pass

    # Palette: teal accent, dark body text, gray secondary text, light border, table header background.
    accent_c = colors.HexColor("#0d9488")
    text_c = colors.HexColor("#1f2937")
    gray_c = colors.HexColor("#6b7280")
    border_c = colors.HexColor("#e5e7eb")
    header_bg = colors.HexColor("#f1f5f9")

    # Cover title style: centered, 22 pt, accent color.
    st_title = ParagraphStyle("Title", fontName=cn_font, fontSize=22,
                              textColor=accent_c, alignment=TA_CENTER,
                              spaceAfter=6, leading=28)
    # Date line style: centered, 10 pt, gray.
    st_date = ParagraphStyle("Date", fontName=cn_font, fontSize=10,
                             textColor=gray_c, alignment=TA_CENTER, spaceAfter=4)
    # Subtitle style: same as the date line, with more space below.
    st_sub = ParagraphStyle("Sub", fontName=cn_font, fontSize=10,
                            textColor=gray_c, alignment=TA_CENTER, spaceAfter=20)
    # H1 heading style (the excerpt is cut off after this line).
    st_h1 = ParagraphStyle("H1", fontName=cn_font, fontSize=17,

`GET .../report/pdf` download

# ========== GET .../report/pdf download ==========
# Source file: api/routes/report.py, lines 195-220

# Register the handler on the router for GET /{session_id}/report/pdf.
@router.get("/{session_id}/report/pdf")
async def download_report_pdf(session_id: str):
    # Imports are local to the handler.
    from fastapi.responses import Response
    from ican.tools.pdf_exporter import generate_pdf

    # Load the session from the repository; 404 if it does not exist ("session not found").
    session_data = repository.get_session(session_id)
    if not session_data:
        raise HTTPException(status_code=404, detail="会话不存在")

    # workflow_data is the JSON field holding dialogue history, intermediate results and final_report.
    workflow_data = session_data.get("workflow_data") or {}
    report_md = workflow_data.get("final_report", "")
    # 404 if the analysis has not produced a report yet ("report not yet generated").
    if not report_md:
        raise HTTPException(status_code=404, detail="报告尚未生成")

    # Prefer personal_profile, fall back to structured_profile; default to empty containers.
    profile_data = workflow_data.get("personal_profile") or workflow_data.get("structured_profile") or {}
    career_matches = workflow_data.get("career_matches") or []

    # Convert the stored Markdown to PDF bytes on demand.
    pdf_bytes = await generate_pdf(
        report_md,
        title="iCan 职业规划报告",
        profile_data=profile_data,
        career_matches=career_matches,
    )

    # Return the bytes as the HTTP response body (the excerpt is cut off after the next line).
    return Response(
        content=pdf_bytes,

Article	Topic
1	System Overview
2	Five‑Agent Collaboration
3	Holland RIASEC
4–7	State · Routing · Nesting · Fault Tolerance
8–11	LLM Layer · SSE/WS · DB Migration · PDF
12–14	JSON Prompt · RIASEC Prompt · Guide Prompt
15–17	Docker · Middleware · Configuration
