0. Series Loop (Follow along without open-source code)
End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parse→analyze→match→report) → tools/pdf_exporter PDF.
This article: 11/17 · Delivery Loop · PDF
| Phase | User visible | Code entry | Corresponding article |
|---|---|---|---|
| Create session | Welcome message | POST /api/sessions | 09 |
| Multi-turn dialogue | SSE streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Sufficient info | Start analysis | _run_analysis_background | 05, 07 |
| Resume parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download PDF | File | GET …/report/pdf | 11, 15 |
| Description | |
|---|---|
| Before reading | Article 10: Report stored in DB |
| After reading | Follow along generate_pdf for Chinese fonts and Markdown parsing |
| Next loop | Article 15: Docker font dependencies (Article 12) |
Full series loop index: SERIES-LOOP.md
1. What Problem to Solve
agents/reporter.py produces a Markdown string (final_report). The frontend can render it directly, but users also need a downloadable PDF. Requirements:
- Chinese text, tables, heading levels must not be garbled;
- Ideally include charts for RIASEC / ability dimensions;
- Avoid an HTML→PDF intermediate layer (WeasyPrint/wkhtmltopdf have heavy dependencies and previously had escaping bugs).
Implementation is concentrated in tools/pdf_exporter.py; the HTTP entry point is GET /api/sessions/{session_id}/report/pdf in api/routes/report.py.
2. Implementation Location and Call Chain
| Module | Responsibility |
|---|---|
| agents/reporter.py | LangGraph sub‑graph generates Markdown final_report |
| workflow.py / api/routes/chat.py | After analysis completes, write final_report into workflow_data |
| tools/pdf_exporter.py | generate_pdf → _build_pdf: fonts, charts, Markdown parsing, ReportLab layout |
| api/routes/report.py | download_report_pdf reads from DB and returns application/pdf |
The Reporter chapter nodes use get_chat_model() + invoke_llm to write Markdown (same as article 8; get_light_model() is currently not used). The PDF layer makes no further LLM calls; it only consumes the existing Markdown.
The download route is download_report_pdf in api/routes/report.py.
In the same file, GET .../report/download?format=txt|md only writes plain-text or Markdown temporary files and does not accept format=pdf; PDF downloads must go through /report/pdf.
3. Entry Function: generate_pdf
Key points:
- profile_data / career_matches are passed but not used for chart data (see pitfall ①);
- show_charts is determined by whether the report body contains keywords like “能力” (ability), “雷达” (radar), “技能” (skill), “评估” (assessment), not by the existence of profile fields.
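The keyword check described above can be sketched as follows. This is a minimal illustration, not the original code: the function name `should_show_charts` is hypothetical, and the keyword tuple contains only the four examples the article names, so the real list may be longer.

```python
# Keywords quoted in the article; the actual list in tools/pdf_exporter.py may differ.
CHART_KEYWORDS = ("能力", "雷达", "技能", "评估")


def should_show_charts(markdown_text: str) -> bool:
    """Return True when the report body mentions any chart-related keyword.

    Hypothetical helper: it mirrors the described behaviour, where the decision
    depends only on the report text and never on profile_data.
    """
    return any(keyword in markdown_text for keyword in CHART_KEYWORDS)


if __name__ == "__main__":
    print(should_show_charts("## 能力评估\n..."))  # body mentions a keyword
    print(should_show_charts("# Career report"))    # no keyword present
```

Because the check is a plain substring match, a single incidental mention of one of these words is enough to switch charts on.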
4. Chinese Fonts: Multi‑path Detection + Silent Fallback
_build_pdf defaults cn_font = "Helvetica" and then tries to register a TTF/TTC font from a list of candidate paths, in order, stopping at the first one that succeeds.
All ParagraphStyle objects (st_title, st_body, st_cell, etc.) use fontName=cn_font. Docker deployments need fonts-noto-cjk or WenQuanYi installed (Article 15); otherwise Chinese text falls back to Helvetica and renders as tofu blocks.
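A minimal sketch of ordered font detection with a fallback is shown below. Everything in it is an assumption rather than the original listing: the candidate paths are common install locations chosen for illustration, `pick_cn_font` is a hypothetical name, and the registration step is injected as a callable so the sketch runs without ReportLab. Unlike the behaviour described above, this version logs each failure.

```python
import logging
import os
from typing import Callable, Iterable

logger = logging.getLogger("pdf_exporter_sketch")

# Illustrative candidate paths (assumptions, not the article's actual list).
CANDIDATE_FONT_PATHS = (
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # Debian fonts-noto-cjk
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",          # WenQuanYi Micro Hei
    "/System/Library/Fonts/PingFang.ttc",                      # macOS
    "C:/Windows/Fonts/msyh.ttc",                               # Windows
)


def pick_cn_font(
    register: Callable[[str, str], None],
    paths: Iterable[str] = CANDIDATE_FONT_PATHS,
    fallback: str = "Helvetica",
) -> str:
    """Try each path in order; return "CNFont" on first success, else the fallback."""
    for path in paths:
        if not os.path.exists(path):
            continue  # font not installed at this location
        try:
            register("CNFont", path)
            return "CNFont"
        except Exception as exc:  # registration errors vary by font file and version
            logger.warning("Font registration failed for %s: %s", path, exc)
    logger.warning("No CJK font registered; falling back to %s", fallback)
    return fallback
```

With ReportLab installed, `register` could be `lambda name, path: pdfmetrics.registerFont(TTFont(name, path))`, using `pdfmetrics` from `reportlab.pdfbase` and `TTFont` from `reportlab.pdfbase.ttfonts`.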
5. Markdown Parsing: Direct to ReportLab, No HTML
_parse_markdown splits Markdown into (type, data) blocks:
- Headings h1–h3 (#{1,4})
- Paragraphs p, lists li, blockquotes quote, horizontal rules hr
- Table rows tr (| col | col |, skipping separator rows |---|)
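The block types listed above can be produced by a small line-based parser. The following is a sketch under assumptions, not the original `_parse_markdown`: the function name, the regular expressions, and the choice to clamp four-hash headings to h3 (one possible way to reconcile `#{1,4}` with h1–h3) are all illustrative. Each non-empty line becomes its own block; consecutive paragraph lines are not merged.

```python
import re
from typing import List, Optional, Tuple, Union

Block = Tuple[str, Union[str, List[str], None]]

HEADING_RE = re.compile(r"^(#{1,4})\s+(.*)$")
LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*)$")
HR_RE = re.compile(r"^\s*(?:-{3,}|\*{3,}|_{3,})\s*$")
SEPARATOR_RE = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def parse_markdown(text: str) -> List[Block]:
    """Split Markdown into (type, data) blocks, one block per non-empty line."""
    blocks: List[Block] = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        if HR_RE.match(stripped):
            blocks.append(("hr", None))
            continue
        heading = HEADING_RE.match(stripped)
        if heading:
            level = min(len(heading.group(1)), 3)  # assumption: #### maps to h3
            blocks.append((f"h{level}", heading.group(2).strip()))
            continue
        if stripped.startswith("|"):
            if SEPARATOR_RE.match(stripped):
                continue  # skip |---|---| separator rows
            cells = [cell.strip() for cell in stripped.strip("|").split("|")]
            blocks.append(("tr", cells))
            continue
        if stripped.startswith(">"):
            blocks.append(("quote", stripped.lstrip(">").strip()))
            continue
        item = LIST_RE.match(stripped)
        if item:
            blocks.append(("li", item.group(1).strip()))
            continue
        blocks.append(("p", stripped))
    return blocks
```

A renderer can then walk the list and group consecutive `tr` blocks into one table before handing them to the layout engine.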
Inline formatting is converted by _parse_inline to ReportLab XML: **bold** → green bold, `code` → gray background Courier.
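The inline conversion can be sketched like this. It is an illustration under assumptions: `parse_inline` here is a stand-in for the original `_parse_inline`, the gray background value is hypothetical, and the green is assumed to be the brand color #0d9488 mentioned later in the article. Text is XML-escaped before tags are added, since ReportLab's Paragraph treats its input as markup.

```python
import re
from xml.sax.saxutils import escape

ACCENT = "#0d9488"   # brand color named in the article; assumed to be the "green"
CODE_BG = "#f3f4f6"  # hypothetical gray background for inline code

CODE_SPLIT_RE = re.compile(r"(`[^`]+`)")
BOLD_RE = re.compile(r"\*\*(.+?)\*\*")


def parse_inline(text: str) -> str:
    """Convert **bold** and `code` spans to ReportLab paragraph markup."""
    parts = []
    # Splitting on code spans first keeps ** inside code from being treated as bold.
    for piece in CODE_SPLIT_RE.split(text):
        if len(piece) >= 3 and piece.startswith("`") and piece.endswith("`"):
            inner = escape(piece[1:-1])
            parts.append(f'<font face="Courier" backColor="{CODE_BG}">{inner}</font>')
        else:
            safe = escape(piece)  # escape &, <, > before inserting tags
            parts.append(BOLD_RE.sub(rf'<font color="{ACCENT}"><b>\1</b></font>', safe))
    return "".join(parts)
```

Escaping exactly once, at this step, is what a direct Markdown-to-ReportLab path makes easy compared with going through an HTML intermediate.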
Table cells are wrapped in Paragraph(..., st_cell) to support line breaks; column widths are allocated by _calc_col_widths proportionally to text length, with a minimum of 1.5 cm.
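Proportional column sizing with a floor might look like the sketch below. The function name and the rescaling step are assumptions; the article only states that widths follow text length with a 1.5 cm minimum. Length is measured in characters here, which underestimates CJK text relative to Latin text.

```python
from typing import List, Sequence

CM = 72 / 2.54  # points per centimetre


def calc_col_widths(
    rows: Sequence[Sequence[str]],
    total_width: float,
    min_width: float = 1.5 * CM,
) -> List[float]:
    """Allocate column widths proportionally to the longest cell in each column."""
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    if n_cols == 0:
        return []
    longest = [1] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            longest[i] = max(longest[i], len(cell))
    total_len = sum(longest)
    widths = [max(total_width * n / total_len, min_width) for n in longest]
    # Assumption: if the minimum pushed the table past total_width,
    # shrink only the columns that are above the minimum.
    overflow = sum(widths) - total_width
    flexible = sum(w for w in widths if w > min_width)
    if overflow > 0 and flexible > 0:
        scale = max((flexible - overflow) / flexible, 0.0)
        widths = [w if w <= min_width else max(w * scale, min_width) for w in widths]
    return widths
```

In extreme cases (many narrow columns on a small page) the result can still exceed `total_width`, because the minimum takes priority over fitting.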
Direct parsing replaced an earlier “Markdown → HTML → parse” pipeline, which avoids double‑escaping of HTML entities.
6. Embedding matplotlib Charts
When show_charts=True, two charts are inserted before the body:
| Function | Chart | Data source |
|---|---|---|
| _generate_radar_chart | Ability radar chart | Parameter ability, default DEFAULT_RADAR_DATA |
| _generate_bar_chart | Holland bar chart | Parameter holland_data, default DEFAULT_HOLLAND_DATA |
matplotlib uses the Agg backend and writes a 160 dpi PNG to BytesIO, which is then base64-encoded; at embed time the string is decoded to a temporary file for ReportLab's Image.
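The decode-to-temporary-file step can be sketched with the standard library alone. The helper name `b64_png_to_tempfile` is hypothetical, and the sketch leaves cleanup to the caller, which is an assumption about how the original manages the file.

```python
import base64
import os
import tempfile


def b64_png_to_tempfile(png_b64: str) -> str:
    """Decode a base64 PNG string into a temporary .png file and return its path."""
    data = base64.b64decode(png_b64)
    fd, path = tempfile.mkstemp(suffix=".png")
    # os.fdopen takes ownership of the descriptor and closes it on exit.
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path
```

The returned path can be passed to `reportlab.platypus.Image`; the file should be removed only after the document has been built, because the image may be read lazily during layout.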
Chart titles and brand color #0d9488 match the PDF body accent.
7. Relationship with Reporter
The inner sub‑graph in agents/reporter.py: load_all_results → each generate_*_section → compile_final_report, eventually writing to ReporterState.final_report (Markdown).
PDF export does not participate in the Reporter LangGraph; it runs only when the user clicks download.
If run_analysis_pipeline uses the Ollama‑unavailable fallback (rule engine _generate_fallback_report), PDF can still render, but the body will be rule‑template Markdown.
8. Pitfalls
① profile_data / career_matches do not drive charts
download_report_pdf passes RIASEC scores and the personal profile to generate_pdf, but _generate_radar_chart() / _generate_bar_chart() are called without arguments and so use the hardcoded DEFAULT dictionaries at the top of the module; they never read profile_data["riasec_scores"] or ability_model. The interface signature and the implementation are inconsistent; making the PDF use real data requires changing the argument logic in _build_pdf.
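One possible way to close this gap is to resolve the chart data from the profile first and fall back to the defaults only when scores are missing. The sketch below is hypothetical throughout: the helper name, the assumption that riasec_scores is a mapping keyed by the letters R, I, A, S, E, C, and the placeholder default values are not taken from the original code.

```python
from typing import Dict, Mapping, Optional

RIASEC_KEYS = ("R", "I", "A", "S", "E", "C")

# Placeholder values; the real DEFAULT_HOLLAND_DATA is not shown in the article.
DEFAULT_HOLLAND_DATA: Dict[str, float] = {key: 50.0 for key in RIASEC_KEYS}


def resolve_holland_data(profile_data: Optional[Mapping]) -> Dict[str, float]:
    """Return real RIASEC scores when complete and numeric, else the defaults."""
    scores = (profile_data or {}).get("riasec_scores")
    if not isinstance(scores, Mapping):
        return dict(DEFAULT_HOLLAND_DATA)
    resolved: Dict[str, float] = {}
    for key in RIASEC_KEYS:
        try:
            resolved[key] = float(scores[key])
        except (KeyError, TypeError, ValueError):
            # Incomplete or non-numeric data: do not mix real and default values.
            return dict(DEFAULT_HOLLAND_DATA)
    return resolved
```

The result could then be passed as the holland_data parameter of _generate_bar_chart, which the table in section 6 lists as already existing.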
② Font registration failure is silent
The TTFont("CNFont", fp) call is wrapped in try/except. Some ReportLab versions fail to register .ttc multi‑font collections, and the code silently continues to the next path; if all paths fail, cn_font remains Helvetica and Chinese text becomes unreadable with no explicit error.
③ Chart toggle relies on body keywords
If the report does not contain “能力/雷达/技能/评估”, no charts are inserted even if the user profile is complete. Conversely, if the body contains these words, it inserts default radar/bar charts, which may conflict with the report narrative.
④ txt/md and pdf entry points are separate
download_report?format=pdf returns 400; PDF must use GET .../report/pdf. The frontend routing needs to handle the two separately.
9. Summary
- The Markdown report is generated by agents/reporter.py; the PDF is converted on demand by tools/pdf_exporter.generate_pdf at GET /api/sessions/{session_id}/report/pdf.
- Chinese text depends on multi‑path TTFont registration in _build_pdf; all styles bind to cn_font.
- Markdown is converted directly to ReportLab Paragraph/Table via _parse_markdown, with inline formatting handled by _parse_inline.
- matplotlib charts go base64 → temporary PNG → Image; current chart data is default values, not the user profile.
- In production, test Chinese rendering and table wrapping in the PDF on the target OS/Docker image, and consider wiring riasec_scores into _generate_bar_chart.
Next article: Stable Prompt and JSON output (llm/parsers.py).
Appendix: Key Source Code (Line‑by‑line comments)
The following code is extracted from the iCan implementation, with Chinese comments above each line, so you can follow along even without the public repository.
Generation command: python3 bin/build-ican-annotated-snippets.py
generate_pdf entry
_build_pdf start
GET .../report/pdf download
Series Navigation
| Article | Topic |
|---|---|
| 1 | System Overview |
| 2 | Five‑Agent Collaboration |
| 3 | Holland RIASEC |
| 4–7 | State · Routing · Nesting · Fault Tolerance |
| 8–11 | LLM Layer · SSE/WS · DB Migration · PDF |
| 12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt |
| 15–17 | Docker · Middleware · Configuration |