0. Series Loop (Read Along Without Public Source Code)
End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parsing → analysis → matching → report) → tools/pdf_exporter PDF.
This Article: 10/17 · Persistence Ring · DB & Migration
| Stage | User Visible | Code Entry | Article |
|---|---|---|---|
| Create Session | Welcome Message | POST /api/sessions | 09 |
| Multi-turn Dialogue | SSE Streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Info Sufficient | Start Analysis | _run_analysis_background | 05, 07 |
| Resume Parsing | Progress 30% | run_resume_parser | 12 |
| Profile / RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career Matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download PDF | File | GET …/report/pdf | 11, 15 |
| Description | |
|---|---|
| Before reading this | Session saving from Article 09 |
| After reading this | Understand init_db + _auto_migrate startup sequence |
| Next ring | Article 11: From final_report to PDF (Article 11) |
Full series loop index: SERIES-LOOP.md
1. What Problem Are We Solving?
iCan uses SQLite (default sqlite:///./ican.db) to store sessions and reports. During iteration, workflow_data (JSON, storing conversation history and intermediate analysis results) was added to the Session table.
If we only use create_all(): new tables can be created, but missing columns in existing tables won’t be automatically added. After local upgrades, it’s common to get a “column does not exist” error.
In the MVP stage, we don’t want the full Alembic setup, so we added _auto_migrate in db/session.py: on startup, compare ORM with actual table structure, if columns are missing, ALTER TABLE ADD COLUMN.
2. Call Chain: lifespan → init_db → _auto_migrate
The entry point is in main.py‘s FastAPI lifespan:
1 | |
init_db() (db/session.py) has a fixed order:
Base.metadata.create_all(bind=engine, checkfirst=True)— creates non-existent tables;_auto_migrate(engine)— adds columns to existing tables.
This runs every time the container/process starts, no manual migration commands needed.
3. _auto_migrate Implementation (Consistent with Source Code)
1 | |
Logic: only adds columns, does not delete, change types, or rename.
4. Typical Scenario: sessions.workflow_data
db/models.py Session definition (excerpt):
1 | |
If the old database only has the first three columns, a log similar to this appears on startup:
1 | |
workflow.run_analysis_pipeline writes quick reports into SessionRepository.save_session(..., workflow_data={...}) when Ollama is unavailable — it depends on this column existing.
The same file also has tables like messages, user_profiles, career_reports, etc. New fields also go through this logic.
5. Engine and Session: Relationship with Migration
_get_engine() is a lazy singleton that reads settings.DB_URL:
- SQLite:
check_same_thread=False; - PostgreSQL etc.:
pool_size=5,pool_recycle=3600.
Migration runs after the engine is created and before business requests; it’s independent of the get_db_session() context manager (commit/rollback).
6. Capability Boundaries (Must Be Clear)
| Operation | _auto_migrate | Alembic |
|---|---|---|
| New Table | ✅ create_all | ✅ |
| Add Column | ✅ | ✅ |
| Delete Column | ❌ | ✅ |
| Change Column Type | ❌ | ✅ |
| Rollback | ❌ | ✅ |
| Multi-environment Version Chain | ❌ | ✅ |
Adding a NOT NULL column without a DEFAULT: On tables with existing data, ALTER ADD may fail or leave empty rows violating constraints — iCan’s current new columns are mostly nullable=True or have defaults, intentionally avoiding this.
SQLite and JSON: SQLAlchemy JSON stores as text on SQLite; switching to PostgreSQL cross-dialect requires re-validating the type compilation result.
7. Pitfalls
① create_all cannot replace migration
Running only create_all without _auto_migrate means old user databases will always miss columns. Ensure lifespan calls init_db(), not just create_all alone.
② Production multi-instance simultaneous startup
Multiple workers running ALTER TABLE on the same column concurrently may result in a “duplicate column” error for the second one — the current code does not catch this; in production use Alembic or run migration on a single instance before startup.
③ Type changes require manual work
For example, changing String(20) to String(50) — _auto_migrate won’t handle it; you need to write SQL manually or use an Alembic revision.
④ workflow_data size
Packing the entire conversation and analysis results into JSON can cause a single row to bloat over time; this is a product-level issue, not a migration one, but DBA perspective should be aware.
8. Summary
- Migration entry:
main.lifespan→init_db()→create_all+_auto_migrate. - Implementation file:
db/session.py; model definition:db/models.py. - Suitable for MVP quick column addition; for delete/change columns, rollback, team collaboration, switch to Alembic.
- Business-critical column:
sessions.workflow_datasupports persistence for the analysis pipeline and Ollama fallback reports.
Next article: How
tools/pdf_exporter.pyturns Reporter’s Markdown into a Chinese PDF.
Appendix: Key Source Code (Line-by-Line Annotations)
The following code is from the iCan implementation, with Chinese comments above each line, allowing you to follow along even without the public repo.
Generation command: python3 bin/build-ican-annotated-snippets.py
lifespan calls init_db
1 | |
init_db
1 | |
_auto_migrate
1 | |
Session.workflow_data
1 | |
Series Navigation
| Article | Topic |
|---|---|
| 1 | System Overview |
| 2 | Five-Agent Collaboration |
| 3 | Holland RIASEC |
| 4-7 | State · Routing · Nesting · Fault Tolerance |
| 8-11 | LLM Layer · SSE/WS · DB Migration · PDF |
| 12-14 | JSON Prompt · RIASEC Prompt · Guide Prompt |
| 15-17 | Docker · Middleware · Configuration |