0. Series Loop (Follow Along Even Without Public Source Code)
End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parse→Analyze→Match→Report) → tools/pdf_exporter PDF.
This Article: 16/17 · Gateway Loop · Middleware
| Stage | User Visible | Code Entry | Article |
|---|---|---|---|
| Create Session | Welcome Message | POST /api/sessions | 09 |
| Multi-turn Dialogue | SSE Streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Info Complete | Start Analysis | _run_analysis_background | 05, 07 |
| Resume Parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career Matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download PDF | File | GET …/report/pdf | 11, 15 |

| Item | Description |
|---|---|
| Before Reading This | Article 01 main.py |
| After Reading This | Can state the registration order of CORS & RateLimit and the /health whitelist |
| Next Loop | Article 17: Configuration Sources |
Full series loop index: SERIES-LOOP.md
I. What Problem to Solve
iCan’s /chat/stream, report generation, and file uploads all trigger LLM or long-running tasks. During public demos or internal testing, without IP rate limiting, a single client hammering the interface can lead to:
- LLM quota drained quickly (a double hit on cost and on provider rate limits);
- Uvicorn workers saturated, causing SSE stream stuttering for legitimate users;
- Logs flooded by the same IP, making it hard to diagnose real faults.
The project implements a custom sliding-window IP rate limiter in api/middleware.py (no slowapi or Redis, which suits an MVP), registered alongside CORSMiddleware in create_app() in main.py.
II. Implementation Location
| Component | File | Description |
|---|---|---|
| Rate Limiting | api/middleware.py | RateLimitMiddleware |
| CORS + Registration Order | main.py | Two add_middleware calls inside create_app() |
| Affected Routes | api/routes/chat.py etc. | All HTTP requests go through the middleware chain |
III. Starlette Middleware Order (Onion Model)
In FastAPI/Starlette, middleware added later sees the request first. In main.py, create_app() makes two add_middleware calls: CORSMiddleware is registered first and RateLimitMiddleware second.
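Thus the request path is: Client → RateLimitMiddleware → CORSMiddleware → route handler, and the response travels back through the same layers in reverse order.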
RateLimit is the outermost layer: when the limit is exceeded, dispatch returns 429 directly, without running CORS logic or routing, which suits blocking abusive requests. If the order were reversed, every request, including the ones about to be rejected, would pass through CORS handling before reaching the rate limiter.
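The onion model can be checked without FastAPI installed. The following is a toy sketch, not Starlette's real API: `MiniApp` is a hypothetical stand-in that mimics how Starlette puts each newly added middleware at the front of its list and then wraps the app from the inside out.

```python
from typing import Callable, List

Handler = Callable[[str], str]


class MiniApp:
    """Toy stand-in for a Starlette app's middleware stack (hypothetical)."""

    def __init__(self, route: Handler) -> None:
        self.route = route
        self.middleware: List[str] = []

    def add_middleware(self, name: str) -> None:
        # The newest middleware goes to the front of the list,
        # which makes it the outermost layer.
        self.middleware.insert(0, name)

    def build(self, trace: List[str]) -> Handler:
        handler = self.route
        # Wrap from the innermost layer outward.
        for name in reversed(self.middleware):
            handler = self._wrap(name, handler, trace)
        return handler

    @staticmethod
    def _wrap(name: str, inner: Handler, trace: List[str]) -> Handler:
        def layer(request: str) -> str:
            trace.append(f"enter {name}")
            response = inner(request)
            trace.append(f"leave {name}")
            return response
        return layer


trace: List[str] = []
app = MiniApp(route=lambda request: "200 OK")
app.add_middleware("CORS")       # registered first, so it is the inner layer
app.add_middleware("RateLimit")  # registered second, so it is the outer layer
response = app.build(trace)("GET /api/sessions")
print(trace)
# ['enter RateLimit', 'enter CORS', 'leave CORS', 'leave RateLimit']
```

The trace shows RateLimit entered first and left last, which is why a 429 returned from it never reaches the CORS layer.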
IV. RateLimitMiddleware Implementation
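The full logic lives in api/middleware.py. In outline, dispatch skips /health, takes the client IP from request.client.host (with "unknown" as the value when no client is available), drops that IP's timestamps in _requests that are older than the window, returns a 429 JSON response with error and detail fields when the remaining count has reached the limit, and otherwise records the current timestamp and passes the request on.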
Design choices:
| Point | Description |
|---|---|
| Sliding Window | Filter expired records by timestamp, smoother than “reset every minute” |
| In-memory defaultdict(list) | Simple implementation; counters reset on process restart; no sharing across replicas |
| /health exemption | Docker/K8s healthcheck not blocked by 429 |
| 429 JSON body | Frontend can show a unified toast; fields error + detail |
Default: 60 requests / 60 seconds / IP, hardcoded in main.py; not yet pulled from config.py settings – quota changes require code modification or later extraction to config (Article 17).
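The sliding-window bookkeeping can be sketched with the standard library alone. This is a minimal illustration of the technique, not the project's code: the class name `SlidingWindowLimiter`, the parameter names `max_requests` and `window_seconds`, and the injectable `clock` are assumptions made so the window can be exercised without waiting 60 seconds.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List


class SlidingWindowLimiter:
    """In-memory sliding-window counter keyed by client IP (illustrative)."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._requests: DefaultDict[str, List[float]] = defaultdict(list)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        cutoff = now - self.window_seconds
        # Keep only the timestamps that are still inside the window.
        recent = [t for t in self._requests[ip] if t > cutoff]
        if len(recent) >= self.max_requests:
            self._requests[ip] = recent
            return False  # the caller would answer with 429 here
        recent.append(now)
        self._requests[ip] = recent
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60,
                               clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(4)]
fake_now[0] = 61.0  # move past the window
after_window = limiter.allow("203.0.113.7")
print(results, after_window)
# [True, True, True, False] True
```

In a middleware, `allow()` returning False would map to the 429 branch; the limit of 3 is only for the demonstration, the article's default being 60 requests per 60 seconds.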
V. CORSMiddleware Configuration
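At the MVP stage, CORSMiddleware is configured in create_app() to allow all origins (allow_origins=["*"]) together with allow_credentials=True, which makes testing easy from both the local Vue dev server and the production domain.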
Note the combination of allow_origins=["*"] and allow_credentials=True: under the Fetch specification, browsers reject a credentialed cross-origin response whose Access-Control-Allow-Origin is *. In development without credentials this is fine; in production with login cookies, allow_origins must be changed to an explicit origin list (e.g., https://app.example.com).
iCan’s current JWT/session uses headers rather than cookies, so impact is minimal, but before going live, the frontend’s actual credential method should be reviewed.
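The pre-launch review can be partly automated. The helper below is a hypothetical sketch (the function `check_cors_settings` is not part of iCan or of FastAPI); it only flags the wildcard-plus-credentials combination discussed above and origins written without a scheme.

```python
from typing import List, Sequence


def check_cors_settings(allow_origins: Sequence[str],
                        allow_credentials: bool) -> List[str]:
    """Return human-readable problems with a CORS configuration (empty if none)."""
    problems: List[str] = []
    if allow_credentials and "*" in allow_origins:
        problems.append(
            "allow_origins contains '*' while allow_credentials is True; "
            "list explicit origins instead"
        )
    for origin in allow_origins:
        # An origin is scheme + host (+ port), e.g. https://app.example.com
        if origin != "*" and not origin.startswith(("http://", "https://")):
            problems.append(f"origin {origin!r} is missing a scheme")
    return problems


mvp = check_cors_settings(["*"], allow_credentials=True)
prod = check_cors_settings(["https://app.example.com"], allow_credentials=True)
print(len(mvp), len(prod))
# 1 0
```

A check like this could run at startup or in CI so that the MVP configuration cannot reach production unnoticed.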
VI. Coordination with Business Routes
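- POST /chat/stream (api/routes/chat.py): One user message usually counts as one request against the limit; if the frontend mistakenly reconnects SSE frequently, it will quickly hit 429.
- WebSocket (api/routes/ws.py): The handshake is an HTTP upgrade and is intended to count in the same IP window. Note that Starlette's BaseHTTPMiddleware passes non-HTTP scopes straight to the app without calling dispatch, so if RateLimitMiddleware subclasses it (as the dispatch method suggests), WebSocket handshakes are not counted; verify this against the actual class.
- POST resume upload: A large body still counts as one request; there is currently no separate lower quota for uploads and no rate limiting by user_id.
- /health: Not rate limited, ensuring orchestrator health checks work reliably.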
Extension idea (not implemented): Branch in dispatch by request.url.path, e.g., chat 30/min, upload 10/min, parameters from config.py.
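A minimal sketch of that extension follows. Only the 30/min and 10/min figures and the 60/min default come from the text; the path prefixes `/chat` and `/upload`, the names `PATH_LIMITS` and `limit_for_path`, and keeping the table in code rather than in config.py are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

Limit = Tuple[int, int]  # (max_requests, window_seconds)

# Hypothetical prefixes; the real route paths may carry an /api prefix.
PATH_LIMITS: Dict[str, Limit] = {
    "/chat": (30, 60),
    "/upload": (10, 60),
}
DEFAULT_LIMIT: Limit = (60, 60)
EXEMPT_PATHS = frozenset({"/health"})


def limit_for_path(path: str) -> Optional[Limit]:
    """Return the limit for a request path, or None if the path is exempt."""
    if path in EXEMPT_PATHS:
        return None
    # Longest matching prefix wins, so more specific entries can be added later.
    for prefix in sorted(PATH_LIMITS, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return PATH_LIMITS[prefix]
    return DEFAULT_LIMIT


print(limit_for_path("/chat/stream"), limit_for_path("/health"))
# (30, 60) None
```

Inside dispatch, the result would select which counter to consult for `request.url.path`; each path class then needs its own bucket per IP, otherwise chat traffic would consume the upload quota.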
VII. 429 Response and CORS Headers
When the limit is hit, RateLimitMiddleware returns a bare JSONResponse(429) from dispatch. Because RateLimit is the outermost layer, this response never passes through CORSMiddleware, so it carries no Access-Control-Allow-Origin header; on a cross-origin request the browser may then report only a network error, and the frontend cannot read the JSON body.
For same-origin internal deployments this is usually fine. If the frontend and API are on different origins and the frontend needs to parse the 429 body, either add Access-Control-Allow-Origin manually in the 429 branch, or adjust the middleware order and test the browser behavior.
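The manual option can be sketched framework-free. `build_429` is a hypothetical helper, and the error and detail values are placeholders (the article only names the two fields); it echoes the request's Origin header back only when that origin is on an explicit allow list.

```python
import json
from typing import Dict, Optional, Sequence, Tuple


def build_429(origin: Optional[str],
              allowed_origins: Sequence[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Build status, headers and body for a rate-limit rejection."""
    headers = {"Content-Type": "application/json"}
    if origin is not None and origin in allowed_origins:
        # Echo the specific origin so the browser lets the frontend read the body.
        headers["Access-Control-Allow-Origin"] = origin
        headers["Vary"] = "Origin"
    body = json.dumps({
        "error": "rate_limit_exceeded",               # placeholder value
        "detail": "Too many requests, retry later.",  # placeholder value
    }).encode("utf-8")
    return 429, headers, body


status, headers, body = build_429("https://app.example.com",
                                  ["https://app.example.com"])
print(status, headers.get("Access-Control-Allow-Origin"))
# 429 https://app.example.com
```

In the middleware, the same headers could be passed to the JSON response returned from the 429 branch, with the origin read from the incoming request's Origin header.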
VIII. Production Evolution Path
| Stage | Solution |
|---|---|
| MVP (current) | Single-process in-memory sliding window, 60/min/IP |
| Multiple Uvicorn workers / multiple Pods | Each replica independent counter, effective limit ≈ N × 60; need Redis INCR+TTL or gateway rate limiting |
| Logged-in users | Rate limit by user_id, reduce false positives for multiple users behind NAT sharing IP |
| Edge | Static resources via CDN, API through WAF/cloud vendor rate limits |
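For the multi-replica row, the INCR+TTL idea can be sketched as a fixed-window counter (a coarser approximation than the in-memory sliding window). The code below is an assumption-laden illustration: `CounterStore` names the two operations it needs, which redis-py's client also provides as `incr` and `expire`; `InMemoryStore` is a hypothetical stand-in so the sketch runs without a Redis server, and the key format is invented.

```python
from typing import Dict, Protocol


class CounterStore(Protocol):
    """The two operations the limiter needs from a shared store."""

    def incr(self, name: str) -> int: ...
    def expire(self, name: str, time: int) -> object: ...


class InMemoryStore:
    """Local stand-in for Redis; it never actually expires keys."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.counts[name] = self.counts.get(name, 0) + 1
        return self.counts[name]

    def expire(self, name: str, time: int) -> bool:
        return True  # a real store would delete the key after `time` seconds


def allow_request(store: CounterStore, ip: str, now: float,
                  limit: int = 60, window_seconds: int = 60) -> bool:
    # One key per IP per window; every replica increments the same key.
    window_index = int(now // window_seconds)
    key = f"ratelimit:{ip}:{window_index}"  # hypothetical key format
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let the key disappear once the window ends.
        store.expire(key, window_seconds)
    return count <= limit


store = InMemoryStore()
decisions = [allow_request(store, "203.0.113.7", now=1000.0, limit=2)
             for _ in range(3)]
next_window = allow_request(store, "203.0.113.7", now=1060.0, limit=2)
print(decisions, next_window)
# [True, True, False] True
```

Because all replicas share one counter, the effective limit stays at 60 rather than N × 60. With a real Redis, the increment and the expiry are two separate commands, so a production version would want them applied atomically (for example in a transaction or script).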
IX. Pitfalls and Edge Cases
- request.client is None or the proxy IP is wrong: The code uses request.client.host. Behind a reverse proxy such as Nginx, that value is the proxy's address (e.g. 127.0.0.1), or "unknown" when no client is available, so all users share one bucket. In production, have the reverse proxy pass the real IP in X-Forwarded-For and read its first segment in the middleware (the current code doesn't do this; known gap).
- /health path hardcoded: If the health check moves to /api/health or gains a prefix, the exemption condition must be updated accordingly; otherwise probes can receive 429 and containers may restart repeatedly.
- Memory leak boundary: _requests only cleans the current IP's timestamp list; over a long run with many distinct IPs, the dict can grow large. Acceptable for an MVP; at larger scale, add a TTL that removes whole IP keys or switch to Redis.
- Relation to LLM timeout: Rate limiting caps the request count, not the duration of a single LLM call (up to 90 s); a user sending 60 messages in 60 seconds could still slow down workers. This needs product-level debounce plus the timeout strategies from Article 7.
- CORS * + credentials: See Section V; before go-live, switch to an explicit origin list and remove the combination that conflicts with the spec.
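The first pitfall can be addressed with a small helper. This is a sketch under stated assumptions, not the project's code: `client_ip` and `TRUSTED_PROXIES` are hypothetical names, and the header is honored only when the direct peer is a known proxy, since any client can send its own X-Forwarded-For.

```python
from typing import Mapping, Optional

# Hypothetical: addresses of the reverse proxy in front of the API.
TRUSTED_PROXIES = frozenset({"127.0.0.1", "::1"})


def client_ip(headers: Mapping[str, str], peer_host: Optional[str]) -> str:
    """Pick the rate-limit key for a request.

    `headers` is expected to use lowercase keys (a plain dict here; a
    framework's case-insensitive header object would also work).
    """
    if peer_host is None:
        return "unknown"
    if peer_host in TRUSTED_PROXIES:
        forwarded = headers.get("x-forwarded-for", "")
        first = forwarded.split(",")[0].strip()
        if first:
            return first  # left-most entry: the address the proxy chain reports
    return peer_host


print(client_ip({"x-forwarded-for": "198.51.100.4, 10.0.0.2"}, "127.0.0.1"))
# 198.51.100.4
```

In the middleware this would replace the direct use of request.client.host; the proxy should be configured to set or overwrite the header so the left-most entry cannot be forged by the client.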
X. Summary
- Rate limiting lives in api/middleware.py; CORS and the registration order live in main.py. RateLimit, registered later, is outermost and blocks abusive requests first.
- Sliding window + /health exemption + 429 JSON is sufficient for an MVP; switch to distributed counting before running multiple instances.
- Proxy IP, CORS headers on 429, and path exemptions are three essential checks before going live.
- Next article (Article 17): pydantic-settings config management and API Key masking.
Appendix: Key Source Code (Line-by-Line Comments)
The following code is excerpted from iCan implementation, with Chinese comments above each line, allowing follow-along even without the public repository.
Generate command: python3 bin/build-ican-annotated-snippets.py
RateLimitMiddleware (excerpt)
create_app CORS + Middleware Registration
Series Navigation
| Article | Topic |
|---|---|
| 1 | System Overview |
| 2 | Five Agent Collaboration |
| 3 | Holland RIASEC |
| 4–7 | State · Routing · Nesting · Fault Tolerance |
| 8–11 | LLM Layer · SSE/WS · DB Migration · PDF |
| 12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt |
| 15–17 | Docker · 16 Middleware (This Article) · Config |