0. Series Loop (Follow Along Even Without Public Source Code)

End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parse→Analyze→Match→Report) → tools/pdf_exporter PDF.
This Article: 16/17 · Gateway Loop · Middleware

Stage | User Visible | Code Entry | Article
Create Session | Welcome Message | POST /api/sessions | 09
Multi-turn Dialogue | SSE Streaming | chat/stream → run_guide_single_turn | 06, 14
Info Complete | Start Analysis | _run_analysis_background | 05, 07
Resume Parsing | Progress 30% | run_resume_parser | 12
Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13
Career Matching | Progress 70% | run_career_matcher | 02
Report | Progress 90% | run_reporter | 11
Download PDF | File | GET …/report/pdf | 11, 15

Item | Description
Before Reading This Article | Article 01, main.py
After Reading This | Can state the registration order of CORS and RateLimit, and the /health whitelist
Next Loop | Article 17: Configuration Sources

Full series loop index: SERIES-LOOP.md

I. What Problem to Solve

iCan’s /chat/stream, report generation, and file uploads all trigger LLM or long-running tasks. During public demos or internal testing, without IP rate limiting, a single client hammering the interface can lead to:

  • LLM quota drained quickly (hurting both cost and provider rate limits);
  • Uvicorn workers saturated, causing SSE stream stuttering for legitimate users;
  • Logs flooded by the same IP, making it hard to diagnose real faults.

The project implements a custom sliding-window IP rate limiter in api/middleware.py (no slowapi or Redis dependency, which suits an MVP), registered alongside CORSMiddleware in create_app() in main.py.

II. Implementation Location

Component | File | Description
Rate Limiting | api/middleware.py | RateLimitMiddleware
CORS + Registration Order | main.py | Two add_middleware calls inside create_app()
Affected Routes | api/routes/chat.py etc. | All HTTP requests go through the middleware chain

FastAPI Request Processing Chain

III. Starlette Middleware Order (Onion Model)

In FastAPI/Starlette, middleware added later encounters the request first. The actual registration order in main.py:

# main.py, inside create_app()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

from ican.api.middleware import RateLimitMiddleware
app.add_middleware(RateLimitMiddleware, max_requests=60, window_seconds=60)

Thus the request path is:

Client → RateLimitMiddleware → CORSMiddleware → Route Handler → CORSMiddleware → RateLimitMiddleware → Client

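RateLimit is the outermost layer: when the limit is exceeded, dispatch returns 429 immediately, so neither the CORS logic nor the route handler runs, which makes it cheap to shed abusive requests. The trade-off is that the 429 response never passes back through CORSMiddleware (see Section VII). If the order were reversed, every request, including those about to be rejected, would go through CORS processing before the rate check.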
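
The ordering rule can be checked without starting a server. The following is a minimal pure-Python model of the onion behavior, assuming (as Starlette does) that each add_middleware call puts the new layer at the front of the list and that the first entry ends up outermost. MiniApp and the layer names are hypothetical stand-ins for illustration, not iCan or Starlette code.

```python
from typing import Callable, List

Handler = Callable[[List[str]], None]


class MiniApp:
    """Toy stand-in for an ASGI app; it models only the wrapping order."""

    def __init__(self) -> None:
        self._layers: List[str] = []

    def add_middleware(self, name: str) -> None:
        # Assumption mirrored from Starlette: the newest layer goes to the front.
        self._layers.insert(0, name)

    def build(self) -> Handler:
        def route(trace: List[str]) -> None:
            trace.append("route")

        handler: Handler = route
        # Wrap from the innermost layer outwards, so index 0 ends up outermost.
        for name in reversed(self._layers):
            handler = self._wrap(name, handler)
        return handler

    @staticmethod
    def _wrap(name: str, inner: Handler) -> Handler:
        def layer(trace: List[str]) -> None:
            trace.append(f"{name}:request")
            inner(trace)
            trace.append(f"{name}:response")

        return layer


def trace_request() -> List[str]:
    app = MiniApp()
    app.add_middleware("CORS")       # registered first, ends up inner
    app.add_middleware("RateLimit")  # registered last, ends up outer
    trace: List[str] = []
    app.build()(trace)
    return trace


print(trace_request())
```

The printed trace starts and ends with the RateLimit layer, matching the request path shown above.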

IV. RateLimitMiddleware Implementation

Full logic in api/middleware.py:

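import time
from collections import defaultdict

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse


class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, max_requests: int = 60, window_seconds: int = 60):
        super().__init__(app)
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client IP -> timestamps of accepted requests
        self._requests = defaultdict(list)

    async def dispatch(self, request: Request, call_next):
        # Health checks are never rate limited
        if request.url.path == "/health":
            return await call_next(request)

        client_ip = request.client.host if request.client else "unknown"
        now = time.time()
        # Keep only the timestamps that still fall inside the window
        self._requests[client_ip] = [
            t for t in self._requests[client_ip]
            if now - t < self.window_seconds
        ]

        if len(self._requests[client_ip]) >= self.max_requests:
            return JSONResponse(
                status_code=429,
                content={
                    # "Requests too frequent"
                    "error": "请求过于频繁",
                    # "At most {max_requests} requests per {window_seconds} seconds"
                    "detail": f"每{self.window_seconds}秒最多{self.max_requests}次请求",
                },
            )

        self._requests[client_ip].append(now)
        return await call_next(request)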

Design choices:

Point | Description
Sliding Window | Filter expired records by timestamp, smoother than “reset every minute”
In-memory defaultdict(list) | Simple implementation; counters reset on process restart; no sharing across replicas
/health exemption | Docker/K8s healthcheck not blocked by 429
429 JSON body | Frontend can show a unified toast; fields are error + detail

Default: 60 requests per 60 seconds per IP, hardcoded in main.py. The values are not yet read from config.py settings, so changing the quota requires a code change until they are extracted to config (Article 17).
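
To see the window behavior in isolation, here is a small sketch of the same rule outside any web framework. It is not the project's code: SlidingWindowCounter is a hypothetical name, it uses a deque instead of rebuilding a list, and the timestamp is passed in explicitly so the result does not depend on the wall clock.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowCounter:
    """Per-key sliding window; the caller supplies the current time."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out (the oldest sit on the left).
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # rejected requests are not recorded, as in the middleware
        hits.append(now)
        return True


limiter = SlidingWindowCounter(max_requests=3, window_seconds=60)
results = [limiter.allow("1.2.3.4", t) for t in (0, 10, 20, 30, 60, 61)]
print(results)
```

With a quota of 3, the request at t=30 is rejected, the one at t=60 is accepted because the t=0 entry has just expired, and the one at t=61 is rejected again since the entries at 10, 20 and 60 still occupy the window.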

V. CORSMiddleware Configuration

MVP stage allows all origins for easy testing with local Vue dev server and production domain:

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Note the combination of allow_origins=["*"] and allow_credentials=True: under the browser CORS rules, a cross-origin request sent with credentials (such as cookies) is rejected if the response carries Access-Control-Allow-Origin: *. In development without credentials this may be fine; in production with login cookies, allow_origins must be changed to an explicit list of origins (e.g., https://app.example.com).

iCan’s current JWT/session uses headers rather than cookies, so impact is minimal, but before going live, the frontend’s actual credential method should be reviewed.
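
One way to make that pre-launch review enforceable is to validate the origin list when the application starts. The helper below is a sketch under assumptions: parse_allowed_origins is a hypothetical function, and the comma-separated input is imagined to come from a setting such as an environment variable, which the current code does not have.

```python
from typing import List


def parse_allowed_origins(raw: str, allow_credentials: bool) -> List[str]:
    """Split a comma-separated origin list and reject '*' when credentials are on."""
    # Browsers send Origin without a trailing slash, so normalize it away.
    origins = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    if not origins:
        raise ValueError("at least one origin is required")
    if allow_credentials and "*" in origins:
        raise ValueError("wildcard origin cannot be combined with credentials")
    return origins


print(parse_allowed_origins("https://app.example.com, http://localhost:5173/", True))
```

The returned list could then be passed as allow_origins to CORSMiddleware, so a wildcard combined with credentials fails at startup instead of in the browser.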

VI. Coordination with Business Routes

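  • POST /chat/stream (api/routes/chat.py): One user message usually counts as one request against the limit; if the frontend mistakenly reconnects SSE frequently, it will quickly hit 429.
  • WebSocket (api/routes/ws.py): The handshake reaches the app as an ASGI "websocket" scope, and Starlette's BaseHTTPMiddleware passes non-HTTP scopes straight through without calling dispatch. WebSocket connections are therefore not counted in the IP window and need their own limit if they must be throttled.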
  • POST resume upload: Large body still counts as one request; currently no separate lower quota for uploads or rate limiting by user_id.
  • /health: Not rate limited, ensuring orchestrator health checks work reliably.

Extension idea (not implemented): Branch in dispatch by request.url.path, e.g., chat 30/min, upload 10/min, parameters from config.py.
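
A possible shape for that branch is sketched below. Everything in it is hypothetical: the prefixes /chat and /upload, the numbers (taken from the idea above) and the names match_quota and bucket_key are assumptions, and the real route paths may carry a different prefix.

```python
from typing import Dict, Tuple

Quota = Tuple[int, int]  # (max_requests, window_seconds)

# Hypothetical prefixes and numbers, following the extension idea above.
PATH_QUOTAS: Dict[str, Quota] = {
    "/chat": (30, 60),
    "/upload": (10, 60),
}
DEFAULT_QUOTA: Quota = (60, 60)


def match_quota(path: str) -> Tuple[str, Quota]:
    """Return (group, quota) for the longest matching path prefix."""
    best = ""
    for prefix in PATH_QUOTAS:
        # Match only on a segment boundary, so "/chatty" does not match "/chat".
        on_boundary = path == prefix or path.startswith(prefix + "/")
        if on_boundary and len(prefix) > len(best):
            best = prefix
    return best, PATH_QUOTAS.get(best, DEFAULT_QUOTA)


def bucket_key(client_ip: str, path: str) -> str:
    """Separate buckets per group so chat traffic does not consume upload quota."""
    group, _ = match_quota(path)
    return f"{client_ip}|{group or 'default'}"


print(match_quota("/chat/stream"), bucket_key("1.2.3.4", "/chat/stream"))
```

In dispatch, the bucket key would replace client_ip as the dictionary key and the matched quota would replace self.max_requests and self.window_seconds.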

VII. 429 Response and CORS Headers

When the limit is hit, dispatch returns a bare JSONResponse(429). With the current order (RateLimit outermost), that response never passes back through CORSMiddleware, so it carries no Access-Control-Allow-Origin header. On a cross-origin page the browser then blocks the response, and the frontend sees only a network error and cannot read the JSON body.

For same-origin internal deployments this is usually fine. If the frontend and API are on different origins and the frontend needs to parse the 429 body, either add Access-Control-Allow-Origin manually in the 429 branch, or register CORSMiddleware after RateLimitMiddleware so that CORS becomes the outer layer, and then test the behavior in a browser.
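
For the manual option, the headers for the 429 branch could be computed as in the sketch below. cors_headers_for_429 is a hypothetical helper, not part of the project; it assumes the request's Origin header and the configured origin list are available inside dispatch.

```python
from typing import Dict, Iterable, Optional


def cors_headers_for_429(origin: Optional[str], allowed_origins: Iterable[str]) -> Dict[str, str]:
    """Build the CORS headers a browser needs in order to read a 429 body."""
    allowed = set(allowed_origins)
    if origin is None:
        return {}  # no Origin header: not a cross-origin browser request
    if "*" in allowed:
        return {"Access-Control-Allow-Origin": "*"}
    if origin in allowed:
        # Vary tells caches that the response depends on the Origin header.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}  # unknown origin: leave the response unreadable


print(cors_headers_for_429("https://app.example.com", ["https://app.example.com"]))
```

The resulting dict could be passed as headers= to the JSONResponse in the 429 branch, with origin taken from request.headers.get("origin").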

VIII. Production Evolution Path

Stage | Solution
MVP (current) | Single-process in-memory sliding window, 60/min/IP
Multiple Uvicorn workers / multiple Pods | Each replica keeps an independent counter, effective limit ≈ N × 60; needs Redis INCR+TTL or gateway rate limiting
Logged-in users | Rate limit by user_id, reducing false positives for multiple users behind NAT sharing an IP
Edge | Static resources via CDN, API through WAF/cloud vendor rate limits
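
The N × 60 figure in the table can be reproduced with a small simulation. This is an illustrative sketch, not project code: ReplicaLimiter is a hypothetical class that applies the same in-memory rule, and the round-robin distribution is an assumption about the load balancer.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ReplicaLimiter:
    """One process's private counter, using the same rule as the middleware."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._requests: DefaultDict[str, List[float]] = defaultdict(list)

    def allow(self, client_ip: str, now: float) -> bool:
        recent = [t for t in self._requests[client_ip] if now - t < self.window_seconds]
        self._requests[client_ip] = recent
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


def accepted_in_burst(replicas: int, burst: int, max_requests: int = 60) -> int:
    """Count how many requests of a single-IP burst get through all replicas."""
    pool = [ReplicaLimiter(max_requests, 60.0) for _ in range(replicas)]
    accepted = 0
    for i in range(burst):
        # Assumption: the load balancer spreads requests round-robin.
        if pool[i % replicas].allow("203.0.113.7", now=0.0):
            accepted += 1
    return accepted


print(accepted_in_burst(replicas=1, burst=300))
print(accepted_in_burst(replicas=3, burst=300))
```

With these inputs the sketch lets 60 requests through on one replica and 180 on three, which is why a shared counter or a gateway limit is needed once there is more than one process.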

IX. Pitfalls and Edge Cases

  1. request.client is None or shows the proxy IP
    The code uses request.client.host, which is the address of the direct TCP peer. Behind Nginx or another reverse proxy that is the proxy's own address (often 127.0.0.1), or "unknown" when request.client is None, so all users share one bucket. In production, have the reverse proxy set X-Forwarded-For to the real client IP and read its first segment in the middleware, trusting the header only when the request comes from that proxy (the current code doesn’t do this; it is a known gap).

  2. /health path hardcoded
    If health check changes to /api/health or with a prefix, the exemption condition must be updated accordingly; otherwise, probes get 429, causing containers to restart repeatedly.

  3. Unbounded memory growth
    Each request only prunes the timestamp list of the current IP, and keys are never deleted; over a long run with many distinct IPs, the dict can grow large. Acceptable for MVP; at larger scale, periodically delete IP keys whose entries have all expired, or switch to Redis.

  4. Relation to LLM timeout
    Rate limiting caps the number of requests, not how long each one runs (a single LLM call can take up to 90s); a user sending 60 messages in 60 seconds could still tie up workers. This needs product-level debounce plus the timeout strategies from Article 7.

  5. CORS * + credentials
    See Section V; before go-live, change to explicit origin list and remove the combination that conflicts with spec.
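
Pitfalls 1 and 3 can both be addressed with small helpers. The sketch below is a suggestion, not the project's code: resolve_client_ip, purge_idle_ips and the trusted_proxies set are hypothetical, and taking the first X-Forwarded-For segment assumes the reverse proxy overwrites any value sent by the client.

```python
from typing import Dict, List, Optional, Set


def resolve_client_ip(
    peer_ip: Optional[str],
    forwarded_for: Optional[str],
    trusted_proxies: Set[str],
) -> str:
    """Use X-Forwarded-For only when the TCP peer is a proxy we trust."""
    if peer_ip is None:
        return "unknown"
    if forwarded_for and peer_ip in trusted_proxies:
        first = forwarded_for.split(",")[0].strip()
        if first:
            return first
    return peer_ip


def purge_idle_ips(requests: Dict[str, List[float]], now: float, window_seconds: float) -> int:
    """Delete keys whose newest timestamp has expired; return how many were removed."""
    # Timestamps are appended in time order, so the last entry is the newest.
    idle = [ip for ip, hits in requests.items() if not hits or now - hits[-1] >= window_seconds]
    for ip in idle:
        del requests[ip]
    return len(idle)


store = {"10.0.0.1": [0.0, 5.0], "10.0.0.2": [100.0], "10.0.0.3": []}
print(resolve_client_ip("127.0.0.1", "203.0.113.7, 10.0.0.5", {"127.0.0.1"}))
print(purge_idle_ips(store, now=120.0, window_seconds=60.0), sorted(store))
```

The purge function could run from a periodic task or every few thousand requests, since scanning the whole dict on every request would cost more than it saves.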

X. Summary

  • Rate limiting in api/middleware.py, CORS and registration order in main.py; RateLimit registered later is outermost, blocking abusive requests first.
  • Sliding window + /health exemption + 429 JSON, sufficient for MVP; must switch to distributed counting before multi-instance.
  • Proxy IP, CORS with 429, path exemptions are three essential checks before going live.
  • Next article (Article 17): pydantic-settings config management and API Key masking.

Appendix: Key Source Code (Line-by-Line Comments)

The following code is excerpted from the iCan implementation, with a comment above each line (translated here from Chinese), allowing follow-along even without the public repository.
Generate command: python3 bin/build-ican-annotated-snippets.py

RateLimitMiddleware (excerpt)

# ========== RateLimitMiddleware (excerpt) ==========
# Source file: api/middleware.py lines 1-60

# L2: [Document] File description: API middleware
# L3: [Document] Business description: Provides request-level common middleware, such as IP rate limiting
# L4: [Document] Data flow: HTTP request -> middleware intercept -> route handling -> HTTP response
# (L1-5 are the module docstring, converted to comments for readability)

# L7: Standard library: wall-clock timestamps for the window
import time
# L8: Standard library: dict that creates an empty list for unseen IPs
from collections import defaultdict

# L10: Base class that turns dispatch() into a middleware
from starlette.middleware.base import BaseHTTPMiddleware
# L11: Request object passed to dispatch()
from starlette.requests import Request
# L12: Response type used for the 429 body
from starlette.responses import JSONResponse


# L15: Define the middleware class
class RateLimitMiddleware(BaseHTTPMiddleware):
    # L17: [Document] IP rate limiting middleware
    # L19: [Document] Function description:
    # L20: [Document] Rate limits requests based on client IP address to prevent malicious frequent API calls.
    # L21: [Document] Supports configurable max requests within a time window; health check endpoints are exempt.
    # L23: [Document] Input parameters:
    # L24: [Document] app: ASGI application instance
    # L25: [Document] max_requests (int): max requests within time window, default 60
    # L26: [Document] window_seconds (int): time window in seconds, default 60
    # L28: [Document] Output parameters:
    # L29: [Document] Normal requests pass through to downstream processing
    # L30: [Document] Exceeded requests return 429 status code
    # (L16-31 are the class docstring, converted to comments for readability)

    # L33: Constructor: store the quota and create the per-IP timestamp store
    def __init__(self, app, max_requests: int = 60, window_seconds: int = 60):
        # L34: Let BaseHTTPMiddleware wrap the downstream app
        super().__init__(app)
        # L35: Max requests allowed per window
        self.max_requests = max_requests
        # L36: Window length in seconds
        self.window_seconds = window_seconds
        # L37: Map of client IP -> list of request timestamps
        self._requests = defaultdict(list)

    # L39: Async dispatch: called once per HTTP request
    async def dispatch(self, request: Request, call_next):
        # L40: Health check not rate limited
        # L41: Exact match on the health check path
        if request.url.path == "/health":
            # L42: Pass straight through to the downstream app
            return await call_next(request)

        # L44: Address of the TCP peer, or "unknown" when unavailable
        client_ip = request.client.host if request.client else "unknown"
        # L45: Current time in seconds
        now = time.time()

        # L47: Clean expired records
        # L48: Rebuild the list with only in-window timestamps
        self._requests[client_ip] = [
            # L49: Iterate over this IP's recorded timestamps
            t for t in self._requests[client_ip]
            # L50: Keep entries younger than the window
            if now - t < self.window_seconds
        # L51: End of the list comprehension
        ]

        # L53: Has this IP already used up its quota?
        if len(self._requests[client_ip]) >= self.max_requests:
            # L54: Reject without calling the downstream app
            return JSONResponse(
                # L55: HTTP 429 Too Many Requests
                status_code=429,
                # L56: JSON body; error means "Requests too frequent", detail means
                #      "At most {max_requests} requests per {window_seconds} seconds"
                content={"error": "请求过于频繁", "detail": f"每{self.window_seconds}秒最多{self.max_requests}次请求"}
            # L57: End of the JSONResponse call
            )

        # L59: Record the accepted request
        self._requests[client_ip].append(now)
        # L60: Continue to the downstream app
        return await call_next(request)

create_app CORS + Middleware Registration

# ========== create_app CORS + Middleware Registration ==========
# Source file: main.py lines 64-115
# (FastAPI, CORSMiddleware, RedirectResponse, logger, settings, lifespan and the
#  routers come from earlier in main.py, outside this excerpt)

# L64: Factory function that builds the application
def create_app() -> FastAPI:
    # L66: [Document] Create FastAPI application instance
    # L68: [Document] Function description:
    # L69: [Document] Creates and configures a FastAPI application instance, including:
    # L70: [Document] 1. Set application title, description, and version info
    # L71: [Document] 2. Configure CORS middleware to allow frontend cross-origin access
    # L72: [Document] 3. Register all API route modules (chat, report, upload)
    # L73: [Document] 4. Bind lifecycle manager
    # L75: [Document] Input parameters:
    # L76: [Document] None
    # L78: [Document] Output parameters:
    # L79: [Document] FastAPI: Configured FastAPI application instance, ready for uvicorn startup
    # (L65-80 are the function docstring, converted to comments for readability)
    # L81: Start try block; the matching except lies after L115, outside this excerpt
    try:
        # L82: Log the start of app creation
        logger.info("[create_app] Starting to create FastAPI application instance")

        # L84: Create the FastAPI instance
        app = FastAPI(
            # L85: Title from settings
            title=settings.APP_NAME,
            # L86: Description shown in the API docs
            description="iCan - Intelligent Career Planning AI Agent System",
            # L87: Version from settings
            version=settings.APP_VERSION,
            # L88: Startup/shutdown lifecycle manager
            lifespan=lifespan,
        # L89: End of the FastAPI(...) call
        )

        # L91: Configure CORS middleware
        # L92: Registered first, so it ends up as the inner layer
        app.add_middleware(
            # L93: Middleware class
            CORSMiddleware,
            # L94: Allowed origins
            allow_origins=["*"],  # MVP stage allows all origins; production needs restriction
            # L95: Allow credentialed requests
            allow_credentials=True,
            # L96: Allow all HTTP methods
            allow_methods=["*"],
            # L97: Allow all request headers
            allow_headers=["*"],
        # L98: End of the add_middleware call
        )
        # L99: Log that CORS is configured
        logger.info("[create_app] CORS middleware configured")

        # L101: Configure rate limiting middleware
        # L102: Local import of the middleware class
        from ican.api.middleware import RateLimitMiddleware
        # L103: Registered last, so it ends up as the outermost layer
        app.add_middleware(RateLimitMiddleware, max_requests=60, window_seconds=60)
        # L104: Log that rate limiting is configured
        logger.info("[create_app] Rate limiting middleware configured")

        # L106: Root route redirect
        # L107: Register GET / and hide it from the OpenAPI schema
        @app.get("/", include_in_schema=False)
        # L108: Async handler for the root path
        async def root():
            # L109: Redirect to the static frontend entry page
            return RedirectResponse(url="/static/index.html")

        # L111: Register routes
        # L112: Auth routes
        app.include_router(auth.router)
        # L113: WebSocket routes
        app.include_router(ws.router)
        # L114: Chat routes
        app.include_router(chat.router)
        # L115: Report routes
        app.include_router(report.router)

Series Navigation

Article | Topic
1 | System Overview
2 | Five Agent Collaboration
3 | Holland RIASEC
4–7 | State · Routing · Nesting · Fault Tolerance
8–11 | LLM Layer · SSE/WS · DB Migration · PDF
12–14 | JSON Prompt · RIASEC Prompt · Guide Prompt
15–17 | Docker · 16 Middleware (This Article) · Config