FastAPI Middleware Design: Custom RateLimitMiddleware and CORS Configuration Practice

0. Series Loop (Follow Along Even Without Public Source Code)

End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parse→Analyze→Match→Report) → tools/pdf_exporter PDF.
This Article: 16/17 · Gateway Loop · Middleware

Stage	User Visible	Code Entry	Article
Create Session	Welcome Message	POST /api/sessions	09
Multi-turn Dialogue	SSE Streaming	chat/stream → run_guide_single_turn	06, 14
Info Complete	Start Analysis	_run_analysis_background	05, 07
Resume Parsing	Progress 30%	run_resume_parser	12
Profile/RIASEC	Progress 50%	run_profile_analyzer	03, 13
Career Matching	Progress 70%	run_career_matcher	02
Report	Progress 90%	run_reporter	11
Download PDF	File	GET …/report/pdf	11, 15

	Description
Before Reading This	Article 01 main.py
After Reading This	Can state the registration order of CORS and RateLimit, and explain the /health exemption
Next Loop	Article 17: Configuration Sources

Full series loop index: SERIES-LOOP.md

I. What Problem to Solve

iCan’s /chat/stream, report generation, and file uploads all trigger LLM or long-running tasks. During public demos or internal testing, without IP rate limiting, a single client hammering the interface can lead to:

LLM quota drained quickly (a double hit on cost and on the provider's rate limits);
Uvicorn workers saturated, causing SSE stream stuttering for legitimate users;
Logs flooded by the same IP, making it hard to diagnose real faults.

The project implements a custom sliding-window IP rate limiter in api/middleware.py (without introducing slowapi or Redis, which suits an MVP-sized project) and registers it alongside CORSMiddleware in main.py's create_app().

II. Implementation Location

Component	File	Description
Rate Limiting	`api/middleware.py`	`RateLimitMiddleware`
CORS + Registration Order	`main.py`	Two `add_middleware` calls inside `create_app()`
Affected Routes	`api/routes/chat.py` etc.	All HTTP requests go through middleware chain

Figure: FastAPI Request Processing Chain

III. Starlette Middleware Order (Onion Model)

In FastAPI/Starlette, middleware added later encounters the request first. The actual registration order in main.py:

# main.py — create_app()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

from ican.api.middleware import RateLimitMiddleware
app.add_middleware(RateLimitMiddleware, max_requests=60, window_seconds=60)

Thus the request path is:

Client → RateLimitMiddleware → CORSMiddleware → Route Handler → CORSMiddleware → RateLimitMiddleware → Client

RateLimit is the outermost layer: when the limit is exceeded, dispatch returns 429 directly, and neither the CORS logic nor the route handler runs, so abusive requests are rejected as cheaply as possible. If the order were reversed, every rejected request would first pass through CORSMiddleware before being rate limited, which wastes CPU.
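To make the "added later runs first" rule concrete, here is a minimal sketch in plain Python. MiniApp and make_middleware are hypothetical stand-ins written for this illustration, not Starlette classes; the sketch only assumes that each new registration is placed in front of the existing ones and that the stack is then wrapped from the inside out.

```python
from typing import Callable, List

Handler = Callable[[str, List[str]], str]


def make_middleware(name: str) -> Callable[[Handler], Handler]:
    """Return a wrapper that records when a request enters and leaves `name`."""
    def wrap(inner: Handler) -> Handler:
        def handler(request: str, trace: List[str]) -> str:
            trace.append(f"{name}:in")
            response = inner(request, trace)
            trace.append(f"{name}:out")
            return response
        return handler
    return wrap


class MiniApp:
    """Toy stand-in for the application object (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._middleware: List[Callable[[Handler], Handler]] = []

    def add_middleware(self, wrapper: Callable[[Handler], Handler]) -> None:
        # The newest registration goes to the front, so it ends up outermost.
        self._middleware.insert(0, wrapper)

    def build(self, route: Handler) -> Handler:
        app = route
        # Wrap from the innermost layer (end of list) to the outermost (front).
        for wrapper in reversed(self._middleware):
            app = wrapper(app)
        return app


def route_handler(request: str, trace: List[str]) -> str:
    trace.append("route")
    return f"handled {request}"


app = MiniApp()
app.add_middleware(make_middleware("CORS"))       # registered first
app.add_middleware(make_middleware("RateLimit"))  # registered last
trace: List[str] = []
result = app.build(route_handler)("GET /chat", trace)
print(trace)
```

The recorded trace starts and ends with RateLimit, which matches the request path described above.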

IV. `RateLimitMiddleware` Implementation

Full logic in api/middleware.py:

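import time
from collections import defaultdict

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse


class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, max_requests: int = 60, window_seconds: int = 60):
        super().__init__(app)
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client IP -> timestamps of requests seen inside the current window
        self._requests = defaultdict(list)

    async def dispatch(self, request: Request, call_next):
        # Health checks bypass the limiter entirely
        if request.url.path == "/health":
            return await call_next(request)

        client_ip = request.client.host if request.client else "unknown"
        now = time.time()
        # Drop timestamps that have fallen out of the sliding window
        self._requests[client_ip] = [
            t for t in self._requests[client_ip]
            if now - t < self.window_seconds
        ]

        if len(self._requests[client_ip]) >= self.max_requests:
            return JSONResponse(
                status_code=429,
                content={
                    # "Requests are too frequent"
                    "error": "请求过于频繁",
                    # "At most {max_requests} requests per {window_seconds} seconds"
                    "detail": f"每{self.window_seconds}秒最多{self.max_requests}次请求",
                },
            )

        # Count this request, then pass it downstream
        self._requests[client_ip].append(now)
        return await call_next(request)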

Design choices:

Point	Description
Sliding Window	Filter expired records by timestamp, smoother than “reset every minute”
In-memory `defaultdict(list)`	Simple implementation; counters reset on process restart; no sharing across replicas
`/health` exemption	Docker/K8s healthcheck not blocked by 429
429 JSON body	Frontend can show unified toast, fields `error` + `detail`

Default: 60 requests per 60 seconds per IP, hardcoded in main.py. The values are not yet read from config.py settings, so changing the quota requires a code change until they are extracted to configuration (Article 17).
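The sliding-window decision can be exercised without an HTTP server. The sketch below isolates the same filter-then-count logic in a small class; SlidingWindowCounter and its injectable clock parameter are assumptions added for this illustration so the behavior can be checked deterministically, and are not part of the iCan code.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List


class SlidingWindowCounter:
    """Standalone version of the per-key sliding window (illustrative only)."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits: DefaultDict[str, List[float]] = defaultdict(list)

    def allow(self, key: str) -> bool:
        now = self._clock()
        # Keep only timestamps that are still inside the window.
        recent = [t for t in self._hits[key] if now - t < self.window_seconds]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self._hits[key] = recent
        return allowed


# Simulated timeline: three requests fill the window, the fourth is rejected,
# and a request at t=61 is accepted because the t=0 entry has expired.
fake_now = [0.0]
limiter = SlidingWindowCounter(max_requests=3, window_seconds=60,
                               clock=lambda: fake_now[0])
decisions = []
for t in (0.0, 10.0, 20.0, 30.0, 61.0):
    fake_now[0] = t
    decisions.append(limiter.allow("203.0.113.7"))
print(decisions)
```

Unlike a counter that resets every minute, the window here moves with each request, so capacity is released one timestamp at a time.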

V. CORSMiddleware Configuration

At the MVP stage, all origins are allowed so that both the local Vue dev server and the production domain can call the API without extra configuration:

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Note the combination of allow_origins=["*"] and allow_credentials=True. Under the CORS rules that browsers enforce, a cross-origin request that carries credentials such as cookies is rejected if the response contains Access-Control-Allow-Origin: *. In development without credentials this is usually harmless; in production with login cookies, allow_origins must be changed to an explicit domain list (e.g., https://app.example.com).

iCan currently sends its JWT/session token in request headers rather than cookies, so the impact is minimal, but the frontend's actual credential method should be reviewed before going live.
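One way to prepare for the explicit origin list is to build the CORS arguments from a comma-separated setting and refuse the wildcard when credentials are enabled. This is a sketch under assumptions: build_cors_kwargs, the method and header lists, and the localhost dev-server URL are all hypothetical and not taken from the iCan code; only the four keyword names match the add_middleware call shown above.

```python
from typing import Any, Dict, List


def build_cors_kwargs(origins_csv: str, allow_credentials: bool) -> Dict[str, Any]:
    """Turn a comma-separated origin setting into CORSMiddleware keyword arguments."""
    origins: List[str] = [o.strip() for o in origins_csv.split(",") if o.strip()]
    if not origins:
        raise ValueError("at least one origin is required")
    if allow_credentials and "*" in origins:
        # Fail at startup instead of shipping a wildcard with credentials.
        raise ValueError("wildcard origin cannot be combined with credentials")
    return {
        "allow_origins": origins,
        "allow_credentials": allow_credentials,
        # Assumed lists; adjust to what the frontend actually sends.
        "allow_methods": ["GET", "POST", "OPTIONS"],
        "allow_headers": ["Authorization", "Content-Type"],
    }


# Hypothetical setting value: production domain plus a local dev server.
cors_kwargs = build_cors_kwargs(
    "https://app.example.com, http://localhost:5173", allow_credentials=True
)
print(cors_kwargs["allow_origins"])
# Intended use (not executed here): app.add_middleware(CORSMiddleware, **cors_kwargs)
```

Raising at startup makes the misconfiguration visible during deployment rather than in a user's browser console.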

VI. Coordination with Business Routes

POST /chat/stream (api/routes/chat.py): One user message usually counts as one request against the limit; if the frontend mistakenly reconnects SSE frequently, it will quickly hit 429.
WebSocket (api/routes/ws.py): Starlette's BaseHTTPMiddleware only calls dispatch for scopes of type "http", so the WebSocket handshake is passed straight through and is not counted in the IP window; WebSocket connections need their own limit if they are to be throttled.
POST resume upload: A large body still counts as one request; there is currently no separate lower quota for uploads and no rate limiting by user_id.
/health: Not rate limited, so orchestrator health checks work reliably.

Extension idea (not implemented): Branch in dispatch by request.url.path, e.g., chat 30/min, upload 10/min, parameters from config.py.
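The path-based branching mentioned in the extension idea could look roughly like the lookup below. The prefixes, the quota table, and the helper name limit_for_path are hypothetical; the real route prefixes and the config.py field names are not shown in this article, so treat this only as a sketch of the shape.

```python
from typing import Dict, Optional, Tuple

# Hypothetical quotas: path prefix -> (max_requests, window_seconds).
PATH_LIMITS: Dict[str, Tuple[int, int]] = {
    "/chat": (30, 60),
    "/upload": (10, 60),
}
DEFAULT_LIMIT: Tuple[int, int] = (60, 60)
EXEMPT_PATHS = frozenset({"/health"})


def limit_for_path(path: str) -> Optional[Tuple[int, int]]:
    """Return the quota for a path, or None when the path is exempt."""
    if path in EXEMPT_PATHS:
        return None
    # Try longer prefixes first so a more specific rule wins.
    for prefix in sorted(PATH_LIMITS, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return PATH_LIMITS[prefix]
    return DEFAULT_LIMIT


for sample in ("/chat/stream", "/upload", "/api/sessions", "/health"):
    print(sample, limit_for_path(sample))
```

Inside dispatch, the returned pair would replace self.max_requests and self.window_seconds, and the bucket key would need to include the matched prefix so that chat and upload traffic are counted separately.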

VII. 429 Response and CORS Headers

When the limit is hit, RateLimitMiddleware returns a bare JSONResponse(429). Because RateLimit is the outermost layer, this response never passes through CORSMiddleware, so it carries no Access-Control-Allow-Origin header. In a cross-origin deployment the browser then blocks the response, and the frontend sees only a generic network error instead of the JSON body.

For same-origin internal deployments this is usually fine. If the frontend and API are on different origins and the frontend needs to parse the 429 body, either add Access-Control-Allow-Origin manually in the 429 branch, or change the middleware order and verify the behavior in a browser.
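For the "add the header manually" option, a small helper can decide which CORS headers to attach to the 429 response. The function name cors_headers_for_429 and the allowed-origin list are assumptions for this sketch; it only covers the Access-Control-Allow-Origin decision and does not replicate everything CORSMiddleware does.

```python
from typing import Dict, Iterable, Optional


def cors_headers_for_429(origin: Optional[str],
                         allowed_origins: Iterable[str]) -> Dict[str, str]:
    """Choose CORS headers for a rate-limit response (illustrative helper)."""
    allowed = set(allowed_origins)
    if origin is None:
        # No Origin header: not a cross-origin browser request.
        return {}
    if "*" in allowed:
        return {"Access-Control-Allow-Origin": "*"}
    if origin in allowed:
        # Echo the specific origin and tell caches the response varies by it.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}


headers = cors_headers_for_429("https://app.example.com",
                               ["https://app.example.com"])
print(headers)
```

In the 429 branch, the result could be passed as the headers argument of JSONResponse, with the origin read from request.headers.get("origin").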

VIII. Production Evolution Path

Stage	Solution
MVP (current)	Single-process in-memory sliding window, 60/min/IP
Multiple Uvicorn workers / multiple Pods	Each replica keeps an independent counter, so the effective limit ≈ N × 60; needs Redis INCR+TTL or gateway-level rate limiting
Logged-in users	Rate limit by `user_id`, reduce false positives for multiple users behind NAT sharing IP
Edge	Static resources via CDN, API through WAF/cloud vendor rate limits
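The INCR+TTL row in the table can be sketched as a fixed-window counter (simpler than, and not identical to, the sliding window used in the MVP). InMemoryStore below is a stand-in that mimics only two operations, incr and expire, so the example runs without a Redis server; the key format and the allow_request helper are hypothetical.

```python
import time
from typing import Callable, Dict


class InMemoryStore:
    """Stand-in for a shared store; mimics only the two calls used below."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.ttls: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key: str, seconds: int) -> None:
        # Recorded only; this fake never evicts keys.
        self.ttls[key] = seconds


def allow_request(store: InMemoryStore, client_ip: str, max_requests: int = 60,
                  window_seconds: int = 60,
                  clock: Callable[[], float] = time.time) -> bool:
    """Fixed-window check: one counter per client per window index."""
    window_index = int(clock() // window_seconds)
    key = f"ratelimit:{client_ip}:{window_index}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let the key outlive the window, then expire.
        store.expire(key, window_seconds * 2)
    return count <= max_requests


shared_now = [0.0]
store = InMemoryStore()
results = []
for t in (0.0, 1.0, 2.0, 3.0, 60.0):
    shared_now[0] = t
    results.append(allow_request(store, "203.0.113.7", max_requests=3,
                                 window_seconds=60, clock=lambda: shared_now[0]))
print(results)
```

Because every replica would increment the same key, the limit holds across workers and Pods. The trade-off of a fixed window is that a client can send up to twice the quota across a window boundary.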

IX. Pitfalls and Edge Cases

request.client is None or the proxy IP is wrong
The code uses request.client.host. If Nginx does not pass X-Forwarded-For, every request behind the proxy may appear as 127.0.0.1 or "unknown", so all users share one bucket. In production, configure the reverse proxy to forward the real client IP and have the middleware read the first segment of X-Forwarded-For (the current code does not do this, which is a known gap).
/health path is hardcoded
If the health check moves to /api/health or gains a prefix, the exemption condition must be updated to match; otherwise probes receive 429 and the orchestrator may restart the container repeatedly.
Unbounded memory growth
_requests only prunes the timestamp list of the IP making the current request; keys for IPs that never return are never removed, so a long-running process that sees many distinct IPs accumulates entries. This is acceptable for an MVP; at larger scale, periodically delete idle IP keys or switch to Redis.
Relation to LLM timeouts
Rate limiting caps the number of requests, not the duration of each one: a single LLM call can run for up to 90 seconds, so a user sending 60 messages in 60 seconds can still tie up workers. This needs product-level debouncing plus the timeout strategies from Article 7.
CORS * + credentials
See Section V; before go-live, switch to an explicit origin list so the configuration no longer conflicts with the spec.
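Two of the gaps above, the proxy IP and the growing dict, can be sketched as small pure functions. Both helpers and the trusted_proxies parameter are hypothetical additions, not part of the current iCan middleware. The first segment of X-Forwarded-For is used only when the direct peer is a trusted proxy, on the assumption that the proxy overwrites or sanitizes the header; otherwise a client could spoof it.

```python
from typing import Dict, FrozenSet, List, Optional


def resolve_client_ip(peer_ip: Optional[str], forwarded_for: Optional[str],
                      trusted_proxies: FrozenSet[str]) -> str:
    """Pick the rate-limit key: forwarded IP only if the peer is a trusted proxy."""
    if peer_ip is None:
        return "unknown"
    if peer_ip in trusted_proxies and forwarded_for:
        first = forwarded_for.split(",")[0].strip()
        if first:
            return first
    return peer_ip


def prune_idle_clients(requests: Dict[str, List[float]], now: float,
                       window_seconds: float) -> int:
    """Delete keys whose newest timestamp is outside the window; return the count."""
    stale = [
        ip for ip, stamps in requests.items()
        # Timestamps are appended in order, so the last one is the newest.
        if not stamps or now - stamps[-1] >= window_seconds
    ]
    for ip in stale:
        del requests[ip]
    return len(stale)


proxies = frozenset({"127.0.0.1"})
print(resolve_client_ip("127.0.0.1", "198.51.100.4, 10.0.0.2", proxies))

buckets = {"a": [0.0, 5.0], "b": [100.0], "c": []}
removed = prune_idle_clients(buckets, now=120.0, window_seconds=60)
print(removed, sorted(buckets))
```

In the middleware, prune_idle_clients could run every few hundred requests or on a timer, so that the cleanup cost is not paid on every call.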

X. Summary

Rate limiting lives in api/middleware.py; CORS and the registration order live in main.py. RateLimit is registered last, so it is outermost and rejects abusive requests first.
Sliding window + /health exemption + 429 JSON is sufficient for an MVP; switch to distributed counting before running multiple instances.
Proxy IP handling, CORS headers on 429 responses, and path exemptions are the three essential checks before going live.
Next article (Article 17): pydantic-settings config management and API Key masking.

Appendix: Key Source Code (Line-by-Line Comments)

The following code is excerpted from the iCan implementation, with a comment above each line, so you can follow along even without the public repository.
Generate command: python3 bin/build-ican-annotated-snippets.py

RateLimitMiddleware (excerpt)

# ========== RateLimitMiddleware (excerpt) ==========
# Source file: api/middleware.py   lines 1-60

# L2: [Document] File description: API middleware
# L3: [Document] Business description: Provides request-level common middleware, such as IP rate limiting
# L4: [Document] Data flow: HTTP request -> middleware intercept -> route handling -> HTTP response
# (L1-5 are function/module docstrings, converted to comments for readability)

# L7: Import dependency module
import time
# L8: Import dependency module
from collections import defaultdict

# L10: Import dependency module
from starlette.middleware.base import BaseHTTPMiddleware
# L11: Import dependency module
from starlette.requests import Request
# L12: Import dependency module
from starlette.responses import JSONResponse


# L15: Define the middleware class, extending Starlette's BaseHTTPMiddleware
class RateLimitMiddleware(BaseHTTPMiddleware):
# L17: [Document] IP rate limiting middleware
# L19: [Document] Function description:
# L20: [Document] Rate limits requests based on client IP address to prevent malicious frequent API calls.
# L21: [Document] Supports configurable max requests within a time window; health check endpoints are exempt.
# L23: [Document] Input parameters:
# L24: [Document] app: ASGI application instance
# L25: [Document] max_requests (int): max requests within time window, default 60
# L26: [Document] window_seconds (int): time window in seconds, default 60
# L28: [Document] Output parameters:
# L29: [Document] Normal requests pass through to downstream processing
# L30: [Document] Exceeded requests return 429 status code
# (L16-31 are function/module docstrings, converted to comments for readability)

# L33: Constructor: store the limit parameters and create the per-IP timestamp store
    def __init__(self, app, max_requests: int = 60, window_seconds: int = 60):
# L34: Initialize BaseHTTPMiddleware with the wrapped ASGI app
        super().__init__(app)
# L35: Maximum number of requests allowed per IP within one window
        self.max_requests = max_requests
# L36: Window length in seconds
        self.window_seconds = window_seconds
# L37: Map each client IP to the list of its recent request timestamps
        self._requests = defaultdict(list)

# L39: Async dispatch hook, called by BaseHTTPMiddleware for each HTTP request
    async def dispatch(self, request: Request, call_next):
# L40: Health check not rate limited
# L41: Exact match on the health check path
        if request.url.path == "/health":
# L42: Forward the request downstream without counting it
            return await call_next(request)

# L44: Use the peer address as the bucket key; fall back to "unknown" when it is missing
        client_ip = request.client.host if request.client else "unknown"
# L45: Current Unix timestamp in seconds
        now = time.time()

# L47: Clean expired records
# L48: Rebuild this IP's list, keeping only timestamps still inside the window
        self._requests[client_ip] = [
# L49: Iterate over the previously recorded timestamps
            t for t in self._requests[client_ip]
# L50: Keep a timestamp only if it is younger than window_seconds
            if now - t < self.window_seconds
# L51: End of the list comprehension
        ]

# L53: Reject when the window already holds max_requests entries
        if len(self._requests[client_ip]) >= self.max_requests:
# L54: Short-circuit with a JSON error response; downstream is not called
            return JSONResponse(
# L55: HTTP 429 Too Many Requests
                status_code=429,
# L56: Body fields: error ("Requests are too frequent") and detail ("At most {max_requests} requests per {window_seconds} seconds")
                content={"error": "请求过于频繁", "detail": f"每{self.window_seconds}秒最多{self.max_requests}次请求"}
# L57: End of the JSONResponse call
            )

# L59: Record the current request's timestamp
        self._requests[client_ip].append(now)
# L60: Forward the request to the next middleware or route handler
        return await call_next(request)

create_app CORS + Middleware Registration

# ========== create_app CORS + Middleware Registration ==========
# Source file: main.py   lines 64-115

# L64: Factory function create_app: builds and returns the configured FastAPI app
def create_app() -> FastAPI:
# L66: [Document] Create FastAPI application instance
# L68: [Document] Function description:
# L69: [Document] Creates and configures a FastAPI application instance, including:
# L70: [Document] 1. Set application title, description, and version info
# L71: [Document] 2. Configure CORS middleware to allow frontend cross-origin access
# L72: [Document] 3. Register all API route modules (chat, report, upload)
# L73: [Document] 4. Bind lifecycle manager
# L75: [Document] Input parameters:
# L76: [Document] None
# L78: [Document] Output parameters:
# L79: [Document] FastAPI: Configured FastAPI application instance, ready for uvicorn startup
# (L65-80 are function/module docstrings, converted to comments for readability)
# L81: Start of the try block (the matching except lies outside this excerpt)
    try:
# L82: Log the start of application creation
        logger.info("[create_app] Starting to create FastAPI application instance")

# L84: Create the FastAPI instance
        app = FastAPI(
# L85: Application title, read from settings
            title=settings.APP_NAME,
# L86: Description shown in the generated API docs
            description="iCan - Intelligent Career Planning AI Agent System",
# L87: Application version, read from settings
            version=settings.APP_VERSION,
# L88: Lifespan handler for startup and shutdown logic
            lifespan=lifespan,
# L89: End of the FastAPI constructor call
        )

# L91: Configure CORS middleware
# L92: Register CORSMiddleware (added first, so it sits inside RateLimitMiddleware)
        app.add_middleware(
# L93: Middleware class to register
            CORSMiddleware,
# L94: Allowed origins
            allow_origins=["*"],  # MVP stage allows all origins; production needs restriction
# L95: Allow credentialed requests (see Section V for the conflict with "*")
            allow_credentials=True,
# L96: Allow all HTTP methods
            allow_methods=["*"],
# L97: Allow all request headers
            allow_headers=["*"],
# L98: End of the add_middleware call
        )
# L99: Log that CORS is configured
        logger.info("[create_app] CORS middleware configured")

# L101: Configure rate limiting middleware
# L102: Local import of the rate limiter
        from ican.api.middleware import RateLimitMiddleware
# L103: Register RateLimitMiddleware (added last, so it is the outermost layer): 60 requests per 60 seconds per IP
        app.add_middleware(RateLimitMiddleware, max_requests=60, window_seconds=60)
# L104: Log that rate limiting is configured
        logger.info("[create_app] Rate limiting middleware configured")

# L106: Root route redirect
# L107: Register a GET handler for "/" and hide it from the OpenAPI schema
        @app.get("/", include_in_schema=False)
# L108: Async route handler for the root path
        async def root():
# L109: Redirect to the static frontend entry page
            return RedirectResponse(url="/static/index.html")

# L111: Register routes
# L112: Mount the auth router
        app.include_router(auth.router)
# L113: Mount the WebSocket router
        app.include_router(ws.router)
# L114: Mount the chat router
        app.include_router(chat.router)
# L115: Mount the report router
        app.include_router(report.router)

Article	Topic
1	System Overview
2	Five Agent Collaboration
3	Holland RIASEC
4–7	State · Routing · Nesting · Fault Tolerance
8–11	LLM Layer · SSE/WS · DB Migration · PDF
12–14	JSON Prompt · RIASEC Prompt · Guide Prompt
15–17	Docker · 16 Middleware (This Article) · Config
