🎯 Core Chapter Questions

How to safely and efficiently deploy a development environment application to production?

Challenge Traditional Pain Points Our Solution
Concurrency Synchronous blocking, CPU idle during I/O wait FastAPI + asyncio fully asynchronous architecture
Response Speed Querying database/calling LLM every time Three-level cache (Memory → Redis → MySQL)
Observability Hard to locate issues after problems occur Prometheus + Grafana + ELK full-link monitoring
Deployment Complexity Manual configuration, error-prone Docker Compose one-click deployment
Cost Control Uncontrollable LLM API call costs Cache + Token usage monitoring

📐 Architecture Overview

Deployment Architecture: Current Implementation vs Planned Expansion

Deployment Architecture: Current Implementation vs Planned Expansion

Production Deployment Topology

Production Deployment Topology (Planned)

Production Deployment Topology (Planned)


⚡ 1. Asynchronous Architecture Design

1.1 Why Asynchronous?

Synchronous vs Asynchronous Comparison:

1
2
3
4
5
6
7
# ❌ Synchronous blocking mode (Flask/Django)
@app.route('/api/chat')
def chat():
# Thread is blocked here, cannot handle other requests
data = db.query("SELECT ...") # Waiting for I/O: ~50ms
result = requests.post(LLM_API, ...) # Waiting for I/O: ~3000ms
return jsonify(result)
1
2
3
4
5
6
7
8
# ✅ Asynchronous non-blocking mode (FastAPI)
@router.post('/api/chat')
async def chat():
# Automatically yields CPU on await, processes other requests
data = await db.execute(select(...)) # Non-blocking: ~50ms
async with httpx.AsyncClient() as client:
result = await client.post(LLM_API, ...) # Non-blocking: ~3000ms
return JSONResponse(result)

Performance Comparison (4-core 8GB Server)

Scenario Synchronous (Gunicorn 4 workers) Asynchronous (Uvicorn) Improvement
Pure I/O intensive (calling LLM) 50 QPS 200 QPS 4x
Mixed load (DB+LLM+Redis) 120 QPS 450 QPS 3.75x
CPU intensive (data computation) 400 QPS 420 QPS 1.05x
Memory usage 800MB (4 processes) 250MB (single process) 3.2x savings

1.2 Core Asynchronous Components

app/main.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
from fastapi import FastAPI
from contextlib import asynccontextmanager
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker
import redis.asyncio as aioredis
import httpx

# Global async engine and session factory
engine = create_async_engine(
"mysql+aiomysql://user:pass@localhost/db",
pool_size=20, # Connection pool size
max_overflow=10, # Maximum overflow connections
pool_recycle=3600, # Connection recycle time (seconds)
echo=False, # Disable SQL logging in production
)
AsyncSessionLocal = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

# Global Redis client
redis_client = aioredis.from_url(
"redis://localhost:6379/0",
decode_responses=True,
max_connections=20,
)


@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifecycle management"""
print("🚀 Starting up...")

# On startup: warm up cache
await warmup_cache()

yield

print("🛑 Shutting down...")
# On shutdown: clean up resources
await engine.dispose()
await redis_client.close()


app = FastAPI(
title="NLP Data Analysis Platform",
version="1.0.0",
lifespan=lifespan, # Use the new lifecycle management
)


# Dependency injection: get database session
async def get_db() -> AsyncSession:
async with AsyncSessionLocal() as session:
try:
yield session
await session.commit()
except Exception:
await session.rollback()
raise


# Dependency injection: get Redis client
async def get_redis() -> aioredis.Redis:
return redis_client

1.3 Concurrent Query Example (Dashboard Data Fetching)

app/api/dashboards.py (detailed in Part 6)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import asyncio

@router.post("/{dashboard_id}/data")
async def get_dashboard_data(dashboard_id: int):
widgets = await load_widgets(dashboard_id)

async def fetch_one(widget):
"""Query a single widget's data (with caching logic)"""
cache_key = f"dash:{dashboard_id}:widget:{widget.id}"
cached = await redis_client.get(cache_key)

if cached:
return widget.id, json.loads(cached) # L2 hit!

data = await execute_query(widget.sql_query)
await redis_client.setex(cache_key, 60, json.dumps(data)) # Backfill L2
return widget.id, data

# 🔥 Core: use asyncio.gather for concurrent execution of all queries
tasks = [fetch_one(w) for w in widgets]
results = await asyncio.gather(*tasks, return_exceptions=True)

return dict(results)

💾 2. Multi-layer Caching Strategy

2.1 Three-Level Cache Architecture

Three-Level Cache Architecture (Planned)

Three-Level Cache Architecture (Planned)

2.2 Cache Implementation Code

app/services/cache_service.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
import json
import hashlib
from functools import lru_cache
from typing import Optional, Any
import redis.asyncio as aioredis

redis_client: aioredis.Redis = None


async def init_cache(redis_url: str = "redis://localhost:6379/0"):
global redis_client
redis_client = aioredis.from_url(redis_url, decode_responses=True)


def _generate_cache_key(prefix: str, *args, **kwargs) -> str:
"""
Generate a unique cache key

Example:
_generate_cache_key("sql", "SELECT * FROM users", datasource_id=1)
→ "sql:a1b2c3d4e5f6"
"""
raw = f"{prefix}:{str(args)}:{sorted(kwargs.items())}"
return f"{prefix}:{hashlib.md5(raw.encode()).hexdigest()[:12]}"


# ===== L1: In-process cache =====

@lru_cache(maxsize=128)
def get_semantic_model_l1(datasource_id: int) -> Optional[dict]:
"""
L1 cache: semantic model (data that rarely changes)

Note: lru_cache is synchronous, only used for pure computation or already loaded data.
For data that requires async retrieval, implement a manual dict cache.
"""
pass # Actually managed by the caller


# Global L1 dict cache (supports async)
_l1_cache: dict[str, tuple[Any, float]] = {} # {key: (value, timestamp)}


def get_from_l1(key: str, ttl: float = 300) -> Optional[Any]:
"""Read from L1 cache"""
if key in _l1_cache:
value, ts = _l1_cache[key]
if time.time() - ts < ttl:
return value
else:
del _l1_cache[key] # Delete expired entry
return None


def set_to_l1(key: str, value: Any):
"""Write to L1 cache"""
_l1_cache[key] = (value, time.time())


# ===== L2: Redis cache =====

async def get_from_l2(key: str) -> Optional[Any]:
"""Read from Redis"""
if not redis_client:
return None
try:
data = await redis_client.get(key)
if data:
return json.loads(data)
except Exception as e:
logger.warning(f"Redis GET failed: {e}")
return None


async def set_to_l2(key: str, value: Any, ttl: int = 60):
"""Write to Redis (with TTL)"""
if not redis_client:
return
try:
await redis_client.setex(key, ttl, json.dumps(value, default=str))
except Exception as e:
logger.warning(f"Redis SET failed: {e}")


async def invalidate_l2(pattern: str):
"""Batch clear matching cache keys (e.g., dash:*)"""
if not redis_client:
return
keys = await redis_client.keys(pattern)
if keys:
await redis_client.delete(*keys)


# ===== Unified cache interface =====

async def cached_get(
prefix: str,
fetch_func, # Async function for backfilling
l1_ttl: float = 300,
l2_ttl: int = 60,
*args,
**kwargs
) -> Any:
"""
Unified cache reading interface (L1 → L2 → L3 backfill)

Args:
prefix: Cache key prefix (e.g., "semantic_model", "query_result")
fetch_func: Callback function when cache misses (usually a DB query)
l1_ttl: L1 cache validity period (seconds)
l2_ttl: L2 cache validity period (seconds)
"""
cache_key = _generate_cache_key(prefix, *args, **kwargs)

# Try L1
result = get_from_l1(cache_key, l1_ttl)
if result is not None:
logger.debug(f"L1 cache hit: {cache_key}")
return result

# Try L2
result = await get_from_l2(cache_key)
if result is not None:
logger.debug(f"L2 cache hit: {cache_key}")
set_to_l1(cache_key, result) # Backfill L1
return result

# L3: Backfill (execute actual query)
logger.info(f"Cache miss, fetching from source: {cache_key}")
result = await fetch_func()

# Backfill L1 and L2
set_to_l1(cache_key, result)
await set_to_l2(cache_key, result, l2_ttl)

return result

2.3 Cache Strategy by Scenario

Data Type L1 TTL L2 TTL Invalidation Strategy
Semantic Model 10 minutes 30 minutes Manual refresh (when structure changes)
Dashboard Configuration 5 minutes 15 minutes Actively cleared when user edits
Widget Query Results No cache 60 seconds Auto-expire via TTL
LLM Dialog Responses No cache No cache Real-time generation (different each time)
User Information 10 minutes 30 minutes Cleared when user profile updates

📊 3. Monitoring and Alerting System

3.1 Prometheus Metric Collection

app/metrics.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
from prometheus_client import Counter, Histogram, Gauge, generate_latest
import time
from functools import wraps

# ===== Custom metric definitions =====

# Request counter (grouped by method, path, status code)
REQUEST_COUNT = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status_code']
)

# Request latency histogram (observing P50/P95/P99)
REQUEST_LATENCY = Histogram(
'request_duration_seconds',
'HTTP request latency in seconds',
['method', 'endpoint'],
buckets=[0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0],
)

# LLM Token usage
LLM_TOKENS_USED = Counter(
'llm_tokens_used_total',
'Total LLM tokens consumed',
['model', 'operation'] # operation: chat/completion/embedding
)

# SQL query latency
SQL_QUERY_LATENCY = Histogram(
'sql_query_duration_seconds',
'SQL query execution time',
['operation'], # select/insert/update
buckets=[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 1.0],
)

# Cache hit rate
CACHE_HITS = Counter('cache_hits_total', 'Cache hits', ['level']) # level: l1/l2
CACHE_MISSES = Counter('cache_misses_total', 'Cache misses', ['level'])

# Current active connections
ACTIVE_CONNECTIONS = Gauge(
'active_connections',
'Currently active database connections'
)


# ===== Middleware: automatic metric collection =====

async def metrics_middleware(request, call_next):
"""FastAPI middleware: record metrics for each request"""
start_time = time.time()

response = await call_next(request)

duration = time.time() - start_time

# Record request count and latency
REQUEST_COUNT.labels(
method=request.method,
endpoint=request.url.path,
status_code=response.status_code
).inc()

REQUEST_LATENCY.labels(
method=request.method,
endpoint=request.url.path
).observe(duration)

return response


# ===== Decorator for specific functions =====

def track_llm_usage(operation: str):
"""Decorator to track LLM token usage"""
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
result = await func(*args, **kwargs)

# Assume return value contains token information
if hasattr(result, 'usage'):
LLM_TOKENS_USED.labels(
model=result.model or "unknown",
operation=operation
).inc(result.usage.total_tokens)

return result
return wrapper
return decorator


def track_sql_query(operation: str):
"""Decorator to track SQL execution time"""
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
start = time.time()
try:
result = await func(*args, **kwargs)
SQL_QUERY_LATENCY.labels(operation=operation).observe(time.time() - start)
return result
except Exception as e:
SQL_QUERY_LATENCY.labels(operation=operation).observe(time.time() - start)
raise
return wrapper
return decorator

3.2 Integration into FastAPI

app/main.py (continued)

1
2
3
4
5
6
7
8
9
10
11
from fastapi.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from prometheus_client import make_asgi_app

app = FastAPI(middleware=[
Middleware(BaseHTTPMiddleware, dispatch=metrics_middleware),
])

# Add /metrics endpoint (Prometheus scrapes metrics)
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

3.3 Alert Rules Configuration (alertmanager.yml)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
groups:
- name: api_alerts
rules:
# P1: High error rate
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status_code=~"5.."}[2m]))
/
sum(rate(http_requests_total[2m])) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "API error rate exceeds 5%"
description: "Error rate in the last 2 minutes: {{ $value | humanizePercentage }}"

# P1: P99 latency too high
- alert: HighLatencyP99
expr: |
histogram_quantile(0.99, rate(request_duration_seconds_bucket[5m])) > 5
for: 5m
labels:
severity: critical
annotations:
summary: "P99 latency exceeds 5 seconds"
description: "Current P99={{ $value }}s"

# P2: Average latency elevated
- alert: HighAvgLatency
expr: |
histogram_quantile(0.50, rate(request_duration_seconds_bucket[5m])) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "P50 latency exceeds 1 second"

# P2: Low cache hit rate
- alert: LowCacheHitRate
expr: |
sum(rate(cache_hits_total[5m])) /
(sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) < 0.7
for: 10m
labels:
severity: warning
annotations:
summary: "Cache hit rate below 70%"
description: "Current hit rate={{ $value | humanizePercentage }}"

# P3: Resource usage alerts
- alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.85
for: 15m
labels:
severity: warning
annotations:
summary: "Memory usage exceeds 85%"

🐳 4. Docker Containerization Deployment

4.1 Dockerfile

Dockerfile

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# Multi-stage build: reduce image size
FROM python:3.11-slim AS builder

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Production image
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /app

# Copy dependencies from builder
COPY --from=builder /install /usr/local

# Copy application code
COPY . .

# Create non-root user for running
RUN adduser --disabled-password --gecos '' appuser && chown -R appuser:appuser /app
USER appuser

EXPOSE 8000

# Run with Uvicorn (4 Worker processes)
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

4.2 docker-compose.yml

docker-compose.yml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
version: '3.8'

services:
# Backend API service
backend:
build: ./backend
container_name: nlp-api
restart: unless-stopped
ports:
- "8000:8000"
environment:
- DATABASE_URL=mysql+aiomysql://root:${MYSQL_ROOT_PASSWORD}@mysql:3306/nlp_db
- REDIS_URL=redis://redis:6379/0
- DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY}
- LOG_LEVEL=info
depends_on:
mysql:
condition: service_healthy
redis:
condition: service_healthy
networks:
- app-network
deploy:
resources:
limits:
cpus: '2.0'
memory: 2G
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3

# MySQL Database
mysql:
image: mysql:8.0
container_name: nlp-mysql
restart: unless-stopped
environment:
MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD:-rootpassword123}
MYSQL_DATABASE: nlp_db
MYSQL_CHARACTER_SET_SERVER: utf8mb4
MYSQL_COLLATION_SERVER: utf8mb4_unicode_ci
volumes:
- mysql_data:/var/lib/mysql
- ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
ports:
- "3306:3306"
command:
- --character-set-server=utf8mb4
- --collation-server=utf8mb4_unicode_ci
- --max-connections=200
- --innodb-buffer-pool-size=256M
networks:
- app-network
healthcheck:
test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-p${MYSQL_ROOT_PASSWORD}"]
interval: 10s
timeout: 5s
retries: 5

# Redis Cache
redis:
image: redis:7-alpine
container_name: nlp-redis
restart: unless-stopped
command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru
volumes:
- redis_data:/data
ports:
- "6379:6379"
networks:
- app-network
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5

# Nginx Reverse Proxy
nginx:
image: nginx:alpine
container_name: nlp-nginx
restart: unless-stopped
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./ssl:/etc/nginx/ssl:ro
- ./frontend/dist:/usr/share/nginx/html:ro # Vue build output
depends_on:
- backend
networks:
- app-network

# Prometheus Metrics Collection
prometheus:
image: prom/prometheus:latest
container_name: nlp-prometheus
restart: unless-stopped
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus_data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention.time=30d'
networks:
- app-network

# Grafana Visualization
grafana:
image: grafana/grafana:latest
container_name: nlp-grafana
restart: unless-stopped
ports:
- "3000:3000"
environment:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:-admin}
GF_USERS_ALLOW_SIGN_UP: "false"
volumes:
- grafana_data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
depends_on:
- prometheus
networks:
- app-network

volumes:
mysql_data:
redis_data:
prometheus_data:
grafana_data:

networks:
app-network:
driver: bridge

4.3 Nginx Configuration

nginx.conf

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
events {
worker_connections 1024;
}

http {
include mime.types;
default_type application/octet-stream;

# Log format (includes request ID for traceability)
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$request_id" '
'rt=$request_time';

access_log /var/log/nginx/access.log main;
error_log /var/log/nginx/error.log warn;

# Gzip compression
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_types text/plain text/css application/json application/javascript text/xml;

# Rate Limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/m;
limit_req_zone $binary_remote_addr zone=llm_limit:10m rate=20r/m;

upstream backend {
server backend:8000;
}

server {
listen 80;
server_name your-domain.com;
return 301 https://$server_name$request_uri;
}

server {
listen 443 ssl http2;
server_name your-domain.com;

ssl_certificate /etc/nginx/ssl/fullchain.pem;
ssl_certificate_key /etc/nginx/ssl/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;

# Frontend static files
location / {
root /usr/share/nginx/html;
try_files $uri $uri/ /index.html; # Vue Router history mode
expires 1d;
add_header Cache-Control "public, immutable";
}

# API proxy
location /api/ {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Request-ID $request_id;

# General rate limiting
limit_req zone=api_limit burst=20 nodelay;

# WebSocket support (if needed)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

# Timeout settings
proxy_connect_timeout 10s;
proxy_read_timeout 60s; # LLM endpoints may be slower
proxy_send_timeout 60s;
}

# LLM-related endpoints with stricter rate limiting
location /api/chat/ {
proxy_pass http://backend;
limit_req zone=llm_limit burst=5 nodelay;
proxy_read_timeout 120s; # LLM may require more time
}

# Prometheus scrape endpoint (internal network only)
location /metrics {
allow 172.16.0.0/12; # Only allow internal network
deny all;
proxy_pass http://backend;
}
}
}

📝 5. Logging System (ELK Stack)

5.1 Structured Logging Configuration

logging.conf

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
import logging
import json
import sys
from pythonjsonlogger import jsonlogger

class CustomJsonFormatter(jsonlogger.JsonFormatter):
"""Custom JSON log formatter"""

def add_fields(self, log_record, record, message_dict):
super().add_fields(log_record, record, message_dict)

# Add extra fields
log_record['level'] = record.levelname
log_record['module'] = record.module
log_record['function'] = record.funcName
log_record['line'] = record.lineno


def setup_logging(level: str = "INFO"):
"""Initialize logging system"""
formatter = CustomJsonFormatter(
'%(timestamp)s %(level)s %(module)s %(message)s %(extra)s'
)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)

root_logger = logging.getLogger()
root_logger.setLevel(getattr(logging, level.upper()))
root_logger.addHandler(handler)

# Reduce log level for third-party libraries
logging.getLogger("uvicorn").setLevel(logging.WARNING)
logging.getLogger("sqlalchemy").setLevel(logging.WARNING)

5.2 Log Output Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
{
"timestamp": "2026-05-22T14:30:22.123Z",
"level": "INFO",
"module": "chat_engine",
"message": "Chat request completed successfully",
"extra": {
"session_id": "sess_abc123",
"user_id": 42,
"duration_ms": 3250,
"tokens_used": 156,
"sql_executed": true,
"request_id": "req_xyz789"
}
}

🎯 6. Best Practices Summary and Performance Benchmarks

6.1 Production Environment Checklist

✅ Performance Optimization

  • Full-stack async/await (FastAPI + SQLAlchemy 2.0 + httpx + aioredis)
  • Three-level cache architecture (L1 memory → L2 Redis → L3 MySQL)
  • asyncio.gather() for concurrent queries (Dashboard scenarios)
  • Nginx Gzip compression + static resource caching
  • Database connection pool optimization (pool_size=20, max_overflow=10)

✅ Reliability Assurance

  • Docker Compose one-click deployment + restart: unless-stopped
  • Health Check endpoint (/health)
  • Graceful Shutdown (SIGTERM signal handling)
  • Database connection auto-recycle (pool_recycle=3600)
  • Redis max memory limit + LRU eviction policy

✅ Security Hardening

  • SQL injection prevention (validate_sql regex check)
  • Rate Limiting (Nginx layer: 100 req/min)
  • JWT authentication + CORS whitelist
  • HTTPS forced redirect (SSL/TLS 1.2+)
  • Sensitive information management via environment variables (.env file)

✅ Observability

  • Prometheus metric collection (request count/latency/Token/cache hit rate)
  • Grafana monitoring dashboards (3 dashboards)
  • AlertManager tiered alerts (P1/P2/P3)
  • ELK structured logging (JSON format + associated Request ID)
  • Distributed tracing (unique ID per request)

6.2 Performance Benchmark Results

Scenario P50 P95 P99 QPS (single instance) Notes
Health Check (/health) 2ms 5ms 12ms 5000+ Pure in-memory operation
Dashboard Load (with cache) 45ms 120ms 250ms 2000 L2 hit rate ~85%
Chat Dialog (calling DeepSeek) 2100ms 3800ms 5200 50 Bottleneck is LLM API
NL→SQL Conversion 2500ms 4200ms 6000 45 Includes semantic model loading
Metadata Analysis 3000ms 5500ms 8000 20 Large table scan scenario

Test Environment: 4-core 8GB Docker containers × 2 (total 8 cores, 16GB)

6.3 Monthly Cost Estimate (100 DAU Scenario)

Item Monthly Cost Description
Cloud Servers (2C4G × 2) ¥400 Alibaba Cloud/Tencent Cloud lightweight servers
LLM API Tokens ¥200-500 Depends on dialog frequency (DeepSeek pricing)
CDN + Bandwidth ¥100 Static resource distribution + API traffic
Domain + SSL Certificate ¥50 .com domain + Let’s Encrypt free SSL
Total ¥750 - ¥1,050 Affordable for small teams

🏆 7. Series Summary

Congratulations on finishing all 8 technical blog posts! Let’s review the core highlights of the entire project:

📚 Series Recap

Part Core Topic Key Technologies
Part 1 Project Overview & Technology Selection Vue3 + FastAPI + MySQL + Redis + ECharts
Part 2 LLM Unified Gateway Design LiteLLM abstraction layer, Prompt engineering, response cleaning
Part 3 Intelligent Metadata Management System LLM automatic table structure analysis, relationship inference algorithm
Part 4 NL→SQL Conversion Engine Semantic model injection, SQL generation, security validation
Part 5 Intelligent Dialogue Engine in Practice Multi-turn context management, chart recommendation, report generation
Part 6 Two-Stage Separation for Data Dashboards Design-time vs runtime decoupling, configuration-driven philosophy
Part 7 Frontend Drag-and-Drop Interaction System mousedown events, Grid layout, AABB collision detection
Part 8 Production Deployment & Performance Optimization Asynchronous architecture, three-level cache, Prometheus monitoring

🎯 Core Innovations of the Project

  1. Two-Stage Separation Architecture (Part 6): LLM used only at design time, zero AI cost at runtime
  2. Intelligent Chart Recommendation (Part 5): Automatically selects visualization based on SQL features
  3. Multi-turn Context Management (Part 5): Maintains 10-turn conversation history, understands references
  4. Full-stack Asynchronous Architecture (Part 8): async/await from frontend to backend to database
  5. Three-level Cache System (Part 8): L1→L2→L3, hit rate 90%+
  6. Declarative UI System (Parts 6-7): JSON configuration-driven, supports free drag-and-drop layout

💡 Applicable Scenarios

Suitable for:

  • Internal enterprise data analysis platforms
  • Self-service BI tools for non-technical users
  • SaaS product prototypes requiring NL→SQL capability
  • Rapid data visualization MVP building

⚠️ Caveats:

  • LLM-generated SQL may not be perfectly accurate (require manual review for critical queries)
  • Complex analysis scenarios still require professional analysts
  • Data-sensitive scenarios need private deployment of LLM

🚀 Next Steps Recommendations

If you’re interested in this project, we suggest following this order:

  1. Local Startup: Backend uvicorn + frontend npm run dev, first get the main flow working
  2. Read Part 4 to understand the core NL→SQL conversion process
  3. Try Custom Prompts (Part 2), adapting to your business scenarios
  4. Extend Widget Types (Parts 6-7), add more chart components
  5. Integrate Other LLMs (Part 2), such as GPT-4, Claude, etc.
  6. Deploy to Production (Part 8), experience the full DevOps workflow

Author: DevCfg.cn

Date: May 2026

License: MIT License

Thanks for reading! Feel free to discuss in the comments if you have any questions 🎉