🎯 Core Chapter Questions
How to safely and efficiently deploy a development environment application to production?
| Challenge | Traditional Pain Points | Our Solution |
|---|---|---|
| Concurrency | Synchronous blocking, CPU idle during I/O wait | FastAPI + asyncio fully asynchronous architecture |
| Response Speed | Querying database/calling LLM every time | Three-level cache (Memory → Redis → MySQL) |
| Observability | Hard to locate issues after problems occur | Prometheus + Grafana + ELK full-link monitoring |
| Deployment Complexity | Manual configuration, error-prone | Docker Compose one-click deployment |
| Cost Control | Uncontrollable LLM API call costs | Cache + Token usage monitoring |
📐 Architecture Overview
Deployment Architecture: Current Implementation vs Planned Expansion
Production Deployment Topology
Production Deployment Topology (Planned)
⚡ 1. Asynchronous Architecture Design
1.1 Why Asynchronous?
Synchronous vs Asynchronous Comparison:
1 | |
1 | |
Performance Comparison (4-core 8GB Server)
| Scenario | Synchronous (Gunicorn 4 workers) | Asynchronous (Uvicorn) | Improvement |
|---|---|---|---|
| Pure I/O intensive (calling LLM) | 50 QPS | 200 QPS | 4x |
| Mixed load (DB+LLM+Redis) | 120 QPS | 450 QPS | 3.75x |
| CPU intensive (data computation) | 400 QPS | 420 QPS | 1.05x |
| Memory usage | 800MB (4 processes) | 250MB (single process) | 3.2x savings |
1.2 Core Asynchronous Components
app/main.py
1 | |
1.3 Concurrent Query Example (Dashboard Data Fetching)
app/api/dashboards.py (detailed in Part 6)
1 | |
💾 2. Multi-layer Caching Strategy
2.1 Three-Level Cache Architecture
Three-Level Cache Architecture (Planned)
2.2 Cache Implementation Code
app/services/cache_service.py
1 | |
2.3 Cache Strategy by Scenario
| Data Type | L1 TTL | L2 TTL | Invalidation Strategy |
|---|---|---|---|
| Semantic Model | 10 minutes | 30 minutes | Manual refresh (when structure changes) |
| Dashboard Configuration | 5 minutes | 15 minutes | Actively cleared when user edits |
| Widget Query Results | No cache | 60 seconds | Auto-expire via TTL |
| LLM Dialog Responses | No cache | No cache | Real-time generation (different each time) |
| User Information | 10 minutes | 30 minutes | Cleared when user profile updates |
📊 3. Monitoring and Alerting System
3.1 Prometheus Metric Collection
app/metrics.py
1 | |
3.2 Integration into FastAPI
app/main.py (continued)
1 | |
3.3 Alert Rules Configuration (alertmanager.yml)
1 | |
🐳 4. Docker Containerization Deployment
4.1 Dockerfile
Dockerfile
1 | |
4.2 docker-compose.yml
1 | |
4.3 Nginx Configuration
1 | |
📝 5. Logging System (ELK Stack)
5.1 Structured Logging Configuration
logging.conf
1 | |
5.2 Log Output Example
1 | |
🎯 6. Best Practices Summary and Performance Benchmarks
6.1 Production Environment Checklist
✅ Performance Optimization
- Full-stack async/await (FastAPI + SQLAlchemy 2.0 + httpx + aioredis)
- Three-level cache architecture (L1 memory → L2 Redis → L3 MySQL)
- asyncio.gather() for concurrent queries (Dashboard scenarios)
- Nginx Gzip compression + static resource caching
- Database connection pool optimization (pool_size=20, max_overflow=10)
✅ Reliability Assurance
- Docker Compose one-click deployment + restart: unless-stopped
- Health Check endpoint (/health)
- Graceful Shutdown (SIGTERM signal handling)
- Database connection auto-recycle (pool_recycle=3600)
- Redis max memory limit + LRU eviction policy
✅ Security Hardening
- SQL injection prevention (validate_sql regex check)
- Rate Limiting (Nginx layer: 100 req/min)
- JWT authentication + CORS whitelist
- HTTPS forced redirect (SSL/TLS 1.2+)
- Sensitive information management via environment variables (.env file)
✅ Observability
- Prometheus metric collection (request count/latency/Token/cache hit rate)
- Grafana monitoring dashboards (3 dashboards)
- AlertManager tiered alerts (P1/P2/P3)
- ELK structured logging (JSON format + associated Request ID)
- Distributed tracing (unique ID per request)
6.2 Performance Benchmark Results
| Scenario | P50 | P95 | P99 | QPS (single instance) | Notes |
|---|---|---|---|---|---|
Health Check (/health) |
2ms | 5ms | 12ms | 5000+ | Pure in-memory operation |
| Dashboard Load (with cache) | 45ms | 120ms | 250ms | 2000 | L2 hit rate ~85% |
| Chat Dialog (calling DeepSeek) | 2100ms | 3800ms | 5200 | 50 | Bottleneck is LLM API |
| NL→SQL Conversion | 2500ms | 4200ms | 6000 | 45 | Includes semantic model loading |
| Metadata Analysis | 3000ms | 5500ms | 8000 | 20 | Large table scan scenario |
Test Environment: 4-core 8GB Docker containers × 2 (total 8 cores, 16GB)
6.3 Monthly Cost Estimate (100 DAU Scenario)
| Item | Monthly Cost | Description |
|---|---|---|
| Cloud Servers (2C4G × 2) | ¥400 | Alibaba Cloud/Tencent Cloud lightweight servers |
| LLM API Tokens | ¥200-500 | Depends on dialog frequency (DeepSeek pricing) |
| CDN + Bandwidth | ¥100 | Static resource distribution + API traffic |
| Domain + SSL Certificate | ¥50 | .com domain + Let’s Encrypt free SSL |
| Total | ¥750 - ¥1,050 | Affordable for small teams |
🏆 7. Series Summary
Congratulations on finishing all 8 technical blog posts! Let’s review the core highlights of the entire project:
📚 Series Recap
| Part | Core Topic | Key Technologies |
|---|---|---|
| Part 1 | Project Overview & Technology Selection | Vue3 + FastAPI + MySQL + Redis + ECharts |
| Part 2 | LLM Unified Gateway Design | LiteLLM abstraction layer, Prompt engineering, response cleaning |
| Part 3 | Intelligent Metadata Management System | LLM automatic table structure analysis, relationship inference algorithm |
| Part 4 | NL→SQL Conversion Engine | Semantic model injection, SQL generation, security validation |
| Part 5 | Intelligent Dialogue Engine in Practice | Multi-turn context management, chart recommendation, report generation |
| Part 6 | Two-Stage Separation for Data Dashboards | Design-time vs runtime decoupling, configuration-driven philosophy |
| Part 7 | Frontend Drag-and-Drop Interaction System | mousedown events, Grid layout, AABB collision detection |
| Part 8 | Production Deployment & Performance Optimization | Asynchronous architecture, three-level cache, Prometheus monitoring |
🎯 Core Innovations of the Project
- Two-Stage Separation Architecture (Part 6): LLM used only at design time, zero AI cost at runtime
- Intelligent Chart Recommendation (Part 5): Automatically selects visualization based on SQL features
- Multi-turn Context Management (Part 5): Maintains 10-turn conversation history, understands references
- Full-stack Asynchronous Architecture (Part 8): async/await from frontend to backend to database
- Three-level Cache System (Part 8): L1→L2→L3, hit rate 90%+
- Declarative UI System (Parts 6-7): JSON configuration-driven, supports free drag-and-drop layout
💡 Applicable Scenarios
✅ Suitable for:
- Internal enterprise data analysis platforms
- Self-service BI tools for non-technical users
- SaaS product prototypes requiring NL→SQL capability
- Rapid data visualization MVP building
⚠️ Caveats:
- LLM-generated SQL may not be perfectly accurate (require manual review for critical queries)
- Complex analysis scenarios still require professional analysts
- Data-sensitive scenarios need private deployment of LLM
🚀 Next Steps Recommendations
If you’re interested in this project, we suggest following this order:
- Local Startup: Backend
uvicorn+ frontendnpm run dev, first get the main flow working - Read Part 4 to understand the core NL→SQL conversion process
- Try Custom Prompts (Part 2), adapting to your business scenarios
- Extend Widget Types (Parts 6-7), add more chart components
- Integrate Other LLMs (Part 2), such as GPT-4, Claude, etc.
- Deploy to Production (Part 8), experience the full DevOps workflow
Author: DevCfg.cn
Date: May 2026
License: MIT License
Thanks for reading! Feel free to discuss in the comments if you have any questions 🎉