📖 Introduction: Why This Project?
In daily enterprise operations, data querying is one of the most frequent needs, but the reality is harsh:
- Business users: Want to query some data? They have to ask IT to write SQL and wait in line for hours.
- IT engineers: Constantly interrupted by “Can you run a report for me?”, leaving no time for more valuable work.
- Data analysts: Familiar with SQL but not with the table structures of every business domain — they have to run
DESCRIBEevery time.
What if users could query the database directly using natural language?
That’s exactly the core mission of the NLP MySQL Smart Data Analysis Platform — enabling non-technical users to ask questions in Chinese, the system automatically translates them into SQL, and returns visual results.
🎯 Core Features at a Glance
1. Smart Chat
1 | |
Highlights:
- ✅ Multi-turn conversation context support (“Break down the above data by region”)
- ✅ Automatic chart type recommendation (Line/Bar/Pie/Metric Card)
- ✅ One-click generation of analysis reports
2. Data Dashboard
- 🎨 Drag-and-drop editor: Build dashboards like building blocks
- ⚡ Two-stage separation architecture: LLM generates SQL at design time; pure SQL execution at runtime (performance improved 10x)
- 🔥 One-click quick generation: Enter a single sentence (“Create a sales overview dashboard for me”) — LLM automatically layouts and configures components
3. Metadata Management
- 🤖 LLM-powered intelligent parsing: Automatically assign Chinese names to tables/columns, infer table relationships
- 🔗 Visual relationship editing: Graphically maintain foreign key relationships
- 📊 Semantic model construction: Convert database structures into JSON descriptions that LLM can understand
4. Formula System
- 📝 Predefined parameterized SQL templates (e.g., “YoY/QoQ calculation”, “TOP N ranking”)
- ♻️ Reusable in dashboards and chat
🏗️ Technology Stack Selection & Rationale
Backend Stack
| Technology | Version | Why This? |
|---|---|---|
| FastAPI | 0.100+ | ⚡ Native async support + auto-generated OpenAPI docs + Pydantic integration |
| SQLAlchemy 2.0 (async) | 2.0+ | 🔀 Supports async/await, avoids blocking the event loop; ORM + Core dual mode |
| Pydantic v2 | 2.0+ | ✅ Data validation performance improved 5x+; flexible Strict/Loose modes |
| LiteLLM | 1.0+ | 🌐 Unified invocation of DeepSeek/OpenAI/Claude and other LLMs; switch models with one line of code |
| aiomysql | 0.2+ | 🚀 Async driver for MySQL, works with SQLAlchemy async |
| Uvicorn | 0.24+ | ⚡ ASGI server supporting HTTP/2 and WebSocket |
💡 Key Decision 1: Why FastAPI Instead of Django/Flask?
1 | |
Reasons:
- Native async/await: Our core flow involves LLM API calls (I/O bound) + database queries (I/O bound). Async significantly improves concurrency.
- Auto-generated docs: Frontend colleagues can directly use Swagger UI to debug endpoints.
- Deep Pydantic integration: Strongly-typed request/response models reduce parameter validation code by 80%.
💡 Key Decision 2: Dual Database Architecture (SQLite + MySQL)
1 | |
Why separate?
- SQLite (lightweight): Application metadata is small (hundreds of records), no need for an extra MySQL service — easier for development and testing.
- MySQL (target): User business data can be large (millions of rows), requiring a professional relational database. Also supports multiple data sources — users can configure multiple MySQL instances.
Frontend Stack
| Technology | Version | Core Value |
|---|---|---|
| Vue 3 | 3.4+ | Composition API better for complex state management; strong TypeScript support |
| TypeScript | 5.0+ | Type safety reduces runtime errors |
| Vite | 5.0+ | ⚡ Extremely fast HMR (<50ms), great development experience |
| Element Plus | 2.4+ | Enterprise-level UI component library — forms, tables, dialogs out of the box |
| ECharts (vue-echarts) | 6.0+ | 📊 Most powerful charting library, supporting 20+ chart types |
| Pinia | 2.1+ | Vue 3 official recommended state management, simpler than Vuex |
💡 Key Decision 3: Why Not Use a Low-Code Platform?
There are many BI tools on the market (e.g., Metabase, Superset), but they have these pain points:
| Pain Point | Traditional BI Tools | Our Solution |
|---|---|---|
| Natural language querying | ❌ Manual field dragging required | ✅ Query directly in Chinese |
| Multi-data source associations | ⚠️ Complex configuration | ✅ LLM automatically infers table relationships |
| Customization degree | ❌ Limited by platform templates | ✅ Fully customizable Vue components |
| Deployment cost | ❌ Requires JVM / big data cluster | ✅ Single-machine Docker is enough |
| LLM integration | ❌ Most don’t support it | ✅ Deep integration with DeepSeek/OpenAI |
LLM Technology Selection
| Dimension | Choice | Reason |
|---|---|---|
| Main model | DeepSeek-V3 | 🇨🇳 Excellent Chinese understanding; price only 1/10 of GPT-4 |
| Invocation layer | LiteLLM | 🌐 Unified interface, can seamlessly switch to OpenAI/Claude in the future |
| Prompt engineering | System Prompt + Few-shot | 📝 Structured output control (forced JSON format) |
💡 Key Decision 4: Why LiteLLM Instead of Direct OpenAI SDK?
1 | |
Advantages:
- Zero-cost model switching: Just modify
LLM_MODEL=deepseek-chattoLLM_MODEL=gpt-4in.env. - Unified exception handling: Error codes from different LLMs are standardized.
- Built-in retry mechanism: Automatically retries on network flakiness (configurable).
📐 Overall System Architecture
NLP MySQL Smart Data Analysis Platform System Architecture
Architecture Layer Explanation
Layer 1: Presentation Layer
- Tech Stack: Vue 3 + TypeScript + Vite + Element Plus + ECharts
- Core Modules:
- Smart chat interface (multi-turn conversation + result display)
- Data dashboard editor (drag & drop + Grid layout)
- Metadata management interface (table/column/relationship CRUD)
- Chart component library (LineChart / BarChart / PieChart / MetricCard)
Layer 2: API Gateway Layer
- Tech Stack: FastAPI Router
- Responsibilities:
- RESTful route distribution (
/api/chat/*,/api/dashboards/*, …) - Request parameter validation (Pydantic Schema)
- Unified error handling (HTTP Exception Handler)
- RESTful route distribution (
Layer 3: Business Logic Layer
- 7 Core Services:
| Service | File | Responsibility |
|---|---|---|
| LLM Gateway | llm_gateway.py |
Unified LLM invocation, Prompt management, response cleaning |
| SQL Generator | sql_generator.py |
NL→SQL conversion, security verification, query execution |
| Chat Engine | chat_engine.py |
Dialog context management, chart recommendation, report generation |
| Dashboard Engine | dashboard_engine.py |
Dashboard CRUD, NL→SQL parsing, rendering logic |
| Metadata Service | metadata.py |
Metadata CRUD, LLM intelligent analysis, semantic model construction |
| Formula Engine | formula.py |
Parameterized SQL template execution |
| Datasource Service | datasource.py |
Data source connection management, password encryption |
Layer 4: Data Layer
- SQLite (aiosqlite): Application metadata (sessions, dashboard configs, metadata, formula definitions)
- MySQL (aiomysql): Target business databases (actual data users query)
- LLM API: DeepSeek / OpenAI cloud services
- Environment Config (.env): API keys, database connection strings, and other sensitive info
🎬 Typical User Scenario Demo
Scenario 1: Business User Queries Sales Data
User asks in Chinese, system returns SQL and bar chart result
Scenario 2: Data Analyst Builds a Monitoring Dashboard
LLM automatically lays out components; LLM is no longer called during preview
📊 Project Data Scale (Reference)
| Metric | Value | Description |
|---|---|---|
| Lines of code | ~5,000 | Backend Python ~3,000 lines + Frontend TS/Vue ~2,000 lines |
| API endpoints | 25+ | Covering Chat / Dashboard / Metadata / Datasource / Formula |
| Database tables | 8 | SQLite application database (dashboards, widgets, sessions, …) |
| Supported chart types | 5 | Line chart, bar chart, pie chart, metric card, table |
| LLM invocation scenarios | 4 | NL→SQL, metadata analysis, dashboard generation, report writing |
| Single query time | 2~5 sec | Includes LLM call ( |
| Dashboard render time | <200ms | Pure SQL execution + ECharts rendering (no LLM call) |
🚀 Quick Start
Prerequisites
- Python 3.10+
- Node.js 18+
- MySQL 8.0+ (target database)
- DeepSeek API Key (or OpenAI API Key)
1️⃣ Start Backend
1 | |
2️⃣ Start Frontend
1 | |
3️⃣ Import Test Data
1 | |
Visit http://localhost:5173 and start exploring!
🎯 Series Preview
This series includes 8 in-depth technical blog posts, covering everything from architecture design to production deployment:
| Post | Core Topic | Target Audience | Est. Reading Time |
|---|---|---|---|
| ✅ #1 | Project Overview & Tech Stack Selection | Everyone | 15 min |
| #2 | LLM Unified Gateway Design | Backend Engineers | 20 min |
| #3 | Intelligent Metadata Management System | Data Engineers | 25 min |
| #4 | NL→SQL Conversion Engine | AI Application Developers | 30 min |
| #5 | Smart Chat Engine in Practice | Full-Stack Developers | 25 min |
| #6 | Data Dashboard Two-Stage Separation Architecture | Architects | 30 min |
| #7 | Frontend Drag-and-Drop Interaction System | Frontend Engineers | 25 min |
| #8 | Production Deployment & Performance Optimization | DevOps/IT Ops | 20 min |
💡 Summary & Reflections
Through this project, we solved three core problems:
- Lowered the barrier to data querying: From “must know SQL” to “just speak Chinese”
- Boosted dashboard development efficiency: From “hand-code everything” to “describe in natural language + one-click generation”
- Achieved deep integration of LLM with traditional systems: Not a simple API call, but a two-stage separation architecture ensuring production stability
In the next article, we’ll dive deep into the LLM Gateway design — how to achieve unified multi-model invocation via LiteLLM, how to design Prompt templates to ensure output quality, and how to clean noisy data from LLM responses.
Stay tuned! 🚀
Author: DevCfg.cn
Copyright: This article may be freely republished with attribution.