📖 Introduction: Why This Project?

In daily enterprise operations, data querying is one of the most frequent needs, but the reality is harsh:

  • Business users: Want to query some data? They have to ask IT to write SQL and wait in line for hours.
  • IT engineers: Constantly interrupted by “Can you run a report for me?”, leaving no time for more valuable work.
  • Data analysts: Familiar with SQL but not with the table structures of every business domain — they have to run DESCRIBE every time.

What if users could query the database directly using natural language?

That’s exactly the core mission of the NLP MySQL Smart Data Analysis Platform — enabling non-technical users to ask questions in Chinese, the system automatically translates them into SQL, and returns visual results.


🎯 Core Features at a Glance

1. Smart Chat

1
2
3
User: "Query the sales amount for each category in the last 7 days."
System:
① Understand semantics → ② Generate SQL → ③ Execute query → ④ Return result + chart

Highlights:

  • ✅ Multi-turn conversation context support (“Break down the above data by region”)
  • ✅ Automatic chart type recommendation (Line/Bar/Pie/Metric Card)
  • ✅ One-click generation of analysis reports

2. Data Dashboard

  • 🎨 Drag-and-drop editor: Build dashboards like building blocks
  • Two-stage separation architecture: LLM generates SQL at design time; pure SQL execution at runtime (performance improved 10x)
  • 🔥 One-click quick generation: Enter a single sentence (“Create a sales overview dashboard for me”) — LLM automatically layouts and configures components

3. Metadata Management

  • 🤖 LLM-powered intelligent parsing: Automatically assign Chinese names to tables/columns, infer table relationships
  • 🔗 Visual relationship editing: Graphically maintain foreign key relationships
  • 📊 Semantic model construction: Convert database structures into JSON descriptions that LLM can understand

4. Formula System

  • 📝 Predefined parameterized SQL templates (e.g., “YoY/QoQ calculation”, “TOP N ranking”)
  • ♻️ Reusable in dashboards and chat

🏗️ Technology Stack Selection & Rationale

Backend Stack

Technology Version Why This?
FastAPI 0.100+ ⚡ Native async support + auto-generated OpenAPI docs + Pydantic integration
SQLAlchemy 2.0 (async) 2.0+ 🔀 Supports async/await, avoids blocking the event loop; ORM + Core dual mode
Pydantic v2 2.0+ ✅ Data validation performance improved 5x+; flexible Strict/Loose modes
LiteLLM 1.0+ 🌐 Unified invocation of DeepSeek/OpenAI/Claude and other LLMs; switch models with one line of code
aiomysql 0.2+ 🚀 Async driver for MySQL, works with SQLAlchemy async
Uvicorn 0.24+ ⚡ ASGI server supporting HTTP/2 and WebSocket

💡 Key Decision 1: Why FastAPI Instead of Django/Flask?

1
2
3
4
5
6
# FastAPI async route example
@router.post("/api/chat/{session_id}/message")
async def chat(session_id: int, msg: ChatMessageSend, db: AsyncSession = Depends(get_db)):
# The entire request chain is asynchronous!
result = await chat_engine.chat(db, session_id, msg.message)
return result

Reasons:

  1. Native async/await: Our core flow involves LLM API calls (I/O bound) + database queries (I/O bound). Async significantly improves concurrency.
  2. Auto-generated docs: Frontend colleagues can directly use Swagger UI to debug endpoints.
  3. Deep Pydantic integration: Strongly-typed request/response models reduce parameter validation code by 80%.

💡 Key Decision 2: Dual Database Architecture (SQLite + MySQL)

1
2
3
4
5
6
7
8
# database.py — Two async engines
# SQLite: Store application metadata (sessions, dashboard configs, metadata, etc.)
sqlite_engine = create_async_engine("sqlite+aiosqlite:///./app.db")

# MySQL: User-configured target business databases (dynamically connected at runtime)
async def get_mysql_connection(datasource_id: int):
ds = await get_datasource(datasource_id)
return await aiomysql.connect(host=ds.host, port=ds.port, ...)

Why separate?

  • SQLite (lightweight): Application metadata is small (hundreds of records), no need for an extra MySQL service — easier for development and testing.
  • MySQL (target): User business data can be large (millions of rows), requiring a professional relational database. Also supports multiple data sources — users can configure multiple MySQL instances.

Frontend Stack

Technology Version Core Value
Vue 3 3.4+ Composition API better for complex state management; strong TypeScript support
TypeScript 5.0+ Type safety reduces runtime errors
Vite 5.0+ ⚡ Extremely fast HMR (<50ms), great development experience
Element Plus 2.4+ Enterprise-level UI component library — forms, tables, dialogs out of the box
ECharts (vue-echarts) 6.0+ 📊 Most powerful charting library, supporting 20+ chart types
Pinia 2.1+ Vue 3 official recommended state management, simpler than Vuex

💡 Key Decision 3: Why Not Use a Low-Code Platform?

There are many BI tools on the market (e.g., Metabase, Superset), but they have these pain points:

Pain Point Traditional BI Tools Our Solution
Natural language querying ❌ Manual field dragging required ✅ Query directly in Chinese
Multi-data source associations ⚠️ Complex configuration ✅ LLM automatically infers table relationships
Customization degree ❌ Limited by platform templates ✅ Fully customizable Vue components
Deployment cost ❌ Requires JVM / big data cluster ✅ Single-machine Docker is enough
LLM integration ❌ Most don’t support it ✅ Deep integration with DeepSeek/OpenAI

LLM Technology Selection

Dimension Choice Reason
Main model DeepSeek-V3 🇨🇳 Excellent Chinese understanding; price only 1/10 of GPT-4
Invocation layer LiteLLM 🌐 Unified interface, can seamlessly switch to OpenAI/Claude in the future
Prompt engineering System Prompt + Few-shot 📝 Structured output control (forced JSON format)

💡 Key Decision 4: Why LiteLLM Instead of Direct OpenAI SDK?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# llm_gateway.py — Unified gateway design
class LLMGateway:
def _get_model_name(self, model=None):
if "deepseek" in self.model.lower():
return f"deepseek/{self.model}" # DeepSeek format
return f"openai/{self.model}" # OpenAI format

async def generate(self, prompt, system_prompt=None):
response = await litellm.acompletion(
model=self._get_model_name(),
messages=[...],
api_key=self.api_key,
api_base=self.api_base,
)
return response.choices[0].message.content

Advantages:

  1. Zero-cost model switching: Just modify LLM_MODEL=deepseek-chat to LLM_MODEL=gpt-4 in .env.
  2. Unified exception handling: Error codes from different LLMs are standardized.
  3. Built-in retry mechanism: Automatically retries on network flakiness (configurable).

📐 Overall System Architecture

NLP MySQL Smart Data Analysis Platform System Architecture

NLP MySQL Smart Data Analysis Platform System Architecture

Architecture Layer Explanation

Layer 1: Presentation Layer

  • Tech Stack: Vue 3 + TypeScript + Vite + Element Plus + ECharts
  • Core Modules:
    • Smart chat interface (multi-turn conversation + result display)
    • Data dashboard editor (drag & drop + Grid layout)
    • Metadata management interface (table/column/relationship CRUD)
    • Chart component library (LineChart / BarChart / PieChart / MetricCard)

Layer 2: API Gateway Layer

  • Tech Stack: FastAPI Router
  • Responsibilities:
    • RESTful route distribution (/api/chat/*, /api/dashboards/*, …)
    • Request parameter validation (Pydantic Schema)
    • Unified error handling (HTTP Exception Handler)

Layer 3: Business Logic Layer

  • 7 Core Services:
Service File Responsibility
LLM Gateway llm_gateway.py Unified LLM invocation, Prompt management, response cleaning
SQL Generator sql_generator.py NL→SQL conversion, security verification, query execution
Chat Engine chat_engine.py Dialog context management, chart recommendation, report generation
Dashboard Engine dashboard_engine.py Dashboard CRUD, NL→SQL parsing, rendering logic
Metadata Service metadata.py Metadata CRUD, LLM intelligent analysis, semantic model construction
Formula Engine formula.py Parameterized SQL template execution
Datasource Service datasource.py Data source connection management, password encryption

Layer 4: Data Layer

  • SQLite (aiosqlite): Application metadata (sessions, dashboard configs, metadata, formula definitions)
  • MySQL (aiomysql): Target business databases (actual data users query)
  • LLM API: DeepSeek / OpenAI cloud services
  • Environment Config (.env): API keys, database connection strings, and other sensitive info

🎬 Typical User Scenario Demo

Scenario 1: Business User Queries Sales Data

Scenario 1: Business user queries sales data in natural language

User asks in Chinese, system returns SQL and bar chart result

Scenario 2: Data Analyst Builds a Monitoring Dashboard

Scenario 2: One-click generation of operations overview dashboard

LLM automatically lays out components; LLM is no longer called during preview


📊 Project Data Scale (Reference)

Metric Value Description
Lines of code ~5,000 Backend Python ~3,000 lines + Frontend TS/Vue ~2,000 lines
API endpoints 25+ Covering Chat / Dashboard / Metadata / Datasource / Formula
Database tables 8 SQLite application database (dashboards, widgets, sessions, …)
Supported chart types 5 Line chart, bar chart, pie chart, metric card, table
LLM invocation scenarios 4 NL→SQL, metadata analysis, dashboard generation, report writing
Single query time 2~5 sec Includes LLM call (2s) + SQL execution (0.1s)
Dashboard render time <200ms Pure SQL execution + ECharts rendering (no LLM call)

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • MySQL 8.0+ (target database)
  • DeepSeek API Key (or OpenAI API Key)

1️⃣ Start Backend

1
2
3
4
5
6
7
8
9
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env and fill in your LLM_API_KEY

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

2️⃣ Start Frontend

1
2
3
cd frontend
npm install
npm run dev

3️⃣ Import Test Data

1
mysql -u root -p your_database < test_data.sql

Visit http://localhost:5173 and start exploring!


🎯 Series Preview

This series includes 8 in-depth technical blog posts, covering everything from architecture design to production deployment:

Post Core Topic Target Audience Est. Reading Time
✅ #1 Project Overview & Tech Stack Selection Everyone 15 min
#2 LLM Unified Gateway Design Backend Engineers 20 min
#3 Intelligent Metadata Management System Data Engineers 25 min
#4 NL→SQL Conversion Engine AI Application Developers 30 min
#5 Smart Chat Engine in Practice Full-Stack Developers 25 min
#6 Data Dashboard Two-Stage Separation Architecture Architects 30 min
#7 Frontend Drag-and-Drop Interaction System Frontend Engineers 25 min
#8 Production Deployment & Performance Optimization DevOps/IT Ops 20 min

💡 Summary & Reflections

Through this project, we solved three core problems:

  1. Lowered the barrier to data querying: From “must know SQL” to “just speak Chinese”
  2. Boosted dashboard development efficiency: From “hand-code everything” to “describe in natural language + one-click generation”
  3. Achieved deep integration of LLM with traditional systems: Not a simple API call, but a two-stage separation architecture ensuring production stability

In the next article, we’ll dive deep into the LLM Gateway design — how to achieve unified multi-model invocation via LiteLLM, how to design Prompt templates to ensure output quality, and how to clean noisy data from LLM responses.

Stay tuned! 🚀


Author: DevCfg.cn
Copyright: This article may be freely republished with attribution.