In AI application development, how to elegantly call LLM APIs? Hard-coding with requests.post()? Or writing a set of calling logic for each business module?
Specific challenges we face:
❌ Model lock-in risk: Use DeepSeek today, want to switch to GPT-4 tomorrow, need to modify dozens of code locations
❌ Unstable output: LLMs occasionally return code wrapped in ````sql, ` tags, or extra explanatory text
❌ Token limit crashes: Semantic model too large triggers context_length_exceeded error
❌ Difficult debugging: Hard to know what Prompt the LLM received, what it returned, and how long it took
Solution: Build a unified LLM Gateway to converge all LLM interactions into one module.
🏗️ Architecture Overview
LLM Unified Gateway Architecture
Design Principles
Single Responsibility, Open/Closed Principle, Observability & Defensive Programming
def_get_model_name(self, model=None): """Format model name to fit LiteLLM""" if model: return model if"deepseek"inself.model.lower(): returnf"deepseek/{self.model}"# → "deepseek/deepseek-chat" returnf"openai/{self.model}"# → "openai/gpt-4o"
💡 Key Design: Model Name Formatting
LiteLLM requires provider/model format:
1 2 3 4 5 6 7 8 9 10
# DeepSeek format "deepseek/deepseek-chat"# Corresponds to DeepSeek-V3 "deepseek/deepseek-coder"# Corresponds to DeepSeek-Coder
# OpenAI format "openai/gpt-4o" "openai/gpt-4-turbo"
# Custom endpoint (private deployments compatible with OpenAI API) "openai/my-custom-model"# Used with api_base
Cost of switching model = modifying 1 line in .env config 🎉
📝 2. Prompt Engineering Practices
2.1 System Prompt Template Library
Our system has 4 different LLM calling scenarios, each requiring a carefully designed System Prompt:
Scenario 1: SQL Generation (most complex)
app/services/llm_gateway.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
asyncdefgenerate_sql(self, natural_language, semantic_model): system_prompt = """You are a professional SQL generation expert. Based on the user's natural language query and the database semantic model, generate valid MySQL SQL query statements.
Rules: 1. Output only the SQL statement, do not output any explanatory text. 2. Use the correct table names and field names from the semantic model. 3. When related queries are needed, use JOIN based on table relationships. 4. Add WHERE, GROUP BY, ORDER BY clauses as needed. 5. If the user does not explicitly specify sorting, default to descending order by primary key or time field."""
asyncdefanalyze_metadata(self, tables_info): system_prompt = """You are an experienced database analyst. Please analyze the given database table structures, and complete the following tasks: 1. Give each table an easy-to-understand Chinese name (business name) 2. Write a Chinese business description for each table 3. Give each field a Chinese business name ...
Return the following JSON structure: { "tables": { "table_name": { "business_name": "Chinese business name", ... } }, "suggested_relations": [ { "from_table": "source table name", "relation_type": "one_to_one or one_to_many", "confidence": "high or medium or low", ... } ] }
Important: Output only JSON, do not output any other text."""
Design Points:
✅ Forced JSON output: Easy for program parsing
✅ Confidence field: Mark the confidence of relationship inference (high/medium/low)
✅ Example-driven: Guide output format through complete JSON template
2.2 Prompt Injection Techniques
Variable Templating
1 2 3 4 5 6 7 8 9 10
# ❌ Bad practice: String concatenation error-prone prompt = "Database has these tables:" + str(tables) + "User asks:" + question
For complex tasks, append examples to the User Prompt:
1 2 3 4 5 6 7 8 9 10 11 12 13
user_prompt += """
Example: User input: "Query the top 10 products by sales amount" Output: SELECT p.name, SUM(oi.amount) as total_sales FROM products p JOIN order_items oi ON p.id = oi.product_id GROUP BY p.id ORDER BY total_sales DESC LIMIT 10;
Now generate SQL according to the above rules:"""
🔒 3. Length Control Mechanism
Problem Background
When a database has many tables (50+) and many fields (200+), the semantic model JSON may exceed 80K characters, directly causing LLM API errors:
1
Error: This model's maximum context length is 65536 tokens...
# ⚡ Key: Length check and truncation total_chars = sum(len(m["content"]) for m in messages) if total_chars > 80000: logger.warning(f"[LLM Request] prompt too long ({total_chars} chars), truncating") for m in messages: iflen(m["content"]) > 60000: m["content"] = m["content"][:60000] + "\n... (content truncated due to length)"
Keep the first part (usually contains most important info)
Truncation hint
Append “… (content truncated due to length)”
Let LLM know data is incomplete
💡 Why truncate instead of compress?
Approach
Pros
Cons
Truncation (current)
Simple to implement; preserves original data format
May lose important trailing info
Summarization (LLM first summarize)
Retains key information
Adds an extra LLM call; doubles latency
Chunk + multiple calls
No data loss
Complex; needs result merging
Our choice: For most scenarios (<30 tables), truncation won’t be triggered; even if triggered, the first 60K chars usually cover the core table structure.
🧹 4. Response Cleaner
Problem Background
Different LLMs have vastly different output styles:
LLM
Typical Output Example
DeepSeek-V3
\n\n<think>\nUser wants to query...\n</think>\n\n\``sql\nSELECT …\n```\n\nOf course, you can also…`
GPT-4o
Here's the SQL query:\n\``sql\nSELECT …\n```\n\nThis query will…`
Claude
Certainly! Here's the query:\n\``\nSELECT …\n```\n\nLet me explain…`
Common feature: All contain noise (thought process, code block markers, explanatory text)
# Extract JSON (may be wrapped in ```json) json_match = re.search(r'\{[\s\S]*\}', result) if json_match: result = json_match.group(0)
# Strip code block markers if result.startswith("```json"): result = result[7:] if result.startswith("```"): result = result[3:] if result.endswith("```"): result = result[:-3]
# Try parsing and validation try: parsed = json.loads(result.strip()) logger.info(f"[analyze_metadata] parsed successfully, tables={len(parsed.get('tables', {}))}") return parsed except json.JSONDecodeError as e: logger.error(f"[analyze_metadata] JSON parse failed: {e}") return {} # Return empty dict instead of crashing, ensuring system stability
INFO [LLM Request] model=deepseek/deepseek-chat, api_base=https://api.deepseek.com, total_chars=45230 INFO [LLM Request] system_prompt (first 500): You are a professional SQL generation expert. Based on the user's natural language query and the database semantic model, generate valid MySQL SQL query statements. Rules: 1. Output only the SQL statement, do not output any explanatory text. ... INFO [LLM Request] user_prompt (first 1000 chars): Database Semantic Model: { "tables": [ { "table_name": "orders", "fields": [...] } ] } User's natural language query: Query the total order amount for the last 7 days ... INFO [LLM Response] elapsed=2.34s, length=187 INFO [LLM Response Content] (first 2000 chars): ```sql SELECT DATE(created_at) as date, SUM(total_amount) as total FROM orders WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 7 DAY) GROUP BY DATE(created_at) ORDER BY date DESC;
\n\n<think> The user wants to query "sales ranking of each category last month", need to: 1. Filter time range (last month) 2. Group by category 3. Calculate total sales 4. Sort by total sales descending Involved tables: orders, order_items, products, categories ...</think>
```sql SELECT c.name AS category_name, SUM(oi.amount) AS total_sales FROM categories c JOIN products p ON c.id = p.category_id JOIN order_items oi ON p.id = oi.product_id JOIN orders o ON o.id = oi.order_id WHERE o.created_at >= '2026-04-01' AND o.created_at < '2026-05-01' GROUP BY c.id, c.name ORDER BY total_sales DESC;
1 2 3 4 5 6 7 8 9 10 11 12 13
**Step 5 - Cleaned Result**: ```sql SELECT c.name AS category_name, SUM(oi.amount) AS total_sales FROM categories c JOIN products p ON c.id = p.category_id JOIN order_items oi ON p.id = oi.product_id JOIN orders o ON o.id = oi.order_id WHERE o.created_at >= '2026-04-01' AND o.created_at < '2026-05-01' GROUP BY c.id, c.name ORDER BY total_sales DESC;
Perfect! 🎉 Can be passed directly to execute_query().
🔟 Summary and Best Practices
✅ What We Achieved
Zero-cost model switching: One config line to switch from DeepSeek to GPT-4
The next article will dive into the Metadata Management System — how to let the LLM automatically understand database structures? How to intelligently infer table relationships? How to build high-quality semantic models?