Qwen3.5 Pitfalls: Thinking Chain and flash-attn
0. Series Closure. This Article Position: Article 9/10 · Upstream: Article 08 Inference Verification · This Article Output: Qwen3.5 Specific Issue...
Agent · RAG · Engineering Deep Dives
From Known to Unknown — Practical guides for building production-grade AI systems.
Browse Topics ↓0. Series Closure This Article Position Upstream This Article Output Downstream Article 9/10 Article 08 Inference Verification Qwen3.5 Specific Issue...
0. Series Closure. This Article: Part 10/10 · Upstream: Training + Validation · Output: Production-ready HTTP API · Downstream: Business App / Mini Program...
0. Series Loop. Position: Post 8/10 · Upstream: Post 07: Loss Convergence · Output: Qualitative conclusion: Whether it’s “gentler, less preachy” · Downstream: Po...
0. Series Closure. Position in Series: Article 7/10 · Upstream: Article 06: Training Complete · This Article Output: Metric interpretation, determi...
0. Series Overview. This Article: Article 6/10 · Upstream: Article 05 – Dataset Ready · Output: final_lora/, 15 checkpoints · Downstream: Article 07 – Reading Lo...
0. Series Completion. Position in Series: Article 5/10 · Upstream: Article 04: Environment Ready · This Output: Tokenizer, Base Model, LoraConfig,...
0. Series Closing the Loop. This Article’s Position: Article 4/10 · Upstream: Article 03: Principles · This Article’s Deliverable: Runnable GPU env...
0. Series Closed Loop. Position in Series: Post 3/10 · Upstream: Post 02: Data into Model · This Post’s Output: Understand r/alpha/target...
0. Series Closure. Position in Series: Article 2/10 · Upstream: Article 01: Scenario Definition · This Article’s Output: Trainable JSONL Specificat...
0. Series Loop (Follow Along Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_p...
0. Series Overview. Position in Series: Article 1/10 · Upstream: Business scenario definition · Output: Technical selection conclusion, project bou...
🎯 Core Questions of This Chapter: How to build a truly “intelligent” conversational system? Problems with ordinary chatbots: ❌ No memory: Each conversation roun...
🎯 Core Questions of This Chapter: How to achieve a smooth and intuitive drag-and-drop interaction experience? Challenge Pain Points of Traditional Solutions O...
🎯 Core Question of This Chapter: What is the core contradiction of Dashboards? Dimension Problems with Traditional Approaches Our Solution Performance LLM...
🎯 Core Chapter Questions: How to safely and efficiently deploy a development environment application to production? Challenge Traditional Pain Points Our Solu...
0. Series Loop (Follow Along Even Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analy...
0. Series Loop (Follow along without public source code) End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_p...
0. Series Loop (Follow Without Publishing Source) End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline...
🎯 Core Questions of This Chapter: How to make an LLM generate accurate, safe, and executable SQL? This is the core challenge of the entire system: ❌ The LLM doesn’...
0. Series Loop (Follow Along Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_p...
0. Series Loop (Follow Along Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_p...
0. Series Loop (Follow along without open-source code) End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pip...
0. Series Loop (Read Along Without Public Source Code) End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pip...
0. Series Loop (Follow Along Without Public Source Code) End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_p...
🎯 Core Questions of This Chapter: How does an LLM understand your database structure? This is the core challenge of all NL→SQL systems: ❌ The LLM doesn’t know w...
0. Series Loop (Read Along Without Open Source) End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (P...
0. Series Loop (Follow Along Without Open Source Code) End-to-End Chain: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeli...
0. Series Loop (Read Along Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pip...
🎯 Core Chapter Questions: In AI application development, how to elegantly call LLM APIs? Hard-coding with requests.post()? Or writing a set of calling logic for...
0. Series Loop (Read Along Without Public Source Code) End-to-End Pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pip...
0. Series Loop (Readable Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipel...
0. Series Closed Loop (Read Along Even Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_...
0. Series Loop (Follow Along Without Public Source Code) End-to-end pipeline: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (p...
📖 Introduction: Why This Project? In daily enterprise operations, data querying is one of the most frequent needs, but the reality is harsh: Business users: Wa...
0. Series Loop (Read Along Without Public Source Code) End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pip...
Pain Point: Using Generic Embedding Models for RAG, Getting Irrelevant Answers in Vertical Domains? Have you ever encountered this: Using OpenAI’s text-embeddin...
1. Introduction: When deploying with containers, it is common to encounter issues where the time inside the container differs from the host (typically by 8 hours)...
1. Introduction: In JWT (JSON Web Token) based authentication systems, AccessToken is typically set with a short lifetime (15 minutes to 1 hour) to reduce securit...
1. Introduction: In a microservices and front-end/back-end separation architecture, permission management is one of the fundamental pieces of system security...
1. Introduction: Hybrid mobile development has become the mainstream choice for balancing multi-platform development efficiency with native experience. While WebVi...
1. Introduction: AI Agents are evolving from standalone applications to embedded interactive experiences on the web. The core challenge for frontend teams is: how...
1. Introduction: This article introduces the issues of request timeout and slow responses in Agent systems caused by network fluctuations, long tool execution tim...
1. Introduction: Traditional RAG lacks proactive reasoning and multi-step retrieval capabilities in complex Q&A scenarios. This article introduces how to inte...
1. Introduction: As large model-driven agents are gradually deployed in enterprise production environments, a common problem emerges: With the same underlying lar...
Pain point: Why is your table always empty when extracting PDF with pdfplumber? Let’s start with a real scenario: You have a 50-page PDF product technical specif...
1. Introduction: As AI Agent application scenarios evolve from single-point tasks to complex workflows, multi-agent cross-platform collaboration has become a nece...
1. Introduction: As AI Agent application scenarios evolve from single-point tasks to complex workflows, multi-agent cross-platform collaboration has become a nece...
Quick Adaptation for MCP Protocol Access in Agents. 1. Introduction: In LLM application development, a common problem is the need to write adaptation code for each...
1. Introduction: Function Call is a core capability that enables large language models to interact with external systems, allowing the model to trigger the execut...
Developing and Integrating Custom Tools for Agents. 1. Introduction: Agents extend their capabilities through tool calls (Function Call), which is currently the mai...
1. Introduction: Easy to Develop Agents, Hard to Deploy Them. This article will focus on best practices for Agent development and deployment, systematically break...
Pain Point: Should chunk_size be 500 or 1000? Why is the effect still poor after adjusting N times? Chunking is the most underestimated step in a RAG system. Man...
Practical Guide to Building Long-Term Persistent Memory for Agents - Internal Knowledge Base Document. 1. Introduction: In real-world agent deployments, stateless d...
1. Introduction: The token limit of large model context windows (typically 4K–128K tokens) determines the amount of information that can be carried in a single co...
1. Introduction: As LLM-driven agents are gradually deployed in enterprise production environments, a common issue has emerged: given the same underlying model, w...
📊 Table of Contents: Why Fine-tune BGE-M3? · Core Capabilities of BGE-M3 · Fine-tuning Environment Setup Guide · Data Preparation: Building a High-Quality Training Se...
Introduction: Under the dual trends of edge computing and enterprise digitalization, the technical selection of Agents involves a trade-off between efficiency and...
1. Introduction: Current AI systems in production environments often face scenarios that are dynamic, complex, and require independent operation. This demands tha...
The Essential Difference Between Agent and Ordinary Large Model Conversational Bot. Introduction: With the widespread application of Large Language Models (LLMs), c...
📊 Table of Contents: Why Meticulously Design a Milvus Collection? · Collection Schema Best Practices · Deep Dive into HNSW Index Principles · Complete HNSW Para...
Understanding AI Agent: A Simple Explanation. Introduction: Large Language Models (LLMs) have demonstrated impressive capabilities: they can answer questions, writ...
1. Introduction: Moving a RAG system from prototype to production typically encounters three core challenges: uncontrollable response latency, fluctuating retriev...
📊 Table of Contents: Why Tables are a Nightmare for RAG Systems? · Core Idea of 4-Level Table Vectorization · Detailed Design and Scenarios of 4 Levels · Complete Cod...
📊 Table of Contents: Why Do We Need Multi-Channel Retrieval Fusion? · Deep Dive into RRF Algorithm Core Principles · Multi-Channel Retrieval Architecture in RAG Sys...
📊 Table of Contents: Why Does a RAG System Need a Triple-Storage Architecture? · Triple-Storage Responsibility Division and Design Philosophy · Dual-Write Consisten...
1. Introduction: Why RAG Evaluation Can’t Rely on “Feelings” Alone? Imagine this scenario: You painstakingly build a RAG system. The knowledge base contains tens...
1. Introduction: The Limitations of Traditional RAG and the Value Proposition of Agentic RAG. Do you still remember the excitement of deploying your first RAG sys...
1. Introduction: The “Blindness” Dilemma of Traditional RAG and the Multimodal Breakthrough. Have you ever encountered this scenario: you feed a PDF report full o...
1. Introduction: The Bottleneck in Generation – Why Self-RAG and Adaptive Retrieval? RAG (Retrieval-Augmented Generation) has a common pain point: all queries ar...
1. Introduction: Single Path Not Enough – How Multi-Path Recall Solves the “Lopsided” Retrieval Problem? After deploying a RAG (Retrieval-Augmented Generation)...
Introduction: Why is your RAG retrieval always inaccurate? Starting with HyDE. Imagine this scenario: You’ve built an enterprise knowledge base RAG (Retrieval-Aug...
1. Introduction: Why Is the Embedding Model the “Invisible Bottleneck” of RAG? Have you ever encountered this scenario? Your RAG (Retrieval-Augmented Generation)...
1. Introduction: Why Does RAG Offline Preprocessing Need Metadata Enhancement and Knowledge Graphs? Hi, I’m a tech blogger. Today, let’s talk about a critical ye...
Introduction: When “Multi-Source Data” Becomes the Nightmare and Turning Point of RAG. Imagine this scenario: you’re developing a smart Q&A system for e-comme...