This series covers practical engineering of RAG data pipelines, organized as “Overview → Ingestion → Chunking → Vectorization → Retrieval → Fusion → Foundation”. The code and architecture are intended to serve as a direct reference when selecting technology for production.

Relationship with the In-Site Theory Series

| Track | Description | Entry |
| --- | --- | --- |
| This Series (Engineering Practice) | 8 articles, telling you how to do it: Docker, extraction, chunking, Milvus, RRF, etc. | Top navigation: RAG Pipeline Series · Directory below on this page |
| RAG Full-Link Theory Series | 11 articles, telling you why: cleaning, metadata, Embedding, multi-path recall, Self-RAG, evaluation and deployment | RAG Series |

Each practical article ends with an “In-Site Theory Extension” section that points to the theory article covering that chapter’s technical topic, so you can trace the methodology behind the engineering practice.

  1. Read Chapter 1 first to build a global view of the pipeline (one-click Docker deployment).
  2. Then choose by role: data engineers focus on Chapters 2–3, algorithm engineers on Chapters 4–7, and architects on Chapter 8.
  3. When you hit a conceptual blind spot, follow the theory link at the end of the article to the RAG Theory Series.