0. Series Overview
| Position in Series | Upstream | Output | Downstream |
|---|---|---|---|
| Article 1/10 | Business scenario definition | Technical selection conclusion, project boundary | Article 02: Data format → Article 05: Training script |
This series follows a single pipeline: JSONL data → train_lora_single.py → final_lora → verify_lora.py → vLLM API. It does not cover multi-GPU DDP or merging LoRA into the base model (these will be covered separately later).
1. The Real Problems to Solve
1.1 Base Model’s Default Behavior Doesn’t Fit the Scenario
Qwen3.5-4B, as a general-purpose conversational model, tends to output solution-oriented responses when faced with typical elderly users’ confiding: suggesting activities, recommending doctor visits, advising “stay positive.” This is logically correct, but in an emotional companionship scenario, it feels like lecturing.
The first user message in the training data is:
1 | |
The expected assistant style (see LoRA_Demo/data/elderly_chat.jsonl line 1) is:
1 | |
The key word is not “what to do,” but empathy + companionship + no urging. This is a style transfer task, not a knowledge injection task.
1.2 Five Psychological Models Define the Data Boundary
The 1,000 training samples are divided into five categories (200 each), covering:
| Theme | Typical User Emotion | Model Should Avoid |
|---|---|---|
| Loneliness & Yearning for Company | Feeling empty, no one to talk to | Immediately listing “go to community events” |
| Health Anxiety & Fear of Death | Check-ups, insomnia, fear of being a burden | Diagnostic statements, guaranteeing cures |
| Fear of Being a Nuisance & Self-Blame | Afraid to ask for leave, afraid to spend money | Logically invalidating feelings |
| Nostalgia & Longing for Validation | The good old days, feeling misunderstood | Dismissing as “old-fashioned” |
| Low Mood & Wanting to Feel Needed | Feeling useless, feeling redundant | Hollow encouragement like “you need to cheer up” |
The role of LoRA SFT: without altering factual capabilities (medicine, law, etc.), shift the response distribution toward “gentle companionship.”
2. Implementation Location & Repository Structure
1 | |
Scripts not included in this series (exist but not covered yet): train_lora_multi.py (multi-GPU), merge_lora.py (merge weights).
3. Why LoRA + SFT Instead of Other Approaches
3.1 Compared with Full Fine-Tuning
| Dimension | Full Fine-Tuning 4B | LoRA (This Project) |
|---|---|---|
| Trainable Parameters | ~4.2 billion | 10,616,832 (0.2518%) — see all_logs.log line 34 |
| Single Output Size | ~8 GB+ | adapter_model.safetensors ~41 MB |
| Actual Training Time | Not tested | 2484 s (41 min 23 sec) |
| Data Scale Suitability | Usually needs larger corpus | 1,000 style samples are effective |
From the log:
1 | |
3.2 Compared with Prompt-Only
Using only a system prompt (the SYSTEM_PROMPT in verify_lora.py is consistent with training) can slightly improve tone, but the base model still tends to slip into “advisory” structures. SFT pulls the token distribution of entire responses toward the empathy templates in the data, which is more stable than pure prompting.
3.3 Compared with Configuration-Based Frameworks like LLaMA-Factory
This project deliberately retains a 250-line readable script (train_lora_single.py), with comments explaining LoRA principles and single-GPU device_map. The goal is to understand the TRL + PEFT call chain by reading the code, not to hide behind a layer of YAML.
Technology stack:
- TRL
SFTTrainer+SFTConfig(v1.5.1, see checkpoint README) - PEFT
LoraConfig(v0.19.1, seefinal_lora/adapter_config.json) - Transformers 5.9.0 for loading Qwen3.5-4B
4. End-to-End Architecture
1 | |
5. Key Numbers from a Real Training Run (Reference)
Source: LoRA_Demo/all_logs.log
| Metric | Value | Note |
|---|---|---|
| GPU | Tesla V100S-PCIE-32GB | 31.7 GB VRAM |
| Samples | 1000 | JSONL loaded (log line 31) |
| Effective batch size | 4 | 2 × 2 gradient accumulation |
| Total steps | 750 | 3 epochs |
| Step 1 loss | 2.8099 | Token accuracy 43.6% |
| Step 250 loss | 0.2402 | End of first epoch |
| Step 750 loss | 0.1286 | Training complete |
| Average train_loss | 0.2587 | Summary line |
| Throughput | 1.208 samples/s |
These numbers show: small data + low-rank adapter can achieve observable style convergence within 40 minutes, suitable for individuals/small teams iterating.
6. Pitfalls (Know Before Starting the Project)
Pitfall 1: Turning companionship into “health Q&A”
If the assistant’s replies are full of medical advice, loss will still drop, but the product positioning is off. The data deliberately avoids diagnosis and guarantees of cure.
Pitfall 2: Assuming a 4B small model “doesn’t need fine-tuning”
In practice, running verify_lora.py on the LoRA model on a Mac mini (MPS + float16), questions like “can’t sleep at night” consistently produce “I’m here with you” type replies; the base model with the same prompt still tends to give suggestion lists. The difference from fine-tuning is in tone structure, not IQ.
Pitfall 3: Inconsistent system prompt between training and inference
The SYSTEM_PROMPT in verify_lora.py lines 33–36 must match the system message in JSONL exactly, otherwise verification conclusions are unreliable.
7. Summary
- The scenario is elderly emotional companionship, the core is an empathic style, not general chat.
- LoRA SFT completes style transfer with 1,000 data points, 41 MB weights, and 41 minutes of training at a reproducible cost.
- The code entry is
train_lora_single.py, verification entry isverify_lora.py, deployment via vLLM dynamic LoRA. - This series of 10 articles is organized in the order: data → theory → environment → code → metrics → verification → pitfalls → deployment.
Appendix: Training Script File Header (Design Intent)
Source: LoRA_Demo/train_lora_single.py lines 1–33
1 | |
Series Navigation
| Article | Link |
|---|---|
| Next | 02 · Training Dataset Design |
| Index | README |