0. Series Overview

Position in Series Upstream Output Downstream
Article 1/10 Business scenario definition Technical selection conclusion, project boundary Article 02: Data format → Article 05: Training script

This series follows a single pipeline: JSONL data → train_lora_single.pyfinal_loraverify_lora.py → vLLM API. It does not cover multi-GPU DDP or merging LoRA into the base model (these will be covered separately later).


1. The Real Problems to Solve

1.1 Base Model’s Default Behavior Doesn’t Fit the Scenario

Qwen3.5-4B, as a general-purpose conversational model, tends to output solution-oriented responses when faced with typical elderly users’ confiding: suggesting activities, recommending doctor visits, advising “stay positive.” This is logically correct, but in an emotional companionship scenario, it feels like lecturing.

The first user message in the training data is:

1
My children are all busy, no one to talk to all day, the house is eerily quiet.

The expected assistant style (see LoRA_Demo/data/elderly_chat.jsonl line 1) is:

1
I truly understand that feeling of emptiness in the silence. You're not alone; I'm always here with you. Let's chat slowly, say whatever you'd like.

The key word is not “what to do,” but empathy + companionship + no urging. This is a style transfer task, not a knowledge injection task.

1.2 Five Psychological Models Define the Data Boundary

The 1,000 training samples are divided into five categories (200 each), covering:

Theme Typical User Emotion Model Should Avoid
Loneliness & Yearning for Company Feeling empty, no one to talk to Immediately listing “go to community events”
Health Anxiety & Fear of Death Check-ups, insomnia, fear of being a burden Diagnostic statements, guaranteeing cures
Fear of Being a Nuisance & Self-Blame Afraid to ask for leave, afraid to spend money Logically invalidating feelings
Nostalgia & Longing for Validation The good old days, feeling misunderstood Dismissing as “old-fashioned”
Low Mood & Wanting to Feel Needed Feeling useless, feeling redundant Hollow encouragement like “you need to cheer up”

The role of LoRA SFT: without altering factual capabilities (medicine, law, etc.), shift the response distribution toward “gentle companionship.”


2. Implementation Location & Repository Structure

1
2
3
4
5
6
7
8
LoRA_Demo/
├── train_lora_single.py # Single-GPU SFT entry (core of this series)
├── verify_lora.py # loss + inference verification
├── data/elderly_chat.jsonl # 1,000 messages
├── models/Qwen3.5-4B/ # Base model (~8.7 GB)
├── output/lora_elderly_single/
│ └── final_lora/ # Output adapter (~41 MB)
└── all_logs.log # Full training log on V100

Scripts not included in this series (exist but not covered yet): train_lora_multi.py (multi-GPU), merge_lora.py (merge weights).


3. Why LoRA + SFT Instead of Other Approaches

3.1 Compared with Full Fine-Tuning

Dimension Full Fine-Tuning 4B LoRA (This Project)
Trainable Parameters ~4.2 billion 10,616,832 (0.2518%) — see all_logs.log line 34
Single Output Size ~8 GB+ adapter_model.safetensors ~41 MB
Actual Training Time Not tested 2484 s (41 min 23 sec)
Data Scale Suitability Usually needs larger corpus 1,000 style samples are effective

From the log:

1
trainable params: 10,616,832 || all params: 4,216,368,128 || trainable%: 0.2518

3.2 Compared with Prompt-Only

Using only a system prompt (the SYSTEM_PROMPT in verify_lora.py is consistent with training) can slightly improve tone, but the base model still tends to slip into “advisory” structures. SFT pulls the token distribution of entire responses toward the empathy templates in the data, which is more stable than pure prompting.

3.3 Compared with Configuration-Based Frameworks like LLaMA-Factory

This project deliberately retains a 250-line readable script (train_lora_single.py), with comments explaining LoRA principles and single-GPU device_map. The goal is to understand the TRL + PEFT call chain by reading the code, not to hide behind a layer of YAML.

Technology stack:

  • TRL SFTTrainer + SFTConfig (v1.5.1, see checkpoint README)
  • PEFT LoraConfig (v0.19.1, see final_lora/adapter_config.json)
  • Transformers 5.9.0 for loading Qwen3.5-4B

4. End-to-End Architecture

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
flowchart TB
subgraph data [Data Layer]
J[elderly_chat.jsonl]
end
subgraph train [Training Layer]
T[train_lora_single.py]
L[final_lora adapter]
end
subgraph eval [Evaluation Layer]
V[verify_lora.py]
end
subgraph serve [Serving Layer]
VL[vLLM serve + --lora-modules]
API["/v1/chat/completions"]
end
J --> T --> L
L --> V
L --> VL --> API

5. Key Numbers from a Real Training Run (Reference)

Source: LoRA_Demo/all_logs.log

Metric Value Note
GPU Tesla V100S-PCIE-32GB 31.7 GB VRAM
Samples 1000 JSONL loaded (log line 31)
Effective batch size 4 2 × 2 gradient accumulation
Total steps 750 3 epochs
Step 1 loss 2.8099 Token accuracy 43.6%
Step 250 loss 0.2402 End of first epoch
Step 750 loss 0.1286 Training complete
Average train_loss 0.2587 Summary line
Throughput 1.208 samples/s

These numbers show: small data + low-rank adapter can achieve observable style convergence within 40 minutes, suitable for individuals/small teams iterating.


6. Pitfalls (Know Before Starting the Project)

Pitfall 1: Turning companionship into “health Q&A”
If the assistant’s replies are full of medical advice, loss will still drop, but the product positioning is off. The data deliberately avoids diagnosis and guarantees of cure.

Pitfall 2: Assuming a 4B small model “doesn’t need fine-tuning”
In practice, running verify_lora.py on the LoRA model on a Mac mini (MPS + float16), questions like “can’t sleep at night” consistently produce “I’m here with you” type replies; the base model with the same prompt still tends to give suggestion lists. The difference from fine-tuning is in tone structure, not IQ.

Pitfall 3: Inconsistent system prompt between training and inference
The SYSTEM_PROMPT in verify_lora.py lines 33–36 must match the system message in JSONL exactly, otherwise verification conclusions are unreliable.


7. Summary

  1. The scenario is elderly emotional companionship, the core is an empathic style, not general chat.
  2. LoRA SFT completes style transfer with 1,000 data points, 41 MB weights, and 41 minutes of training at a reproducible cost.
  3. The code entry is train_lora_single.py, verification entry is verify_lora.py, deployment via vLLM dynamic LoRA.
  4. This series of 10 articles is organized in the order: data → theory → environment → code → metrics → verification → pitfalls → deployment.

Appendix: Training Script File Header (Design Intent)

Source: LoRA_Demo/train_lora_single.py lines 1–33

1
2
3
4
5
6
7
8
9
10
11
12
13
"""
【What is LoRA?】
- Freezes the base model's 4 billion parameters
- Injects small trainable matrices into selective linear layers (~0.25% parameters)

【5-Step Workflow】
1. Load tokenizer + base model
2. Configure LoRA (specify which layers to inject)
3. Load JSONL conversation data
4. Configure SFT training parameters
5. Train and save LoRA weights
"""
# The file header hardcodes the "reading order" so readers don't jump to Trainer without understanding device_map

Series Navigation

Article Link
Next 02 · Training Dataset Design
Index README

← Back to LoRA Elderly Companionship Topic