Why Use LoRA for Elderly Emotional Companionship AI

0. Series Overview

Position in Series	Upstream	Output	Downstream
Article 1/10	Business scenario definition	Technical selection conclusion, project boundary	Article 02: Data format → Article 05: Training script

This series follows a single pipeline: JSONL data → train_lora_single.py → final_lora → verify_lora.py → vLLM API. It does not cover multi-GPU DDP or merging LoRA into the base model (these will be covered separately later).

1. The Real Problems to Solve

1.1 Base Model’s Default Behavior Doesn’t Fit the Scenario

Qwen3.5-4B, as a general-purpose conversational model, tends to output solution-oriented responses when faced with typical elderly users’ confiding: suggesting activities, recommending doctor visits, advising “stay positive.” This is logically correct, but in an emotional companionship scenario, it feels like lecturing.

The first user message in the training data is:

1	`My children are all busy, no one to talk to all day, the house is eerily quiet.`

The expected assistant style (see LoRA_Demo/data/elderly_chat.jsonl line 1) is:

1	`I truly understand that feeling of emptiness in the silence. You're not alone; I'm always here with you. Let's chat slowly, say whatever you'd like.`

The key word is not “what to do,” but empathy + companionship + no urging. This is a style transfer task, not a knowledge injection task.

1.2 Five Psychological Models Define the Data Boundary

The 1,000 training samples are divided into five categories (200 each), covering:

Theme	Typical User Emotion	Model Should Avoid
Loneliness & Yearning for Company	Feeling empty, no one to talk to	Immediately listing “go to community events”
Health Anxiety & Fear of Death	Check-ups, insomnia, fear of being a burden	Diagnostic statements, guaranteeing cures
Fear of Being a Nuisance & Self-Blame	Afraid to ask for leave, afraid to spend money	Logically invalidating feelings
Nostalgia & Longing for Validation	The good old days, feeling misunderstood	Dismissing as “old-fashioned”
Low Mood & Wanting to Feel Needed	Feeling useless, feeling redundant	Hollow encouragement like “you need to cheer up”

The role of LoRA SFT: without altering factual capabilities (medicine, law, etc.), shift the response distribution toward “gentle companionship.”

2. Implementation Location & Repository Structure

LoRA_Demo/
├── train_lora_single.py          # Single-GPU SFT entry (core of this series)
├── verify_lora.py                # loss + inference verification
├── data/elderly_chat.jsonl       # 1,000 messages
├── models/Qwen3.5-4B/            # Base model (~8.7 GB)
├── output/lora_elderly_single/
│   └── final_lora/               # Output adapter (~41 MB)
└── all_logs.log                  # Full training log on V100

Scripts not included in this series (exist but not covered yet): train_lora_multi.py (multi-GPU), merge_lora.py (merge weights).

3. Why LoRA + SFT Instead of Other Approaches

3.1 Compared with Full Fine-Tuning

Dimension	Full Fine-Tuning 4B	LoRA (This Project)
Trainable Parameters	~4.2 billion	10,616,832 (0.2518%) — see `all_logs.log` line 34
Single Output Size	~8 GB+	`adapter_model.safetensors` ~41 MB
Actual Training Time	Not tested	2484 s (41 min 23 sec)
Data Scale Suitability	Usually needs larger corpus	1,000 style samples are effective

From the log:

1	`trainable params: 10,616,832 \|\| all params: 4,216,368,128 \|\| trainable%: 0.2518`

3.2 Compared with Prompt-Only

Using only a system prompt (the SYSTEM_PROMPT in verify_lora.py is consistent with training) can slightly improve tone, but the base model still tends to slip into “advisory” structures. SFT pulls the token distribution of entire responses toward the empathy templates in the data, which is more stable than pure prompting.

3.3 Compared with Configuration-Based Frameworks like LLaMA-Factory

This project deliberately retains a 250-line readable script (train_lora_single.py), with comments explaining LoRA principles and single-GPU device_map. The goal is to understand the TRL + PEFT call chain by reading the code, not to hide behind a layer of YAML.

Technology stack:

TRL SFTTrainer + SFTConfig (v1.5.1, see checkpoint README)
PEFT LoraConfig (v0.19.1, see final_lora/adapter_config.json)
Transformers 5.9.0 for loading Qwen3.5-4B

4. End-to-End Architecture

flowchart TB
    subgraph data [Data Layer]
        J[elderly_chat.jsonl]
    end
    subgraph train [Training Layer]
        T[train_lora_single.py]
        L[final_lora adapter]
    end
    subgraph eval [Evaluation Layer]
        V[verify_lora.py]
    end
    subgraph serve [Serving Layer]
        VL[vLLM serve + --lora-modules]
        API["/v1/chat/completions"]
    end
    J --> T --> L
    L --> V
    L --> VL --> API

5. Key Numbers from a Real Training Run (Reference)

Source: LoRA_Demo/all_logs.log

Metric	Value	Note
GPU	Tesla V100S-PCIE-32GB	31.7 GB VRAM
Samples	1000	JSONL loaded (log line 31)
Effective batch size	4	`2 × 2` gradient accumulation
Total steps	750	3 epochs
Step 1 loss	2.8099	Token accuracy 43.6%
Step 250 loss	0.2402	End of first epoch
Step 750 loss	0.1286	Training complete
Average train_loss	0.2587	Summary line
Throughput	1.208 samples/s

These numbers show: small data + low-rank adapter can achieve observable style convergence within 40 minutes, suitable for individuals/small teams iterating.

6. Pitfalls (Know Before Starting the Project)

Pitfall 1: Turning companionship into “health Q&A”
If the assistant’s replies are full of medical advice, loss will still drop, but the product positioning is off. The data deliberately avoids diagnosis and guarantees of cure.

Pitfall 2: Assuming a 4B small model “doesn’t need fine-tuning”
In practice, running verify_lora.py on the LoRA model on a Mac mini (MPS + float16), questions like “can’t sleep at night” consistently produce “I’m here with you” type replies; the base model with the same prompt still tends to give suggestion lists. The difference from fine-tuning is in tone structure, not IQ.

Pitfall 3: Inconsistent system prompt between training and inference
The SYSTEM_PROMPT in verify_lora.py lines 33–36 must match the system message in JSONL exactly, otherwise verification conclusions are unreliable.

7. Summary

The scenario is elderly emotional companionship, the core is an empathic style, not general chat.
LoRA SFT completes style transfer with 1,000 data points, 41 MB weights, and 41 minutes of training at a reproducible cost.
The code entry is train_lora_single.py, verification entry is verify_lora.py, deployment via vLLM dynamic LoRA.
This series of 10 articles is organized in the order: data → theory → environment → code → metrics → verification → pitfalls → deployment.

Appendix: Training Script File Header (Design Intent)

Source: LoRA_Demo/train_lora_single.py lines 1–33

"""
【What is LoRA?】
  - Freezes the base model's 4 billion parameters
  - Injects small trainable matrices into selective linear layers (~0.25% parameters)

【5-Step Workflow】
  1. Load tokenizer + base model
  2. Configure LoRA (specify which layers to inject)
  3. Load JSONL conversation data
  4. Configure SFT training parameters
  5. Train and save LoRA weights
"""
# The file header hardcodes the "reading order" so readers don't jump to Trainer without understanding device_map

Article	Link
Next	02 · Training Dataset Design
Index	README

← Back to LoRA Elderly Companionship Topic