0. Series Loop
| Position | Upstream | Output | Downstream |
|---|---|---|---|
| Post 8/10 | Post 07: Loss Convergence | Qualitative conclusion: Whether it’s “gentler, less preachy” | Post 09: Qwen-specific Issues · Post 10: vLLM Deployment |
Low loss ≠ product readiness. This post is the final human-readable checkpoint before deployment.
1. The Actual Problem to Solve
Post 07: token accuracy 96%, loss 0.13.
Still cannot answer:
- Will the base model still give a “suggestion list” under the same system prompt?
- Is LoRA just parroting the training set’s user turns?
- Can we run a smoke test on Mac without renting a GPU?
verify_lora.py design goals (file header lines 10–12):
- Read
trainer_state.jsonto check loss - For the same user, compare generation before/after fine-tuning (optional)
- Manual evaluation of whether it matches “elderly companionship”
2. Implementation Location
| Symbol | Line | Purpose |
|---|---|---|
SYSTEM_PROMPT |
33–36 | Consistent with JSONL system prompt |
DEFAULT_QUESTIONS |
38–42 | Three-topic smoke test questions |
print_training_metrics |
60–94 | Read checkpoint loss |
get_device_and_dtype |
101–107 | CUDA / MPS / CPU |
load_base_model / load_lora_model |
110–134 | Load separately to avoid GPU memory spikes |
generate_reply |
152–176 | chat_template + generate |
extract_final_reply |
137–149 | Strip thinking blocks |
verify_questions |
188–218 | Main flow |
Path constants:
1 | |
3. Command Line Modes
1 | |
User-tested on Mac mini with custom multiple questions + LoRA inference (terminal shows Test 1/1 and MPS), confirming --lora-only or single-question mode works on Apple Silicon.
4. VRAM Strategy: Sequential Loading, Explicit Release
1 | |
free_model (lines 179–185): del model + gc.collect() + cuda.empty_cache() / mps.empty_cache().
Reason: A 4B base model + another 4B base model with LoRA loaded simultaneously would definitely blow up on a 16GB Mac. Even on a 32GB V100, sequential loading is more stable.
5. Key Differences Between generate_reply and Training
1 | |
| Item | Training (train_lora_single.py) |
Verification (this post) |
|---|---|---|
| add_generation_prompt | False | True |
| enable_thinking | — | False |
| Goal | Fit the entire text | Generate new tokens |
If enable_thinking=False is missed, the output may be a full English thinking process (covered in Post 09).
6. Mac MPS Real-World Measurements (Project Output)
Environment: Mac mini · LoRA_Demo · .venv · mps + torch.float16
Log snippet:
1 | |
Also displays a flash-attn warning (falling back to torch implementation), does not affect generation.
6.1 Example 1: Insomnia / Loneliness
| Content | |
|---|---|
| user | Can’t sleep at night |
| LoRA | Night loneliness is the hardest. I’m here with you, don’t worry about falling asleep. Let’s chat slowly, and you can rest when you get sleepy. |
Evaluation: Matches the high-frequency sentence patterns in the training set: empathy + companionship + no urging. ✅
6.2 Example 2: Health Anxiety (Phrasing not in training set)
| Content | |
|---|---|
| user | I’m afraid to look at my medical checkup report; if I see it, I won’t be able to sleep. |
| LoRA | I understand the fear of being unable to move or becoming a burden to others. But if we exercise well and take care of ourselves now… I’m here with you. |
Evaluation: The emotion category (health anxiety + fear of being a burden) is correct, but it does not precisely echo “medical checkup report” — template-style generalization. Acceptable for production, but documentation should avoid exaggerating “customized every sentence.” ⚠️
6.3 Example 3: Self-Blame / Fear of Burdening Others
| Content | |
|---|---|
| user | Sometimes I feel I shouldn’t make them take leave for me anymore. |
| LoRA | You’re too frugal, too considerate of your children… You should spend, you should eat well… That itself is the best reward for them. I’m here with you. |
Evaluation: Aligns with the “fear of causing trouble” category. ✅
Script closing message (line 218):
1 | |
7. Before vs After Fine-Tuning (Expected Differences)
Full comparison requires not using --lora-only and sufficient memory on Mac to load the base model once.
| Dimension | Base Qwen3.5-4B | + LoRA |
|---|---|---|
| Opening | Tends to “I understand your feelings, suggest…” | Tends to “I understand / I really get it…” |
| Structure | List of suggestions | Short empathetic sentences + companionship |
| Taboos | May include “you should” | Data biased towards “I’m here with you” |
Default question 1 (DEFAULT_QUESTIONS[0]):
1 | |
Running the full verify_lora.py with this question on a V100 or Mac, and taking a screenshot, can be used as an illustration for the post.
8. Validation Rubric (Executable)
| Check Point | Pass Criteria |
|---|---|
| Loss | Decrease from start to end (metrics mode) |
| Empathy | Address the emotion first, not direct solution |
| Companionship | Phrases like “I’m here with you”, “let’s talk slowly” |
| Not preachy | No pile of “you should”, “I suggest you” |
| Generalization | Do not reproduce entire user turns from training set |
| Safety | No diagnosis, no promise of efficacy |
9. Pitfalls
Pitfall 1: device_map="auto" with PeftModel on Mac
Script comment at line 102: will error. Must use .to(device) before attaching LoRA.
Pitfall 2: SYSTEM_PROMPT mismatch with JSONL
Validation passes, but when using a different system prompt in vLLM for deployment, users feel it has “gone back to being preachy.”
Pitfall 3: Treating --lora-only as full validation
Only proves that the adapter loaded successfully and the style resembles the training set. Cannot prove improvement over the base model.
Pitfall 4: Judging the model by a single generationsampling has randomness. Run the same question 2–3 times, or temporarily set do_sample=False for comparison.
10. Summary
- verify_lora.py = metrics + optional A/B generation.
- Sequential loading + free_model is the key to VRAM management.
- enable_thinking=False and the difference in training template must be understood.
- Mac MPS tested successfully, suitable for local smoke tests.
- Health-related user queries may trigger similar phrasing — honestly document this in product expectations.
Appendix: extract_final_reply
1 | |
Series Navigation
| Post | Link |
|---|---|
| Previous | 07 · Training Curve |
| Next | 09 · Qwen3.5 Pitfalls |
| Index | README |