0. Series Overview
| This Article | Upstream | Output | Downstream |
|---|---|---|---|
| Article 6/10 | Article 05 – Dataset Ready | final_lora/, 15 checkpoints | Article 07 – Reading Logs · Article 08 – Verification |
This is the only code snippet that modifies LoRA weights. After it runs, the final line of output should match the last line of all_logs.log, apart from slight differences in how the path is worded.
1. The Actual Problem Solved
Steps 5–7 handle:
- Merging HuggingFace `TrainingArguments` with TRL-specific fields (`dataset_text_field`, `max_length`) into SFTConfig
- Creating SFTTrainer, which is the point where LoRA is injected
- Running train() for 750 steps, then calling save_pretrained to save only the adapter
Many people get stuck on TRL 1.x API changes: processing_class replaces tokenizer, max_length replaces max_seq_length.
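One way to catch these renames early is to scan keyword arguments before they reach TRL. The sketch below is a hypothetical helper: the names `LEGACY_TO_CURRENT`, `find_legacy_kwargs` and `rename_legacy_kwargs` are illustrative and not part of TRL, and the table only encodes the two renames mentioned above.

```python
# Renames mentioned in this article (old TRL 0.x name -> TRL 1.x name).
LEGACY_TO_CURRENT = {
    "tokenizer": "processing_class",  # SFTTrainer argument
    "max_seq_length": "max_length",   # SFTConfig field
}


def find_legacy_kwargs(kwargs: dict) -> dict:
    """Return {old_name: new_name} for every legacy name present in kwargs."""
    return {old: new for old, new in LEGACY_TO_CURRENT.items() if old in kwargs}


def rename_legacy_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with legacy names replaced by current ones."""
    renamed = dict(kwargs)
    for old, new in find_legacy_kwargs(kwargs).items():
        if new in renamed:
            raise ValueError(f"both {old!r} and {new!r} were given")
        renamed[new] = renamed.pop(old)
    return renamed
```

Running `find_legacy_kwargs` on arguments copied from an older tutorial shows which names need updating before the config is built.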
2. Implementation Locations
| Code Block | Lines |
|---|---|
| `SFTConfig(...)` | 204–221 |
| `SFTTrainer(...)` | 226–233 |
| `trainer.train()` | 240 |
| `save_pretrained` | 244–246 |
| `TrainingProgressCallback` | 82–99 |
Output directory structure: `final_lora/` plus 15 checkpoint directories, `checkpoint-50` through `checkpoint-750`.
3. SFTConfig Parameter Breakdown
| Parameter | Role | Explanation |
|---|---|---|
| `per_device_train_batch_size=2` | Micro-batch | 2 samples per forward pass |
| `gradient_accumulation_steps=2` | Accumulation | 2×2=4 samples before each optimizer.step |
| `learning_rate=2e-4` | Common for LoRA | Linear decay to ~0 |
| `bf16=True` | Mixed precision | Matches the loading dtype |
| `logging_steps=1` | Log every step | Works with the callback print |
| `save_steps=50` | Checkpoints | 750/50=15 checkpoints |
| `optim="paged_adamw_8bit"` | 8-bit Adam | Saves memory for optimizer states |
| `dataset_text_field="text"` | Field name | Corresponds to load_jsonl_data |
| `max_length=512` | Truncation | Truncates long samples at the end |
| `report_to="none"` | No uploading | No W&B |
`eval_strategy` is not set because there is no validation set, so the logs will not contain `eval_loss` (see Article 07).
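The parameters in the table can be collected as keyword arguments, as in the minimal sketch below. Several values are assumptions rather than facts from the script: `output_dir` is a hypothetical path, `num_train_epochs=3` is inferred from step 250 being logged as epoch 1.00, and `lr_scheduler_type="linear"` is inferred from the "linear decay" note. The arithmetic assumes a single GPU.

```python
# Keyword arguments taken from the parameter table (plus labeled assumptions).
sft_kwargs = dict(
    output_dir="./output/run",      # hypothetical path, not from the article
    num_train_epochs=3,             # assumption: step 250 is logged as epoch 1.00
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    lr_scheduler_type="linear",     # assumption: matches "linear decay to ~0"
    bf16=True,
    logging_steps=1,
    save_steps=50,
    optim="paged_adamw_8bit",
    dataset_text_field="text",
    max_length=512,
    report_to="none",
)

# Samples consumed per optimizer step on one device.
effective_batch = (sft_kwargs["per_device_train_batch_size"]
                   * sft_kwargs["gradient_accumulation_steps"])
total_steps = 750  # from the training log
num_checkpoints = total_steps // sft_kwargs["save_steps"]
print(effective_batch, num_checkpoints)  # 4 15

# With TRL installed, the dict would be handed over as:
#   from trl import SFTConfig
#   config = SFTConfig(**sft_kwargs)
```

Keeping the arguments in a plain dict makes it easy to print them or to check them for outdated names before the config object is created.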
4. SFTTrainer Construction and train()
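A minimal sketch of how construction, training and saving might fit together is shown here. The function name `build_and_train` and all of its parameter names are hypothetical; `sft_config` is assumed to be an SFTConfig, `lora_config` a `peft.LoraConfig`, and `final_dir` stands for the final_lora path. This is not the script's actual code.

```python
def build_and_train(model, tokenizer, train_dataset, sft_config, lora_config,
                    final_dir, callbacks=None):
    """Create the trainer (LoRA is injected here), train, and save the adapter."""
    # Imported lazily so this function can be defined without TRL installed.
    from trl import SFTTrainer

    trainer = SFTTrainer(
        model=model,
        args=sft_config,              # an SFTConfig instance
        train_dataset=train_dataset,
        processing_class=tokenizer,   # TRL 1.x name; replaces `tokenizer=`
        peft_config=lora_config,      # passing this triggers the LoRA injection
        callbacks=callbacks or [],
    )
    trainer.train()
    # The trainer's model is now a PeftModel, so only the adapter is written.
    trainer.model.save_pretrained(final_dir)
    return trainer
```

Passing `peft_config` to the trainer, rather than wrapping the model beforehand, keeps the injection in a single place.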
4.1 Injection Timing
SFTTrainer internally calls PEFT, attaching LoRA layers to the `target_modules`, and a confirmation message is printed at this point.
4.2 What Happens Inside the Training Loop (Conceptual)
The base weights (W) receive no gradients; only the LoRA matrices (A, B) are updated.
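The idea can be illustrated with a toy scalar model, where the effective weight is W + B·A, W stays frozen, and plain gradient descent updates only A and B after every two micro-batches. This is a conceptual sketch under those assumptions, not what SFTTrainer executes: the data, learning rate and the function name `train_toy_lora` are made up for illustration.

```python
def train_toy_lora(steps=200, lr=0.02, accum=2):
    """Fit y = 3x with a frozen scalar base weight and a rank-1 style adapter."""
    w = 1.0          # frozen base weight W: no gradient is ever computed for it
    a, b = 0.5, 0.0  # adapter factors; b = 0 makes the adapter a no-op at start
    data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0), (0.5, 1.5)]
    losses, cursor = [], 0
    for _ in range(steps):
        grad_a = grad_b = loss = 0.0
        for _ in range(accum):                 # gradient accumulation
            x, y = data[cursor % len(data)]
            cursor += 1
            err = (w + b * a) * x - y          # effective weight is W + B*A
            loss += err ** 2 / accum
            grad_a += 2 * err * b * x / accum  # d(err^2)/dA
            grad_b += 2 * err * a * x / accum  # d(err^2)/dB
        a -= lr * grad_a                       # one update per `accum` micro-batches
        b -= lr * grad_b                       # W is never touched
        losses.append(loss)
    return w, a, b, losses
```

After training, `w` is still 1.0 while `w + a * b` approaches 3, which mirrors how the adapter carries the entire change on top of an unchanged base.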
4.3 Dual Logging
In the same step you will see:
- tqdm progress bar (`3.31s/it`)
- Callback line: `[Progress 33.3%] Step 250/750 | Epoch 1.00 | loss=0.2402`
- Transformers JSON log: `{'loss': '0.24', 'mean_token_accuracy': '0.957', ...}`

When reviewing, rely on the callback line plus the summary line.
5. Step 7: Saving
save_pretrained writes only the PeftModel’s adapter, not the merged full weights. For inference, load the base model first and then attach the adapter from final_lora.
For vLLM, use --lora-modules elderly=./output/.../final_lora (Article 10).
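For loading with Transformers and PEFT instead of vLLM, a minimal sketch could look like the following. The function name `load_with_adapter` and both parameters are hypothetical; the base model identifier is not stated in this article and must be the same model that was used for training.

```python
def load_with_adapter(base_model_id, adapter_dir):
    """Load the base model and attach the saved LoRA adapter (weights not merged)."""
    # Imported lazily so this function can be defined without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Reads the adapter config and weights from adapter_dir (e.g. final_lora).
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return model, tokenizer
```

Dtype and device placement are left at library defaults here; in practice they would be chosen to match how the base model was loaded for training.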
6. Checkpoints and Resumption
checkpoint-750/ contains:
- trainer_state.json, which records `global_step: 750` and `log_history` (the loss for each step)
- optimizer and scheduler states (optimizer.pt, scheduler.pt), stored alongside it and used by `--resume_from_checkpoint`

For everyday verification use final_lora; intermediate checkpoints are only needed to resume interrupted training.
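The per-step losses in `log_history` can be pulled out of trainer_state.json with the standard library alone. The sketch below assumes the usual layout in which each training-log entry carries `step` and `loss` keys; the function name `read_loss_history` is illustrative.

```python
import json
from pathlib import Path


def read_loss_history(checkpoint_dir):
    """Return (global_step, [(step, loss), ...]) from trainer_state.json."""
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    state = json.loads(state_path.read_text(encoding="utf-8"))
    # Only training-log entries carry a "loss" key; a final summary entry
    # (train_runtime, train_loss, ...) does not and is skipped.
    losses = [(entry["step"], entry["loss"])
              for entry in state.get("log_history", [])
              if "loss" in entry and "step" in entry]
    return state.get("global_step"), losses
```

The returned list can be passed directly to a plotting library to draw the loss curve without rerunning training.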
7. OOM Contingency (in Priority Order)
After changing parameters, the total step count (750) will also change if the batch size or the number of epochs was altered, so do not compare the new logs directly with old ones.
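To see how the step count moves, a rough estimate can be computed before launching a run. In the sketch below the dataset size of 1000 samples is an assumption chosen because it reproduces 750 steps with the settings in this article; the real size is not stated here, and the exact rounding inside the Trainer can differ between library versions.

```python
import math


def estimate_total_steps(num_samples, per_device_batch, grad_accum,
                         epochs, num_devices=1):
    """Approximate the number of optimizer steps for a run."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
    steps_per_epoch = max(math.ceil(batches_per_epoch / grad_accum), 1)
    return steps_per_epoch * epochs


# 1000 samples is an assumed dataset size, not a figure from the article.
print(estimate_total_steps(1000, 2, 2, 3))  # 750
print(estimate_total_steps(1000, 1, 4, 3))  # 750 (same effective batch of 4)
print(estimate_total_steps(1000, 1, 2, 3))  # 1500 (effective batch halved)
```

Halving the micro-batch while doubling accumulation keeps the step count unchanged, whereas shrinking the effective batch does not.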
8. Pitfalls
Pitfall 1: max_seq_length from TRL 0.x tutorials used in SFTConfig
TRL 1.x uses max_length. If the old name is passed and not applied, the default takes over without an obvious error, so you might think you are training with 512 tokens while actually using 1024.
Pitfall 2: Calling get_peft_model twice
See Article 05. It leads to an abnormal trainable% value.
Pitfall 3: save_steps set too low
Saving every 10 steps fills up disk space and slows down I/O. 50 steps is reasonable for this project.
Pitfall 4: After training, only copying checkpoint-750, forgetting final_lora
After trainer.train() finishes, the directory written by save_pretrained(final_lora) holds the clean adapter; checkpoints also carry optimizer states and are larger.
9. Summary
- SFTConfig manages hyperparameters, `max_length`, and `dataset_text_field`.
- SFTTrainer + peft_config is the only point where LoRA is injected.
- 750 steps / 41 min / train_loss 0.2587 measured on a V100.
- final_lora is the path for deployment and verification; checkpoints are for resuming and loss curves.
- No validation set – effectiveness is verified in Article 08 (inference).
Appendix: TrainingProgressCallback.on_log
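A minimal sketch of a callback that would produce the progress line shown in section 4.3 follows. It is reconstructed from that log line only, so the exact formatting logic of the original is an assumption; the helper `format_progress` is a hypothetical name, and the import fallback exists only so the snippet can be loaded without transformers installed.

```python
try:
    from transformers import TrainerCallback
except ImportError:  # keeps the sketch importable without transformers
    TrainerCallback = object


def format_progress(step, max_steps, epoch, loss):
    """Build a line like '[Progress 33.3%] Step 250/750 | Epoch 1.00 | loss=0.2402'."""
    pct = 100.0 * step / max_steps if max_steps else 0.0
    return (f"[Progress {pct:.1f}%] Step {step}/{max_steps} "
            f"| Epoch {epoch:.2f} | loss={loss:.4f}")


class TrainingProgressCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        # Only training logs carry "loss"; the end-of-run summary does not.
        if logs and "loss" in logs:
            print(format_progress(state.global_step, state.max_steps,
                                  state.epoch or 0.0, float(logs["loss"])))
```

Separating the formatting from `on_log` makes the line easy to check in isolation, for example against the step 250 entry quoted earlier.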
Series Navigation
| Article | Link |
|---|---|
| Previous | 05 · SFT in Practice (Part 1) |
| Next | 07 · Training Curves |
| Index | README |