(ai) yiyu@ubuntu:~/nfs/workspace/qwen3_08b_tunning$ python finetune_qwen.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
============================================================
Qwen3-8B Fine-tuning
============================================================
GPU info:
  CUDA available: True
  GPU name: Tesla V100-SXM2-32GB
  GPU memory: 31.7 GB
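(The GPU report above corresponds to a few standard torch.cuda calls. A minimal sketch of the check finetune_qwen.py presumably performs; the exact print wording is the script's own:

import torch

# Report the CUDA environment before loading the model.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU name:", torch.cuda.get_device_name(0))
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU memory: {total_gb:.1f} GB")
)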
1. Loading model and tokenizer...
==((====))==  Unsloth 2025.12.8: Fast Qwen3 patching. Transformers: 4.57.3.
   \\   /|    Tesla V100-SXM2-32GB. Num GPUs = 1. Max memory: 31.739 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 7.0. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|█████████████████████████████████████████| 5/5 [02:30<00:00, 30.01s/it]
✓ Model loaded successfully
✓ Tokenizer configured
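(Step 1 maps to Unsloth's FastLanguageModel.from_pretrained. A hedged sketch; the model id, max_seq_length, and load_in_4bit setting are assumptions, since the log does not show them. Loading five full checkpoint shards suggests no 4-bit quantization:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-8B",  # assumption: could also be a local path
    max_seq_length=2048,         # assumption
    dtype=None,                  # auto-detect; V100 (CUDA 7.0) lacks bfloat16, hence "Bfloat16 = FALSE"
    load_in_4bit=False,          # assumption, consistent with full-precision shards
)
)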
2. Applying LoRA configuration...
Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.12.8 patched 36 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
✓ LoRA applied (r=16, alpha=32)
✓ Trainable parameters: 43,646,976 (0.92% of total)
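(Step 2 matches FastLanguageModel.get_peft_model with the r=16 / alpha=32 the script reports. The dropout of 0.05 is what triggers Unsloth's warning above: only dropout = 0 gets the fully fused fast path, hence "0 QKV layers, 0 O layers and 0 MLP layers" patched. The target modules below are inferred rather than printed, but they reproduce the parameter count exactly:

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # non-zero dropout disables Unsloth's fast LoRA patching
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",  # assumption; consistent with the gradient-offload message later
)

Sanity check: with r=16 over those seven projections, each of Qwen3-8B's 36 layers adds 1,212,416 adapter parameters, and 36 × 1,212,416 = 43,646,976, exactly the count logged. The two percentages (0.92% here, 0.53% in the training banner below) share that numerator and differ only in which total-parameter count they divide by.)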
3. Loading and preparing data...
Map: 100%|█████████████████████████████████████████████████| 719/719 [00:00<00:00, 11545.18 examples/s]
✓ Training data: 719 samples
ⓘ Evaluation data not found; skipping evaluation
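(Step 3's Map pass suggests each raw example is rendered into a single "text" column, which the trainer later tokenizes. A sketch assuming a JSON Lines file with instruction/output pairs; the file name and field names are hypothetical:

from datasets import load_dataset

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # hypothetical file name

def to_text(example):
    # Hypothetical field names; render one conversation via the Qwen chat template.
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text)  # the "Map: 719/719" progress bar above
)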
4. Setting up training arguments...
✓ Training arguments configured
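(Most of step 4 can be read off the training banner and the learning-rate trace below: per-device batch size 2, gradient accumulation 4, 3 epochs, and a peak LR of 2e-4 decaying to essentially zero, which a cosine schedule with a short warmup fits well. A sketch using transformers.TrainingArguments; the warmup, optimizer, and seed are assumptions:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch = 2 x 4 x 1 GPU = 8
    num_train_epochs=3,             # ceil(719 / 8) = 90 steps/epoch -> 270 total
    learning_rate=2e-4,
    lr_scheduler_type="cosine",     # matches the logged decay from 2e-4 down to ~7e-9
    warmup_steps=10,                # assumption; roughly fits the early LR values
    fp16=True,                      # V100 has no bfloat16 ("Bfloat16 = FALSE")
    logging_steps=10,               # 27 log lines over 270 steps
    optim="adamw_8bit",             # assumption
    seed=42,                        # assumption
)
)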
5. Creating the trainer...
Unsloth: Tokenizing ["text"] (num_proc=20): 100%|████████████| 719/719 [00:04<00:00, 169.16 examples/s]
✓ Trainer created
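(Step 5 is consistent with trl's SFTTrainer tokenizing the "text" column across 20 worker processes. A sketch; note that recent trl versions move dataset_text_field and max_seq_length into SFTConfig, so treat the exact signature as version-dependent:

from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # matches 'Tokenizing ["text"]' above
    max_seq_length=2048,        # assumption; should match the load-time value
    dataset_num_proc=20,        # matches num_proc=20 in the progress bar
    args=training_args,
)
trainer.train()  # produces the training log below
)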
============================================================
Starting training...
============================================================
The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 719 | Num Epochs = 3 | Total steps = 270
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 43,646,976 of 8,234,382,336 (0.53% trained)
  0%|          | 0/270 [00:00<?, ?it/s]
Unsloth: Will smartly offload gradients to save VRAM!
{'loss': 2.2591, 'grad_norm': 0.9805265665054321, 'learning_rate': 0.0002, 'epoch': 0.11}
{'loss': 1.6408, 'grad_norm': 0.6932195425033569, 'learning_rate': 0.0001992764570419069, 'epoch': 0.22}
{'loss': 1.5088, 'grad_norm': 0.5240709781646729, 'learning_rate': 0.00019711629845587164, 'epoch': 0.33}
{'loss': 1.411, 'grad_norm': 0.6144455671310425, 'learning_rate': 0.0001935507835925601, 'epoch': 0.44}
{'loss': 1.4867, 'grad_norm': 0.6005836129188538, 'learning_rate': 0.00018863150851539877, 'epoch': 0.56}
{'loss': 1.3918, 'grad_norm': 0.6643008589744568, 'learning_rate': 0.00018242965936120768, 'epoch': 0.67}
{'loss': 1.4473, 'grad_norm': 0.8665452003479004, 'learning_rate': 0.00017503498221564025, 'epoch': 0.78}
{'loss': 1.3993, 'grad_norm': 0.5923981666564941, 'learning_rate': 0.00016655448441021747, 'epoch': 0.89}
{'loss': 1.3726, 'grad_norm': 0.6304987072944641, 'learning_rate': 0.00015711088603430405, 'epoch': 1.0}
{'loss': 1.2395, 'grad_norm': 0.7031642198562622, 'learning_rate': 0.00014684084406997903, 'epoch': 1.11}
{'loss': 1.1672, 'grad_norm': 0.8705151081085205, 'learning_rate': 0.0001358929748480946, 'epoch': 1.22}
{'loss': 1.0753, 'grad_norm': 1.0293835401535034, 'learning_rate': 0.00012442570344228313, 'epoch': 1.33}
{'loss': 1.1042, 'grad_norm': 1.4802284240722656, 'learning_rate': 0.00011260497112202895, 'epoch': 1.44}
{'loss': 1.0878, 'grad_norm': 1.1789474487304688, 'learning_rate': 0.00010060183403992856, 'epoch': 1.56}
{'loss': 1.1023, 'grad_norm': 1.3673157691955566, 'learning_rate': 8.858998790219753e-05, 'epoch': 1.67}
{'loss': 1.055, 'grad_norm': 1.0307848453521729, 'learning_rate': 7.674325444256899e-05, 'epoch': 1.78}
{'loss': 1.0381, 'grad_norm': 0.9197383522987366, 'learning_rate': 6.523306607246527e-05, 'epoch': 1.89}
{'loss': 1.037, 'grad_norm': 0.9783514142036438, 'learning_rate': 5.422598510671666e-05, 'epoch': 2.0}
{'loss': 0.7927, 'grad_norm': 1.3467820882797241, 'learning_rate': 4.388129346376178e-05, 'epoch': 2.11}
{'loss': 0.7877, 'grad_norm': 1.3993027210235596, 'learning_rate': 3.4348687719438665e-05, 'epoch': 2.22}
{'loss': 0.769, 'grad_norm': 1.4056848287582397, 'learning_rate': 2.576611286891901e-05, 'epoch': 2.33}
{'loss': 0.7288, 'grad_norm': 1.5049540996551514, 'learning_rate': 1.825776614411082e-05, 'epoch': 2.44}
{'loss': 0.6902, 'grad_norm': 1.1971173286437988, 'learning_rate': 1.1932299773007228e-05, 'epoch': 2.56}
{'loss': 0.7905, 'grad_norm': 1.5358942747116089, 'learning_rate': 6.881248688597553e-06, 'epoch': 2.67}
{'loss': 0.7258, 'grad_norm': 1.0989397764205933, 'learning_rate': 3.1777059397436692e-06, 'epoch': 2.78}
{'loss': 0.7475, 'grad_norm': 0.9426413178443909, 'learning_rate': 8.752649719641848e-07, 'epoch': 2.89}
{'loss': 0.7717, 'grad_norm': 1.2142314910888672, 'learning_rate': 7.244084232338466e-09, 'epoch': 3.0}
{'train_runtime': 1060.9473, 'train_samples_per_second': 2.033, 'train_steps_per_second': 0.254, 'train_loss': 1.1343570991798684, 'epoch': 3.0}
100%|████████████████████████████████████████████████████████████████████████████| 270/270 [17:40<00:00, 3.93s/it]
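(The numbers above are internally consistent: with an effective batch of 8, one epoch is ⌈719 / 8⌉ = 90 optimizer steps, so 3 epochs give the 270 total steps; 270 steps / 1060.95 s ≈ 0.254 steps/s, and 0.254 × 8 ≈ 2.03 samples/s, matching the reported throughput.)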
============================================================
Training complete!
============================================================
Training statistics:
  Total training steps: 270
  Training time: 1060.95 s
  Samples per second: 2.03
6. Saving model...
✓ Model saved to: /home/yiyu/nfs/workspace/qwen3_08b_tunning/qwen3-8b-finetuned
✓ Training arguments saved
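(Step 6 most likely saves only the LoRA adapter and tokenizer via the standard PEFT path, which is why it is fast and small. A sketch; the merged-weights line is an optional Unsloth extra, shown commented out:

output_dir = "/home/yiyu/nfs/workspace/qwen3_08b_tunning/qwen3-8b-finetuned"

model.save_pretrained(output_dir)      # writes the adapter weights + config, not the full 8B model
tokenizer.save_pretrained(output_dir)

# Optional: merge the adapter into the base weights for standalone deployment.
# model.save_pretrained_merged(output_dir + "-merged", tokenizer, save_method="merged_16bit")
)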
============================================================
Fine-tuning pipeline complete!
============================================================