
Unsloth Fine-tuning

Production fine-tuning workflows using the Unsloth framework for efficient LLM training

Collection Statistics

Total Notebooks: 37

Setup

Initial Unsloth configuration and environment setup

# Notebook Description
1 Unsloth Environment Verification

Fast Inference

Quick model inference examples with Llama and Qwen models

# Notebook Description
1 Fast Inference Test: Llama-3.2-1B
2 Fast Inference Test: Qwen3-4B
3 Fast Inference Test: Qwen3-4B-Thinking-2507

Vision Training

Vision model fine-tuning with Ministral and Pixtral

# Notebook Description
1 Unsloth Vision Training Verification
2 Unsloth Vision Training Verification (Pixtral)

SFT Training

Supervised Fine-Tuning for text and vision models

# Notebook Description
1 SFT Training Test: Ministral (Text-Only)
2 SFT Training Test: Ministral (Vision)
3 SFT Training Test: Pixtral (Vision)
4 SFT Training Test: Qwen3-4B
5 SFT Training Test: Qwen3-4B-Thinking-2507

GRPO Training

Group Relative Policy Optimization (GRPO) training

# Notebook Description
1 GRPO Training Test: Ministral (Text-Only)
2 GRPO Training Test: Ministral (Vision)
3 GRPO Training Test: Pixtral (Vision)
4 GRPO Training Test: Qwen3-4B
5 GRPO Training Test: Qwen3-4B-Thinking-2507
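
For orientation, the core idea behind GRPO can be sketched in a few lines: several completions are sampled for the same prompt, and each completion's reward is normalized against the statistics of its own group instead of a learned value function. The snippet below is a minimal illustration only, not code from these notebooks. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed value) guards against division by
    zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so the policy is pushed toward the better samples within each group.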

DPO Training

Direct Preference Optimization for alignment

# Notebook Description
1 DPO Training Test: Qwen3-4B
2 DPO Training Test: Qwen3-4B-Thinking-2507

Reward Training

Reward model training for RLHF

# Notebook Description
1 Reward Model Training Test: Qwen3-4B
2 Reward Model Training Test: Qwen3-4B-Thinking-2507

RLOO Training

REINFORCE Leave-One-Out (RLOO) policy optimization

# Notebook Description
1 RLOO Training Test: Ministral (Text-Only)
2 RLOO Training Test: Ministral (Vision)
3 RLOO Training Test: Pixtral (Vision)
4 RLOO Training Test: Qwen3-4B
5 RLOO Training Test: Qwen3-4B-Thinking-2507
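
As a rough illustration of what "leave-one-out" means here: each sampled completion is compared against a baseline built from the average reward of the other completions for the same prompt. The sketch below is not taken from the notebooks. The function name `rloo_advantages` and the example rewards are hypothetical.

```python
def rloo_advantages(rewards):
    """Compute leave-one-out advantages for one prompt's completions.

    For each completion, the baseline is the mean reward of all the
    other completions in the group, so at least two samples are needed.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two completions per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
print(rloo_advantages(rewards))
```

Because a sample's own reward never enters its baseline, the baseline stays independent of that sample, which is the property the method relies on.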

QLoRA Experiments

Quantized LoRA (QLoRA) experiments covering alpha scaling, LoRA rank, quantization method, and target-module comparisons, plus continual learning and multi-adapter training

# Notebook Description
1 QLoRA Parameter Test: Alpha Scaling - Ministral-3B-Reasoning
2 QLoRA Parameter Test: Alpha Scaling - Qwen3-4B-Thinking
3 QLoRA Advanced: Continual Learning - Ministral-3B-Reasoning
4 QLoRA Advanced: Continual Learning - Qwen3-4B-Thinking
5 QLoRA Parameter Test: LoRA Rank Comparison - Ministral-3B-Reasoning
6 QLoRA Parameter Test: LoRA Rank Comparison - Qwen3-4B-Thinking
7 QLoRA Advanced: Multi-Adapter Training - Ministral-3B-Reasoning
8 QLoRA Advanced: Multi-Adapter Training - Qwen3-4B-Thinking
9 QLoRA Test: Quantization Method Comparison - Ministral-3B-Reasoning
10 QLoRA Test: Quantization Method Comparison - Qwen3-4B-Thinking
11 QLoRA Test: Target Modules Comparison - Ministral-3B-Reasoning
12 QLoRA Test: Target Modules Comparison - Qwen3-4B-Thinking
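
To make the alpha-scaling and rank-comparison experiments easier to reason about, the sketch below shows the two quantities they vary. In standard LoRA the adapter update is multiplied by `alpha / r`, and the rank-stabilized variant uses `alpha / sqrt(r)` instead. The helper names `lora_scaling` and `lora_param_count` are hypothetical, and the layer dimensions in the example are assumed values, not settings from the notebooks.

```python
import math


def lora_scaling(alpha, r, use_rslora=False):
    """Return the factor applied to the low-rank update B @ A.

    Standard LoRA scales by alpha / r. The rank-stabilized variant
    scales by alpha / sqrt(r).
    """
    if r <= 0:
        raise ValueError("rank must be positive")
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added to one linear layer of shape d_out x d_in.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


# Assumed example: a 4096 x 4096 projection at several ranks, alpha fixed at 16.
for r in (8, 16, 32, 64):
    print(r, lora_scaling(16, r), lora_param_count(4096, 4096, r))
```

With alpha held fixed, doubling the rank halves the standard scaling factor while doubling the adapter's parameter count, which is why alpha and rank are usually tuned together.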