Conversation

andrewor14 (Contributor) commented on Aug 29, 2025:

Summary: Following #2976, which adds support for QAT + LoRA, this PR adds support for QAT during full fine-tuning. See the [torchao QAT README](https://github.com/pytorch/ao/blob/main/torchao/quantization/qat/README.md) for more details.

The QAT schemes currently supported are listed below; a rough sketch of the fake-quantization idea behind them follows the list.

  • fp8-int4, targeting the torch.ops.fbgemm.f8i4bf16_shuffled kernel
  • fp8-fp8, targeting the torch.ops.fbgemm.f8f8bf16_rowwise kernel
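
To make that concrete, here is a minimal, self-contained sketch of the straight-through fake-quantization idea QAT relies on: the forward pass sees weights rounded onto a low-precision grid, while gradients flow through the full-precision weights so fine-tuning can compensate for the rounding error. This is an illustration only, not the torchao or unsloth implementation; it fakes a symmetric int4 weight grid rather than the fp8-int4/fp8-fp8 schemes above, and FakeQuantLinear and its scaling are hypothetical.

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Linear):
    """Linear layer whose forward pass sees int4-rounded weights (illustration only)."""

    def forward(self, x):
        w = self.weight
        # Symmetric per-output-channel scale mapping weights onto an int4 grid.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
        w_q = torch.clamp(torch.round(w / scale), -8, 7) * scale
        # Straight-through estimator: the forward pass uses the rounded weights,
        # but the backward pass treats rounding as identity so gradients reach w.
        w_ste = w + (w_q - w).detach()
        return nn.functional.linear(x, w_ste, self.bias)

layer = FakeQuantLinear(16, 16, bias=False)
out = layer(torch.randn(2, 16))
out.sum().backward()
print(layer.weight.grad.shape)  # gradients flow despite the rounding in forward
```

Because the optimizer keeps updating the high-precision weights, the model settles into regions that survive the real post-training quantization step, which is what the perplexity comparison below is measuring.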

Test Plan: https://gist.github.com/andrewor14/048b5c1bd01b7fa23c53913856a8ef9f

Full fine-tuning Llama3.1-8B with and without QAT on yahma/alpaca-cleaned for 1 epoch:

  • Batch size = 16 (no grad accum)
  • Learning rate = 4e-5
  • Quantization scheme = fp8-int4
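
For reference, a run like the one above might be configured roughly as follows. This is a sketch only: the qat_scheme argument and its placement in from_pretrained are assumptions based on this PR and #2976 rather than documented API, the prompt formatting is a guess at the usual Alpaca template, and exact unsloth/trl signatures are version-dependent.

```python
# Hypothetical sketch of the full fine-tuning + QAT run described above.
# qat_scheme (and its placement) is an assumption, not confirmed API.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B",
    full_finetuning=True,    # full fine-tuning rather than LoRA
    qat_scheme="fp8-int4",   # assumption: scheme name as listed above
)

def to_text(example):
    # Assumed Alpaca-style prompt formatting for yahma/alpaca-cleaned.
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Input:\n{example['input']}\n\n"
                    f"### Response:\n{example['output']}"}

dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=16,  # batch size 16, no grad accum
        gradient_accumulation_steps=1,
        learning_rate=4e-5,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()
```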

Wikitext perplexity:

  • QAT improved perplexity by 19.2% compared to regular fine-tuning
  • QAT's int4 quantized model even outperformed the bf16 baseline
  • Regular int4 quantized model (without QAT) was significantly worse than the bf16 baseline

```
==> unsloth_model_full_baseline_output/eval_float.log <==
|        |       |none  |     0|word_perplexity|↓  |9.8446|±  |   N/A|

==> unsloth_model_full_baseline_output/eval_quantized.log <==
|        |       |none  |     0|word_perplexity|↓  |11.4595|±  |   N/A|

==> unsloth_model_full_qat_fp8-int4_output/eval_quantized.log <==
|        |       |none  |     0|word_perplexity|↓  |9.2336|±  |   N/A|
```
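
The word_perplexity rows above have the shape of lm-evaluation-harness wikitext output, so a comparable number could be reproduced roughly as below. Treating these logs as lm-eval output is an assumption, the checkpoint path is a placeholder taken from the log names, and the int4 quantization step behind eval_quantized.log is not shown here.

```python
# Sketch: measure wikitext word perplexity with lm-evaluation-harness.
# The checkpoint path is a placeholder; quantization before eval is omitted.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./unsloth_model_full_qat_fp8-int4_output,dtype=bfloat16",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"]["word_perplexity,none"])
```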

Fibonacci test:

  • Both bf16 baseline and int4 quantized models correctly identified 13 as the next number
  • QAT quantized model was more succinct in its response
  • No substantial differences here

```
### Instruction:
Continue the fibonnaci sequence.

### Input:
1, 1, 2, 3, 5, 8

==> unsloth_model_full_baseline_output/eval_float.log <==
### Response:
The next number in the Fibonacci sequence is 13.<|end_of_text|>

==> unsloth_model_full_baseline_output/eval_quantized.log <==
### Response:
The next number in the Fibonacci sequence is 13.<|end_of_text|>

==> unsloth_model_full_qat_fp8-int4_output/eval_quantized.log <==
### Response:
13<|end_of_text|>
```

jerryzh168 (Contributor) commented:

@andrewor14 it's reversed I think, fp8-fp8 is targeting torch.ops.fbgemm.f8f8bf16_rowwise

andrewor14 (Contributor, Author) replied:

> @andrewor14 it's reversed I think, fp8-fp8 is targeting torch.ops.fbgemm.f8f8bf16_rowwise

thanks, fixed

danielhanchen (Contributor) left a review:

Nice work! Some small changes

andrewor14 (Contributor, Author) replied:

Thanks, just fixed

danielhanchen merged commit 3541f6e into unslothai:main on Sep 8, 2025.