
Conversation

@mmathew23
Collaborator

transformers==4.52.x introduced a GradientCheckpointingLayer and refactored the checkpointing logic. This PR adjusts FastLanguageModel to bypass our custom checkpoint logic when the decoder layers are instances of GradientCheckpointingLayer.
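For context, a minimal sketch of the kind of guard this implies (the helper name `apply_custom_checkpointing` and the `model.model.layers` access path are assumptions for illustration, not the actual Unsloth code; the import path is the one used in transformers>=4.52 and may change between versions):

```python
try:
    from transformers.modeling_layers import GradientCheckpointingLayer
except ImportError:
    # Older transformers (<4.52) do not define GradientCheckpointingLayer.
    GradientCheckpointingLayer = None


def maybe_patch_checkpointing(model, apply_custom_checkpointing):
    # Typical layout for Llama-style models; the real code may resolve this differently.
    decoder_layers = model.model.layers

    if GradientCheckpointingLayer is not None and isinstance(
        decoder_layers[0], GradientCheckpointingLayer
    ):
        # transformers>=4.52 already routes activation checkpointing through
        # GradientCheckpointingLayer, so skip the custom patch and let the
        # library handle it.
        return model

    # Otherwise fall back to the custom checkpointing logic.
    return apply_custom_checkpointing(model)
```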

I have 3 notebooks to compare.

transformers==4.51.3
https://colab.research.google.com/drive/19tEe55Z-b3oz61S6R5diBoHAxHZVnOlp?usp=sharing
transformers==4.52.4
https://colab.research.google.com/drive/1IQZGdYYoF73NqG3WtO_ar7rfunllWGSr?usp=sharing
transformers==4.52.4 + checkpointing fix
https://colab.research.google.com/drive/1nfA9Sc20lBAbTzmo2S2VgAW-P6ZkOkMq?usp=sharing

As you can see, in the second notebook the loss remains the same but training takes longer. In the final notebook, both loss and speed match the original transformers.

@danielhanchen danielhanchen merged commit 1a1b51c into unslothai:main Jun 3, 2025
