patch sft_trainer to favor max_seq_length over max_length in config #2669
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
newest trl has changed from max_seq_length to max_length. In order to maintain old behavior unsloth will favor max_seq_length if it's in the args or if it's set inside the model for sft trainer only.
tested with qwen3:
trl==0.15.2 takes 26 mins
https://colab.research.google.com/drive/1A9DJ6SYsgYWtDciL2wvw-A-1ja238gs1?usp=sharing
upgrade to trl==0.18.1 changes the run to take 13 mins.
https://colab.research.google.com/drive/1IYGn6psJyfhZz-Zabg0IrrhUOBnRMtpR?usp=sharing
Issue is that max_length is defaulted to 1024 and unsloth prepare sft dataset takes this as preference. I added a check to the trl patches for sft_trainer file, and add explicit handling of max_length and max_seq_length. Now the notebook takes the expected ~20ish minutes.
https://colab.research.google.com/drive/1A6uj-VZsRPvPBLNy0ySPyC1kyV07Xeud?usp=sharing
You can also inspect the compiled files to see that it's only sfttrainer that gets impacted.