
Conversation

@mmathew23 (Collaborator)

The latest transformers removes the device argument from _prepare_4d_causal_attention_mask_with_cache_position, and since we pass the device argument via _fast_prepare_inputs_for_generation, inference fails for models that define this function. Qwen2 is one such example.

I have a backward-compatible fix that inspects the function signature for device and calls the function appropriately.
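A minimal sketch of the idea, using a hypothetical wrapper name (`_call_prepare_4d_mask`); the real change lives in `_fast_prepare_inputs_for_generation`, but the version check works the same way:

```python
import inspect

def _call_prepare_4d_mask(fn, *args, device=None, **kwargs):
    # Hypothetical helper: inspect the signature of the model's
    # _prepare_4d_causal_attention_mask_with_cache_position and only pass
    # `device` when the installed transformers version still accepts it.
    if "device" in inspect.signature(fn).parameters:
        return fn(*args, device=device, **kwargs)
    # Newer transformers dropped the `device` argument, so omit it.
    return fn(*args, **kwargs)
```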

Test before the fix:
https://colab.research.google.com/drive/1EMVkur2wSTgvvbZawGuvtoxLPPPlj_QQ?usp=sharing

After the fix:
https://colab.research.google.com/drive/1m15_peSfF7TxcMUC7Y2nsp_kGk1QyDIH?usp=sharing

You'll see that inference fails before the fix and works as expected after it.

@danielhanchen danielhanchen merged commit 6ebef50 into unslothai:main Jun 4, 2025