Update prepare 4d causal attention call #2678
Merged
The latest transformers removes the `device` argument from `_prepare_4d_causal_attention_mask_with_cache_position`. Since we pass a `device` argument via `_fast_prepare_inputs_for_generation`, inference fails for any model that defines this function; Qwen2 is one such example.
This PR adds a backward-compatible fix that inspects the function signature for `device` and calls the function accordingly.
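The idea, roughly: check the installed function's signature at runtime and only forward `device` when it is accepted. A minimal sketch of that dispatch (the wrapper name is illustrative, and the keyword list follows the older transformers signature; the actual change lives inside `_fast_prepare_inputs_for_generation`):

```python
import inspect

def call_prepare_4d_mask(fn, attention_mask, sequence_length, target_length,
                         dtype, device, cache_position, batch_size):
    """Call fn with or without `device`, depending on the installed
    transformers version's signature. `fn` is the model's
    _prepare_4d_causal_attention_mask_with_cache_position."""
    kwargs = dict(
        sequence_length=sequence_length,
        target_length=target_length,
        dtype=dtype,
        cache_position=cache_position,
        batch_size=batch_size,
    )
    # Older transformers versions require `device`; newer ones removed it.
    if "device" in inspect.signature(fn).parameters:
        kwargs["device"] = device
    return fn(attention_mask, **kwargs)
```

Since the signature check happens at call time, the same code path works on both old and new transformers releases without pinning a version.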
Test before the fix:
https://colab.research.google.com/drive/1EMVkur2wSTgvvbZawGuvtoxLPPPlj_QQ?usp=sharing
After the fix:
https://colab.research.google.com/drive/1m15_peSfF7TxcMUC7Y2nsp_kGk1QyDIH?usp=sharing
The first notebook shows inference failing before the fix; the second shows it working as expected afterwards.