Conversation

@zucchini-nlp (Member)

What does this PR do?

While working on #38635, I found that some models have past_key_values in their signature even though they cannot generate. The reason is that these models were all copying from Bert.

This PR cleans that up, changes the copy statements to point to the Align model instead, and adds support for the new attention API in all of those models.
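For context, here is a minimal sketch of the shape such a cleanup takes; the class and function names are illustrative placeholders, not the actual transformers modeling code: the forward signature of an encoder-only attention block loses its cache arguments, and the attention math is routed through a swappable attention function.

# Illustrative sketch only: placeholder names, not the real Align/Bert modules.
import math

import torch
import torch.nn as nn


def eager_attention(query, key, value, attention_mask=None):
    # Plain scaled-dot-product attention; sdpa/flash variants could be swapped in here.
    scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(query.size(-1))
    if attention_mask is not None:
        scores = scores + attention_mask  # additive mask, broadcast over heads
    probs = scores.softmax(dim=-1)
    return torch.matmul(probs, value), probs


class EncoderSelfAttention(nn.Module):
    """Encoder-only attention: no past_key_value, use_cache, or cross-attention inputs."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, attention_mask=None):
        # Note: no `past_key_value`, `encoder_hidden_states`, or `use_cache` arguments.
        bsz, seq_len, _ = hidden_states.shape
        query = self.q_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        key = self.k_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        value = self.v_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        attn_output, attn_weights = eager_attention(query, key, value, attention_mask)
        attn_output = attn_output.transpose(1, 2).reshape(bsz, seq_len, -1)
        return attn_output, attn_weights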

@zucchini-nlp changed the title from "don't use cache in non-generative models" to "Don't use cache in non-generative models" on Jun 11, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member)

Yes, great cleanup! Ping me whenever it's ready and you want a review

@zucchini-nlp (Member, Author)

@Rocketknight1 ready for review! One thing to note: I didn't deprecate past_key_values in the kwargs and simply deleted it, since it wasn't used anyway. Do you think we need a deprecation cycle, or should we raise an error when cache-related kwargs are passed?

@Rocketknight1 (Member)

@zucchini-nlp I think it's okay! I really hope people weren't passing past_key_values to non-generative models anyway 😬

@Rocketknight1 (Member) left a comment


Looks good! It's a really nice cleanup. I made one comment, but we should also definitely run slow tests for some of these models before merging 😅

@zucchini-nlp requested a review from Cyrilvallez on June 13, 2025
@Cyrilvallez (Member) left a comment


Hey @zucchini-nlp! Could you provide a bit more detail about why we remove the cross-attention and positional embeddings completely everywhere, please? 🤗 It is not obvious to me, because at first glance they look like they were used at least sometimes, no?

@zucchini-nlp (Member, Author)

run-slow: align,wav2vec2,layoutlm,clap

1 similar comment

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ['models/align', 'models/clap', 'models/layoutlm', 'models/wav2vec2']
quantizations: [] ...

1 similar comment

@zucchini-nlp (Member, Author)

Hey @Cyrilvallez, do you have any comments for me to address?

@Cyrilvallez (Member) left a comment


Hey! I do like it, but the PR has the potential to be quite breaking in different ways:

  • The position embedding type could be set in some configs on the Hub.
  • Some models present in the main init are changed directly (cross-attention/head mask), and even if they are building blocks of bigger models, they are still public classes. E.g. for Align there are no public classes using encoder_hidden_states, so it is straightforward to remove it everywhere as you did; but in AltCLIP it flows correctly through AltCLIPTextModel, which is public, and only the main AltCLIPModel does not propagate it (so removing it is not directly breaking when using AltCLIPModel, but it is when using the public submodel AltCLIPTextModel 🥲).
  • Even method signatures in fully internal classes can sometimes be breaking (though in this instance I wouldn't worry about it).

In general, I really like the changes because they clean up a lot of nonsense in those modelings, but we need to be a bit wary of the potential implications here.
cc @ArthurZucker for an opinion on whether we want to be aggressive in favor of simplification here, or whether we want to do it through a deprecation cycle for the public classes (but once again, even if public, they are building blocks of the real, bigger models).
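To gauge which public entry points are actually affected, a small probe like the following can help; this is an illustrative sketch (it assumes transformers is installed, and the two classes checked are just the ones discussed above), not part of the PR:

# Illustrative probe (not part of the PR): list which of the removed kwargs a
# public class still accepts in its forward signature.
import inspect

from transformers import AlignTextModel, AltCLIPTextModel

for cls in (AlignTextModel, AltCLIPTextModel):
    params = inspect.signature(cls.forward).parameters
    removed = [name for name in ("encoder_hidden_states", "past_key_values", "head_mask") if name in params]
    print(f"{cls.__name__} still exposes: {removed or 'none of the checked kwargs'}")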

Comment on lines -874 to -882
if self.is_decoder and encoder_hidden_states is not None:
    if not hasattr(self, "crossattention"):
        raise ValueError(
            f"If `encoder_hidden_states` are passed, {self} has to be instantiated with cross-attention layers"
            " by setting `config.add_cross_attention=True`"
        )

    # cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple
    cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
    cross_attention_outputs = self.crossattention(
        attention_output,
        attention_mask,
        head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
        cross_attn_past_key_value,
        output_attentions,
    )
    attention_output = cross_attention_outputs[0]
    outputs = outputs + cross_attention_outputs[1:-1]  # add cross attentions if we output attention weights

    # add cross-attn cache to positions 3,4 of present_key_value tuple
    cross_attn_present_key_value = cross_attention_outputs[-1]
    present_key_value = present_key_value + cross_attn_present_key_value
Member

Crazy that we had this block even though the model does not propagate encoder_hidden_states 🥵

Comment on lines 857 to 795
past_key_values=encoder_outputs.past_key_values,
past_key_values=None,
Member

Here we should remove it directly everywhere as well, since we don't use it anyway.

@zucchini-nlp (Member, Author)

Yeah, I have the same question. On one side, the code path should never have been used and isn't propagated from the Base/Task models. On the other side, we never know whether users found a way to exploit it by loading specific layers and reusing them.

I can add a proper deprecation if we think this is too aggressive, and remove everything in the next 2-3 releases. A bunch of unused code paths is just 😖

@ArthurZucker (Collaborator)

Very nice! In the era of unbloating, let's remove as much as we can, and redirect users to code on the Hub for the models that still need this?
We can keep this for one release though!
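For reference, "code on the Hub" here means shipping custom modeling code inside a model repository and having users opt into it explicitly; a minimal usage sketch (with a hypothetical repo id, not a real checkpoint) would look like:

# Hypothetical repo id; a real checkpoint needing the removed behaviour would ship
# its own modeling file in the repository.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "username/model-with-custom-code",  # placeholder, not a real checkpoint
    trust_remote_code=True,             # required to execute modeling code stored in the repo
)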

@zucchini-nlp force-pushed the clean-models-no-generation branch from 9ac4b2c to 0bda5f1 on June 20, 2025
@zucchini-nlp changed the title from "Don't use cache in non-generative models" to "🚨 Don't use cache in non-generative models" on Jun 20, 2025
@zucchini-nlp (Member, Author) commented Jun 20, 2025

@ArthurZucker @Cyrilvallez I added deprecate_kwarg (until the v4.54 release) in all forward calls. The failing test is not related.
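For readers following along, the decorator usage presumably looks roughly like the sketch below; the class is a placeholder, the kwarg name and version are inferred from this comment rather than copied from the diff, and the helper is assumed to be importable from transformers.utils.deprecation:

# Placeholder module, not a real transformers class; shows the general pattern of
# deprecating a no-longer-used kwarg instead of deleting it outright.
from transformers.utils.deprecation import deprecate_kwarg


class SomeNonGenerativeLayer:
    @deprecate_kwarg("past_key_value", version="4.54")  # warn until v4.54, then drop
    def forward(self, hidden_states, attention_mask=None, **kwargs):
        # A caller passing past_key_value now gets a deprecation warning; the value
        # is swallowed by **kwargs and never used.
        return hidden_states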

@ArthurZucker (Collaborator) left a comment

😮‍💨 finally getting rid of this old code!

@zucchini-nlp enabled auto-merge (squash) on July 1, 2025
@zucchini-nlp merged commit e435574 into huggingface:main on Jul 1, 2025 (20 checks passed)
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* deprecate for 1 version

* style

* fix some tests

* fix esm

* skip for now, GC requires positional args but we have keyword args

* remove transpose for scores in modified models only

* skip fx trace tests