
Conversation

@mmathew23
Collaborator

Llama vision inference has a cache issue: we set `cache_implementation` to `"hybrid"`, but the hybrid cache only works for models with sliding-window attention (SWA), which Llama vision does not use. A sketch of the idea is below.
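A minimal sketch of the underlying idea (not the exact change in this PR): only request the hybrid cache when the loaded model actually uses sliding-window attention. The helper name `pick_cache_implementation` is hypothetical; the config attributes (`text_config`, `sliding_window`) are the ones transformers exposes for multimodal and SWA models.

```python
# Hypothetical sketch, not the PR's implementation.
from transformers import AutoConfig

def pick_cache_implementation(model_name: str):
    config = AutoConfig.from_pretrained(model_name)
    # Multimodal configs (e.g. Llama vision) nest the language model
    # settings under text_config; plain LLM configs do not.
    text_config = getattr(config, "text_config", config)
    # Only SWA models define a sliding_window; for them "hybrid" is valid.
    if getattr(text_config, "sliding_window", None):
        return "hybrid"
    # Otherwise leave cache_implementation unset so the default
    # dynamic cache is used.
    return None
```

Under this assumption, Llama vision would return `None` (default dynamic cache) while an SWA model such as Gemma would still get `"hybrid"`.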

Notebooks:
Llama vision working with pypi transformers/trl: https://colab.research.google.com/drive/1A-mm-1pvDb6e5CGtpeRtVkGYUAtFMGv0?usp=sharing
Llama vision working with main transformers/trl: https://colab.research.google.com/drive/1EqH1E7lB5PvaeNwhPkNNpzrlaxMWvrb2?usp=sharing
Gemma notebook as a sanity test: https://colab.research.google.com/drive/1b4vDYgKv_ZIurVSGIfBPSZQbpVJjseRX?usp=sharing

@danielhanchen merged commit ff6cbb0 into unslothai:main on Sep 4, 2025