
Conversation

@Datta0 (Collaborator) commented Dec 1, 2025

vLLM deprecated and has now removed GuidedDecodingParams. This PR patches that by aliasing the old name to StructuredOutputParams.

Please also check unslothai/unsloth-zoo#367.
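
For context, the fix is essentially an import-time alias so that trl's old import keeps working. A minimal sketch of the idea (the replacement class name has varied across vLLM versions, so it is probed defensively here rather than assumed):

```python
import vllm.sampling_params as sampling_params

# Older vLLM builds still ship GuidedDecodingParams; nothing to do in that case.
if not hasattr(sampling_params, "GuidedDecodingParams"):
    # Newer vLLM renamed the class as part of the structured-outputs work.
    # Probe for either spelling instead of hard-coding one.
    replacement = (
        getattr(sampling_params, "StructuredOutputsParams", None)
        or getattr(sampling_params, "StructuredOutputParams", None)
    )
    if replacement is not None:
        # Re-expose the old name so downstream
        # `from vllm.sampling_params import GuidedDecodingParams` keeps working.
        sampling_params.GuidedDecodingParams = replacement
```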

@Datta0 Datta0 changed the base branch from main to nightly December 1, 2025 10:48
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Datta0, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving compatibility with external libraries, primarily vLLM, by adapting to recent API changes. It also includes several minor code quality enhancements, such as better error reporting during patching processes, specific numerical stability adjustments for the Qwen3 MoE model, and small textual corrections across the codebase. The overall goal is to ensure robust integration and maintain code clarity.

Highlights

  • vLLM Compatibility Fix: Updated the vLLM integration to address the deprecation and removal of GuidedDecodingParams, now dynamically aliasing it to StructuredOutputParams to maintain compatibility with trl.
  • Code Refactoring and Error Reporting: Refactored vLLM related patching functions into a single patch_vllm_imports and enhanced error reporting in trl patching by adding print statements for failed imports or class detections.
  • Qwen3 MoE Model Adjustments: Applied specific dtype adjustments for torch_nn_functional_softmax and final_X tensor initialization, and optimized an in-place addition operation in the Qwen3MoeDecoderLayer_fast_forward (see the sketch after this list).
  • Minor Typo Corrections: Corrected a recurring typo from 'Inferene' to 'Inference' in comments across several model files (falcon_h1, gemma, gemma2, llama, mistral, qwen2, qwen3).
  • Improved Statistics Formatting: Enhanced the display of model architecture names in the statistics string within vision.py for better readability, specifically for 'VL' and 'MoE' components.
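
As a rough illustration of the numerical-stability point in the Qwen3 MoE highlight above: MoE routers typically compute their softmax in float32 even when the model runs in half precision, then cast back. A minimal sketch of that pattern (the function and variable names are illustrative, not Unsloth's actual code):

```python
import torch
import torch.nn.functional as F

def route_tokens(router_logits: torch.Tensor, top_k: int = 2):
    # Softmax in float32 for numerical stability, even if the logits
    # arrive in float16/bfloat16.
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float32)
    routing_weights, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    # Renormalise the selected weights, then cast back to the model dtype
    # before they scale the expert outputs.
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    return routing_weights.to(router_logits.dtype), selected_experts

# Usage: logits for 6 tokens over 8 experts, in bfloat16.
weights, experts = route_tokens(torch.randn(6, 8, dtype=torch.bfloat16))
```
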
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a breaking change in vllm by patching the deprecated GuidedDecodingParams. The refactoring of vLLM-related fixes into a single function is a good structural improvement. The additional logging in rl.py will be helpful for debugging. I've provided a few suggestions to further improve the code, including correcting a link in a comment and improving exception handling for the new log messages.
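
As a rough sketch of that exception-handling suggestion (the imported class, the module path, and the UNSLOTH_ENABLE_LOGGING environment lookup below are illustrative assumptions, not the actual patch): catching the specific failure modes keeps the new log messages from silently swallowing unrelated errors.

```python
import os

UNSLOTH_ENABLE_LOGGING = os.environ.get("UNSLOTH_ENABLE_LOGGING", "0") == "1"

try:
    # Example import target only; the real patch inspects several trl classes.
    from trl.trainer.grpo_trainer import GRPOTrainer
except (ImportError, AttributeError) as e:
    # Narrow exception types: an API change is logged, while genuinely
    # unexpected errors still propagate instead of being hidden.
    if UNSLOTH_ENABLE_LOGGING:
        print(f"Unsloth: Failed to import trl class, error = {e}")
```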

@chatgpt-codex-connector

💡 Codex Review

    _flag_for_generation = self._flag_for_generation,
)
hidden_states += hidden_states

P1: Restore residual add in cached self-attention

When use_cache with _flag_for_generation is set (the standard generation path), the decoder now updates hidden_states with hidden_states += hidden_states instead of adding back the saved residual. This doubles the self-attention output and completely drops the skip connection, so inference with cache will emit incorrect activations. Please add the residual back (as in the uncached branch) rather than adding the tensor to itself.


hidden_states, router_logits = Qwen3MoeSparseMoeBlock_fast_forward(
    self.mlp, hidden_states
)
hidden_states += hidden_states

P1: Fix residual after MoE block in cached path

In the cached generation branch, the MoE block output is accumulated with hidden_states += hidden_states, again ignoring the stored residual. This removes the intended skip connection and scales the MoE output by two for any cached generation call, leading to wrong logits. It should mirror the uncached path and add the residual instead.
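
A minimal standalone sketch of the intended pattern from both findings above (the helper and variable names are illustrative, not the repository's decoder code):

```python
import torch

def with_residual(sub_block, hidden_states: torch.Tensor) -> torch.Tensor:
    # Save the input, apply the sub-block (self-attention or the MoE MLP),
    # then add the saved residual back. The buggy cached path effectively did
    # `out + out`, doubling the sub-block output and dropping the skip connection.
    residual = hidden_states
    out = sub_block(hidden_states)
    return residual + out

# Usage with a stand-in sub-block:
x = torch.randn(1, 4, 8)
y = with_residual(torch.nn.Linear(8, 8), x)
```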


try:
    from vllm.sampling_params import GuidedDecodingParams
except ImportError:
    vllm.sampling_params.GuidedDecodingParams = (
        vllm.sampling_params.StructuredOutputsParams

P1: Alias GuidedDecodingParams to correct vLLM class

The fallback for removed GuidedDecodingParams assigns it to vllm.sampling_params.StructuredOutputsParams, but the vLLM rename in PR #22772 introduced StructuredOutputParams (singular). On vLLM versions where GuidedDecodingParams is gone, this attribute access will raise an AttributeError and importing Unsloth will fail. Point the alias to the actual renamed class so the patch works on new vLLM releases.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@Datta0 Datta0 changed the title [FIX] Vllm guided decoding params [WIP][FIX] Vllm guided decoding params Dec 1, 2025
@Datta0 Datta0 changed the title [WIP][FIX] Vllm guided decoding params [FIX] Vllm guided decoding params Dec 1, 2025
except Exception as e:
if UNSLOTH_ENABLE_LOGGING:
print(f"Unsloth: Failed patching vLLM with error = {str(e)}")
text = text.replace(

Indentation issue

@danielhanchen danielhanchen changed the base branch from nightly to main December 1, 2025 13:30
@danielhanchen danielhanchen changed the base branch from main to nightly December 1, 2025 13:32
@danielhanchen danielhanchen merged commit 6c47dc5 into unslothai:nightly Dec 1, 2025
1 check was pending
danielhanchen added a commit that referenced this pull request Dec 1, 2025
@Datta0 Datta0 mentioned this pull request Dec 1, 2025