Releases: vllm-project/vllm-omni

0.11.0rc1 (Pre-release)

01 Dec 18:14 · 9fe730a

Initial (Pre)-release of the vLLM-Omni Project

vLLM-Omni is a framework that extends vLLM to support omni-modality model inference and serving. This pre-release is built on top of vllm==0.11.0, and the same version number is used for ease of tracking the dependency.

Please check out our documentation; we welcome any feedback and contributions!
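Since the release pins its vLLM dependency to the matching version, installation might look like the sketch below. The package name on PyPI and the exact install flow are assumptions here, not confirmed by these notes; follow the project's installation guide for the supported method.

```shell
# Hypothetical install sketch. Package names/availability are assumptions;
# see the vLLM-Omni installation guide for the supported procedure.

# Pinned dependency, matching this release's version number:
pip install vllm==0.11.0

# Pinning the exact pre-release version lets pip install it without --pre:
pip install vllm-omni==0.11.0rc1
```

Note that pip skips pre-releases such as `0.11.0rc1` by default, so either pin the exact version as above or pass `--pre`.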

What's Changed

  • init the folder directories for vLLM-omni by @hsliuustc0106 in #1
  • init main repo structure and demonstrate the AR + DiT demo for omni models by @hsliuustc0106 in #6
  • Add PR and issue templates from vLLM project by @hsliuustc0106 in #8
  • update RFC template by @hsliuustc0106 in #9
  • [Model]Add Qwen2.5-Omni model components by @tzhouam in #12
  • [Engine] Add entrypoint class and stage management by @Gaohan123 in #13
  • [Model] Add end2end example and documentation for qwen2.5-omni by @Gaohan123 in #14
  • [Worker]Feat/ar gpu worker and model runner by @tzhouam in #15
  • [Worker]Refactor GPU diffusion model runner and worker by @tzhouam in #16
  • [Worker]Add OmniGPUModelRunner and OmniModelInputForGPU classes by @tzhouam in #17
  • [Engine]Refactor output processing for multimodal capabilities in vLLM-omni by @tzhouam in #20
  • [Inputs, Engine]Add Omni model components and input processing for hidden states support by @tzhouam in #18
  • [Core]Add scheduling components for vLLM-omni by @tzhouam in #19
  • add precommit by @Gaohan123 in #32
  • End2end fixup by @tzhouam in #35
  • Remove unused files and fix some bugs by @Gaohan123 in #36
  • [bugfix] fix problem of installation by @Gaohan123 in #44
  • [Bugfix] Further supplement installation guide by @Gaohan123 in #46
  • [Bugfix] fix huggingface download problem for spk_dict.pt by @Gaohan123 in #47
  • [Refactor] Dependency refactored to vLLM v0.11.0 by @Gaohan123 in #48
  • [fix] Add support for loading model from a local path by @qibaoyuan in #52
  • [Feature] Multi Request Stream for Sync Mode by @tzhouam in #51
  • [Docs] Setup Documentation System and Re-organize Dependencies by @SamitHuang in #49
  • [fix] adapt hidden state device for multi-hardware support by @qibaoyuan in #61
  • [Feature] Support online inference by @Gaohan123 in #64
  • CI Workflows. by @congw729 in #50
  • [CI] fix ci and format existing code by @ZJY0516 in #71
  • [CI] disable unnecessary ci and update pre-commit by @ZJY0516 in #80
  • update readme for v0.11.0rc1 release by @hsliuustc0106 in #69
  • [CI] Add script for building wheel. by @congw729 in #75
  • [Feature] support multimodal inputs with multiple requests by @Gaohan123 in #76
  • [Feature] Add Gradio Demo for Qwen2.5Omni by @SamitHuang in #60
  • [CI] Buildkite setup by @ywang96 in #83
  • [CI]Add version number. by @congw729 in #87
  • [fix] Remove redundant parameter passing by @qibaoyuan in #90
  • [Docs] optimize and supplement docs system by @Gaohan123 in #86
  • [Diffusion] Qwen image support by @ZJY0516 in #82
  • [fix] add scheduler.py by @ZJY0516 in #94
  • Update gradio docs by @SamitHuang in #95
  • [Bugfix] Fix removal of old logs when stats are enabled by @syedmba in #84
  • [diffusion] add doc and fix qwen-image by @ZJY0516 in #96
  • Simple test from PR#88 on Buildkite by @ywang96 in #93
  • [Diffusion] Support Multi-image Generation and Add Web UI Demo for QwenImage by @SamitHuang in #97
  • [Doc] Misc documentation polishing by @ywang96 in #98
  • [Feature] add support for Qwen3-omni by @R2-Y in #55
  • [Bugfix] Fix special token nothink naming. by @ywang96 in #107
  • [Fix] fix qwen3-omni example by @ZJY0516 in #109
  • [CI] Fix ci by @ZJY0516 in #110
  • [Docs] Add qwen image missing doc in user guide by @SamitHuang in #111
  • [Bug-fix] Fix Bugs in Qwen3/Qwen2.5 Omni Rebased Support by @tzhouam in #114
  • [Bugfix] Remove mandatory flash-attn dependency and optimize docs by @Gaohan123 in #113
  • [Feat] Add NPU Backend support for vLLM-Omni by @gcanlin in #89
  • [Feature] Support Gradio Demo for Qwen3-Omni by @SamitHuang in #116
  • [Feat] Enable loading local Qwen-Image model by @gcanlin in #117
  • [Bugfix] Fix bug of online serving for qwen2.5-omni by @Gaohan123 in #118
  • [Doc] Fix readme typos by @hsliuustc0106 in #108
  • [Feat] Rename AsyncOmniLLM -> AsyncOmni by @congw729 in #103
  • [Bugfix] Fix Qwen-omni Online Inference Bug caused by check_stop and long sequence by @SamitHuang in #112
  • [Fix] Resolve comments & update vLLM-Omni name usages. by @congw729 in #122
  • Refresh supported models and address nits in doc by @Yikun in #119
  • [Doc] Cleanup non-english comments by @ywang96 in #125
  • [Doc] Fix outdated CONTRIBUTING link by @DarkLight1337 in #127
  • [Misc] Update default stage config for qwen3-omni by @ywang96 in #124
  • [Doc] Cleanup reference to deleted files by @ywang96 in #134
  • [Doc] Fix arch pic reference by @ywang96 in #136
  • [Bugfix] Fix redundant shm broadcast warnings in diffusion workers by @SamitHuang in #133
  • Update README with vllm-omni blogpost link by @youkaichao in #137
  • [Bugfix] Fix the curl bug of qwen3-omni and doc errors by @Gaohan123 in #135
  • [Doc] Update developer & user channel by @ywang96 in #138
  • [Misc][WIP] Support qwen-omni online inference with local video/audio/image path by @SamitHuang in #131
  • [Doc] Logo by @ywang96 in #143
  • [Misc] Misc description updates by @ywang96 in #146
  • [Bugfix] Fix Qwen3-Omni gradio audio input bug by @SamitHuang in #147
  • [Bugfix] Add Fake VllmConfig on NPU and add slicing/tiling args in Qwen-Image by @gcanlin in #145
  • [Misc] Temporarily support downloading models from ModelScope by snapshot download by @MengqingCao in #132
  • [Misc] update image reference for PyPI by @ywang96 in #150
