Problem
I'm trying to run vLLM-Omni (v0.11.0rc1) with the Qwen2.5-Omni-7B model on an NVIDIA A100 GPU, but initialization fails with two critical errors in spawned worker processes:
1. NVML Invalid Argument error in one worker:

   ```
   vllm.third_party.pynvml.NVMLError_InvalidArgument: Invalid Argument
   ```

   raised at:

   ```python
   handle = pynvml.nvmlDeviceGetHandleByIndex(physical_device_id)
   ```

2. V1 engine mismatch error in the other workers:

   ```
   ValueError: Using V1 LLMEngine, but envs.VLLM_USE_V1=False.
   This should not happen. As a workaround, try using LLMEngine.from_vllm_config(...)
   or explicitly set VLLM_USE_V1=0 or 1 and report this issue on Github.
   ```

All 3 spawned processes fail, so the orchestrator hits its initialization timeout:

```
WARNING: [Orchestrator] Initialization timeout: only 0/3 stages are ready; not ready: [0, 1, 2]
```
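To separate the NVML failure from vLLM itself, I put together the minimal check below. It is only my own sketch of what I understand the failing worker to be doing (spawn a child, init NVML, fetch the handle for physical device 0, which is what `CUDA_VISIBLE_DEVICES=0` maps to); none of it is vLLM code:

```python
import multiprocessing as mp


def probe_nvml() -> None:
    # Runs inside a freshly spawned child, roughly mirroring the failing
    # worker: initialize NVML, then fetch the handle for physical device 0.
    import pynvml  # vLLM vendors this as vllm.third_party.pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    print("spawned child sees:", pynvml.nvmlDeviceGetName(handle))
    pynvml.nvmlShutdown()


if __name__ == "__main__":
    proc = mp.get_context("spawn").Process(target=probe_nvml)
    proc.start()
    proc.join()
```

If this succeeds, the handle lookup itself would seem to be fine under spawn, which would point at how vLLM computes `physical_device_id` rather than at NVML.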
Environment
- GPU: NVIDIA A100-SXM4-40GB
- vLLM: 0.11.0
- vLLM-Omni: 0.11.0rc1
- Python: 3.10
- PyTorch: CUDA available in main process
- Multiprocessing: spawn method

Environment variables set in bash:

```
CUDA_VISIBLE_DEVICES=0
VLLM_USE_V1=0
VLLM_WORKER_MULTIPROC_METHOD=spawn
```
Code
```python
import os

import soundfile as sf
import torch


def main():
    # Heavy imports stay inside main(): with the spawn method, module-level
    # imports are re-executed in every child process.
    from vllm_omni.entrypoints.omni_llm import OmniLLM
    from vllm.sampling_params import SamplingParams

    print("=== Starting vLLM-Omni Test ===")
    print(f"Environment: VLLM_USE_V1={os.environ.get('VLLM_USE_V1', 'NOT SET')}")
    print(f"PyTorch CUDA: {torch.cuda.is_available()}, Devices: {torch.cuda.device_count()}")

    # Create a 1-second silent WAV (16 kHz) as dummy input if it doesn't exist.
    audio_path = "/scratch/users/ntu/es0001an/dataset_generated/001_input.wav"
    os.makedirs(os.path.dirname(audio_path), exist_ok=True)
    if not os.path.exists(audio_path):
        sf.write(audio_path, torch.zeros(16000).numpy(), 16000)
        print(f"Created dummy audio at {audio_path}")

    print("\n=== Initializing OmniLLM ===")
    engine = OmniLLM(
        model="Qwen/Qwen2.5-Omni-7B",
        trust_remote_code=True,
        dtype="bfloat16",
        runtime={"devices": [[0], [0], [0]]},  # all three stages on GPU 0
        init_sleep_seconds=180,
        max_model_len=2048,
        disable_custom_all_reduce=True,
        enforce_eager=True,
    )

    prompt = {
        "prompt": (
            "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
            "<|im_start|>user\n<|audio_bos|><|AUDIO|><|audio_eos|>\n"
            "Describe this audio in detail.<|im_end|>\n<|im_start|>assistant\n"
        ),
        "multi_modal_data": {"audio": [audio_path]},
    }

    sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
    sampling_params_list = [sampling_params, sampling_params, sampling_params]  # one per stage

    print("\n=== Generating Response ===")
    try:
        results = engine.generate([prompt], sampling_params_list)
        if results and len(results) > 0:
            result = results[0]
            print(f"\n{'=' * 60}")
            print("SUCCESS!")
            print(f"{'=' * 60}")
            print(result)
            if hasattr(result, 'outputs') and result.outputs:
                for idx, output in enumerate(result.outputs):
                    if hasattr(output, 'text') and output.text:
                        print(f"\nText: {output.text}")
                    if hasattr(output, 'audio') and output.audio is not None:
                        audio_file = f'output_{idx}.wav'
                        sf.write(audio_file, output.audio, 24000)
                        print(f"Audio saved to: {audio_file}")
        else:
            print("No results returned")
    except Exception as e:
        print(f"Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == '__main__':
    main()
```
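I normally launch this with the bash exports listed above. Below is a hypothetical Python wrapper I'm also considering, to rule out import-order effects; `test_omni.py` is just a placeholder for the script's filename:

```python
# launch.py - hypothetical wrapper; "test_omni.py" stands in for the
# repro script above. The point is to pin the toggles before any module
# that might read them at import time gets imported.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["VLLM_USE_V1"] = "0"
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

import runpy

runpy.run_path("test_omni.py", run_name="__main__")
```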
What I've Tried
- Setting `VLLM_USE_V1=0` in the bash script (not in Python) - still fails
- Using a single GPU with `runtime={"devices": [[0], [0], [0]]}`
- Verified that PyTorch can access the GPU in the main process
- Added `enforce_eager=True` and `disable_custom_all_reduce=True`
- Setting the environment variables in Python via `os.environ` - doesn't propagate to the spawned children (see the check sketched below)
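For reference, this is the check I'd use to verify that last point, i.e. what a spawned child actually inherits. It's my own sketch using plain multiprocessing, with no vLLM involved:

```python
import multiprocessing as mp
import os

VARS = ("CUDA_VISIBLE_DEVICES", "VLLM_USE_V1", "VLLM_WORKER_MULTIPROC_METHOD")


def report_env() -> None:
    # Runs in the spawned child and prints what it actually inherited.
    print("child: ", {v: os.environ.get(v, "NOT SET") for v in VARS})


if __name__ == "__main__":
    print("parent:", {v: os.environ.get(v, "NOT SET") for v in VARS})
    proc = mp.get_context("spawn").Process(target=report_env)
    proc.start()
    proc.join()
```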
Questions
1. Why does NVML fail to get a GPU handle in the spawned processes when `CUDA_VISIBLE_DEVICES=0` is set and the main process can access the GPU fine?
2. Why does vLLM-Omni use the V1 LLMEngine despite `VLLM_USE_V1=0` being explicitly set in the shell environment?
3. Is this a known bug in vLLM-Omni 0.11.0rc1, or is there a correct way to configure multi-stage initialization?

Should I try:
- Setting `VLLM_USE_V1=1` instead?
- Using `fork` instead of `spawn`? (see the caveat below)
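One caveat I'm aware of with the second option: my script touches CUDA in the parent (via `torch.cuda.is_available()`), and CUDA cannot be re-initialized in a forked child, so a fork experiment would need the parent to stay off the GPU entirely. For what it's worth, here is how I'd wire either experiment; both are plain environment toggles that would have to be set before any `vllm` import:

```python
import os

# Experiment A: opt in to the V1 engine instead of forcing it off.
os.environ["VLLM_USE_V1"] = "1"

# Experiment B: fork instead of spawn. The parent must not have touched
# CUDA first (e.g. via torch.cuda.is_available()), because CUDA cannot
# be re-initialized in a forked child process.
# os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "fork"
```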
Any insights on resolving these multiprocessing/GPU initialization issues would be greatly appreciated!