Update device/nvidia-agx-thor/nvidia-diffusiongemma-26B-A4B-it-NVFP4.md

This commit is contained in:
Kyush 2026-06-22 11:08:00 +00:00
commit df42d35102

View file

@ -1,4 +1,6 @@
- 20260622
- 20260622 only for vllm/vllm-openai:gemma container
## serve
```bash
sudo docker run --rm -it --name=vllm-diffusiongemma --gpus all --runtime=nvidia --ipc=host --network host \
@ -31,4 +33,31 @@ vllm/vllm-openai:gemma \
--override-generation-config '{"max_new_tokens": null}' \
--mm-processor-kwargs '{"max_soft_tokens": 1120}' \
--limit-mm-per-prompt '{"image": 7}'
```
## bench
```bash
vllm bench serve \
--model "/workspace/thor-wm/nvidia-diffusiongemma-26B-A4B-it-NVFP4" \
--served-model-name "nvidia/diffusiongemma-26B-A4B-it-NVFP4" \
--host localhost \
--port 8002 \
--dataset-name random \
--random-input-len 1024 \
--random-output-len 1024 \
--num-prompts 5 \
--max-concurrency 1
```
```bash
vllm bench serve \
--model "/workspace/thor-wm/nvidia-diffusiongemma-26B-A4B-it-NVFP4" \
--served-model-name "nvidia/diffusiongemma-26B-A4B-it-NVFP4" \
--host localhost \
--port 8002 \
--dataset-name random \
--random-input-len 1024 \
--random-output-len 1024 \
--num-prompts 32 \
--max-concurrency 8
```