The HF demo space was overloaded, but I got the demo working locally easily enough. The voice cloning of the 1.7B model captures the tone of the speaker very well, but I found it failed at reproducing the variation in intonation, so it sounds like a monotonous reading of a boring text.

I presume this is due to using the base model, and not the one tuned for more expressiveness.

edit: Or more likely, the demo not exposing the expressiveness controls.

The 1.7B model was much better at ignoring slight background noise in the reference audio compared to the 0.6B model though. The 0.6B would inject some of that into the generated audio, whereas the 1.7B model would not.

Also, without FlashAttention it was dog slow on my 5090, running at 0.3x realtime with just 30% GPU usage, though I guess that's to be expected. There was no significant difference in generation speed between the two models.

Overall though, I'm quite impressed. I haven't checked out all the recent TTS models, but I've tried a fair number, and this is certainly one of the better ones I've heard in terms of voice cloning quality.





How did you do this locally? Tools? Language?

I just followed the Quickstart[1] in the GitHub repo; it's refreshingly straightforward. Installing the pip package worked fine, as did installing the editable version from the git repository. Just install the CUDA build of PyTorch[2] first.

The HF demo is very similar to the GitHub demo, so easy to try out.

  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
  pip install qwen3-tts
  qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base --no-flash-attn --ip 127.0.0.1 --port 8000
That's for CUDA 12.8; adjust the PyTorch install for your CUDA version.
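
If you're not sure the CUDA wheel actually got picked up (it's easy to end up with the CPU build), a quick sanity check with plain PyTorch, nothing Qwen-specific:

  import torch

  # A working CUDA 12.8 setup should print a "+cu128" build string, "12.8", and True.
  print(torch.__version__)
  print(torch.version.cuda)
  print(torch.cuda.is_available())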

I skipped FlashAttention since I'm on Windows and haven't gotten FlashAttention 2 to work there yet (I found some precompiled FA3 wheels[3], but Qwen3-TTS isn't FA3 compatible yet).
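
The quickest way to check whether an installed FlashAttention 2 wheel is actually usable is just to import it (flash_attn is the package name the FA2 wheels use; this is only a sanity check, nothing Qwen-specific):

  try:
      import flash_attn
      print("flash-attn", flash_attn.__version__, "available")
  except ImportError as e:
      print("flash-attn not usable, stick with --no-flash-attn:", e)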

[1]: https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#quick...

[2]: https://pytorch.org/get-started/locally/

[3]: https://windreamer.github.io/flash-attention3-wheels/



It flat-out didn't work for me on MPS. CUDA only until someone patches it.

The demo ran fine for me, if very slowly, on CPU only using "--device cpu". It defaults to CUDA though.

Try mps I guess; I saw multiple references in the code checking whether the device is not mps, so it seems like it should be supported. If not, CPU.
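
If you want to see what your PyTorch build actually supports before passing --device, a quick probe like this works (plain PyTorch, not specific to Qwen3-TTS):

  import torch

  # Usual fallback order: CUDA, then Apple's MPS backend, then CPU.
  if torch.cuda.is_available():
      device = "cuda"
  elif torch.backends.mps.is_available():
      device = "mps"
  else:
      device = "cpu"
  print("best available device:", device)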


Any idea on the VRAM footprint for the 1.7B model? I guess it fits on consumer cards but I am wondering if it works on edge devices.

The demo uses 6 GB of dedicated VRAM on Windows, but keep in mind that's without FlashAttention. I expect it would drop a bit if I got that working.

I haven't looked into the demo to see whether it could be optimized, for example by moving certain parts to the CPU.
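
If you want a number from PyTorch itself rather than from Task Manager, the allocator stats give a rough picture (they only count PyTorch's own allocations, not the CUDA context overhead, so they'll read a bit lower):

  import torch

  # Run after the model is loaded and a generation has completed.
  print(f"{torch.cuda.memory_allocated() / 2**30:.2f} GiB allocated")
  print(f"{torch.cuda.max_memory_allocated() / 2**30:.2f} GiB peak allocated")
  print(f"{torch.cuda.memory_reserved() / 2**30:.2f} GiB reserved by the caching allocator")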



