
I see a lot of references to `device_map="cuda:0"` but no CUDA code in the GitHub repo. Is the complete stack just flash attention plus this Python code plus the weights file, or does one need vLLM running as well?



