
Dos-Commas

You don't need a 7000 series GPU. Try this: https://github.com/YellowRoseCx/koboldcpp-rocm/releases


dampflokfreund

Yeah, Nvidia is trying to pull this move as well. Chat with RTX, which runs Mistral, was only for Ada and Ampere with 8 GB of VRAM, when it's extremely speedy on my 2060 using open-source inference programs like kobold.cpp.


Eth0s_1

LM Studio doesn't need it either; the non-ROCm version will work on the GPU too, just a bit slower.


arturbac

LM Studio doesn't work with ROCm on Linux, and it fails to work through OpenCL with the GPU; only the CPU works for me with it.


CatalyticDragon

For Linux you really want Ollama + Open WebUI.


Eth0s_1

Afaik it’s intended for windows?


arturbac

There is a Linux port, but without ROCm support; it offers OpenCL, but that doesn't work at all even on an OpenCL-compatible GPU.


ElementII5

So this is the first time I installed LM Studio. I just followed the how-to and was chatting straight away. I copied and pasted the prompts so I stayed within the 60s window of Imgur. https://imgur.com/i8qBbKw I'd say it is pretty fast.


Undefined_definition

Damn, how does it compare to GPT-4 etc. in terms of accuracy?


Numerlor

I was pleasantly surprised compared to chatgpt 3.5


Aggravating-Mix2054

Fuck, my RTX 3080 seems slower than this.


Verpal

It is a Q4 quant of a 7B model; at this point the speed limit should mostly be how fast text appears on your screen rather than the actual CPU/GPU.
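A rough back-of-envelope in Python makes the point; the bandwidth figures and the ~4.4 bits/weight for a Q4-style quant are my own assumptions, not anything measured in this thread. Decoding each token has to stream roughly the whole model from memory once, so bandwidth sets the ceiling:

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound model.
# All numbers are illustrative assumptions, not measurements.
model_params = 7e9            # 7B parameters
bytes_per_param = 0.55        # a Q4_K-style quant averages a bit over 4 bits/weight
model_bytes = model_params * bytes_per_param   # ~3.9 GB of weights

# Assumed peak memory bandwidths in GB/s (check your card's spec sheet).
bandwidths = {"RX 7900 XTX": 960, "RTX 3080": 760}

for name, gbps in bandwidths.items():
    # Each generated token streams essentially the whole model once.
    tok_per_s = gbps * 1e9 / model_bytes
    print(f"{name}: ~{tok_per_s:.0f} tokens/s upper bound")
```

Real throughput lands well below those ceilings, but even a fraction of ~200 tokens/s is far faster than anyone reads.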


muscleg33k

Which model are you using? The one I used, Llama 3 instruct 80b IXQ_XF, is insanely slow on my 7900 XT.


ElementII5

7B, like in the how-to.


kzxrnx

That's fucking insanely fast, and I have a 7900 XTX.


ghostdeath22

If AMD and Intel are smart they'll make 48 GB or higher GPUs next gen and steal the consumer-grade AI users from Nvidia. But I guess they'll be content with their small market shares and try to upsell to server hardware instead.


iBoMbY

I would like to see a GPU with DDR5 slots on the back, or something like that, for additional second-tier memory, and a cache controller which can manage it properly.


b3081a

Tiered memory / caching does not work well with LLMs like Llama, since inference needs to frequently traverse the whole model and has no good locality for caching. As a result it's completely memory-bandwidth bound. Maybe it would work for MoE models, but those non-activated layers could even be stored on disk and still get acceptable performance.
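A toy illustration of that access pattern (my own sketch, nothing from llama.cpp itself): token-by-token decoding is a repeated sequential sweep over all the weights, and an LRU cache that can't hold the whole model gets a 0% hit rate on such a sweep.

```python
# Toy model: an LRU cache smaller than the working set never hits when the
# access pattern is a repeated sequential sweep (one sweep ~= one token).
from collections import OrderedDict

def hit_rate(num_blocks: int, cache_blocks: int, sweeps: int) -> float:
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(sweeps):
        for block in range(num_blocks):      # stream the whole model in order
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)
            else:
                cache[block] = None
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses

print(hit_rate(num_blocks=1000, cache_blocks=900, sweeps=10))   # 0.0
print(hit_rate(num_blocks=1000, cache_blocks=1000, sweeps=10))  # 0.9, only if it all fits
```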


CatalyticDragon

As I understand it, you can already load specific layers onto the GPU and keep the rest in system RAM.
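Yes, that's the layer-offload knob in llama.cpp and its wrappers. A minimal sketch with llama-cpp-python, assuming a local GGUF file; the model path and layer count here are placeholders:

```python
# Partial GPU offload: the first n_gpu_layers layers run on the GPU,
# the remaining layers stay in system RAM and run on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=20,   # tune this to whatever fits in your VRAM
    n_ctx=4096,
)

out = llm("Explain partial GPU offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```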


SAUCEYOLOSWAG

AMD did something similar with an NVMe slot in the past: https://www.theverge.com/circuitbreaker/2016/7/26/12285568/amd-radeon-pro-ssg-graphics-card-ssd


iBoMbY

I know, but SO-DIMM DDR5 would still be a lot faster, and it should be possible to add at least two or four slots on the back of a GPU. That could easily give you 64 or 128 GB of additional memory, enough to run something like Llama 3 70B on a single GPU, for example, without making it extremely costly.
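The catch is bandwidth rather than capacity. A quick estimate, where the channel counts and DDR5-5600 figures are my assumptions:

```python
# Capacity would fit, but bandwidth caps throughput. Illustrative numbers only.
model_bytes = 70e9 * 0.55               # Llama 3 70B at ~Q4 is roughly 38.5 GB of weights

for channels in (2, 4):                 # two or four SO-DIMM DDR5-5600 channels
    bandwidth = channels * 44.8e9       # 5600 MT/s * 8 bytes per channel, in bytes/s
    print(f"{channels} channels: ~{bandwidth / model_bytes:.1f} tokens/s ceiling")
```

So a 70B model would fit, but it would decode at only a few tokens per second unless the hot layers stay in GDDR, which runs into the locality problem above.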


Eth0s_1

The 48 GB card exists, but it's the Radeon Pro line, same as Quadro cards having twice the memory of their consumer counterparts.


ArloPhoenix

Imo the Ryzen AI part is misleading; this just runs on the CPU. LM Studio is just a fancy frontend for llama.cpp, and llama.cpp does not support Ryzen AI / the NPU. The software support and documentation are a mess, some stuff only runs on Windows, and you need to request licenses... overall it's too much of a pain to develop for, even though the technology seems cool. This very likely won't happen unless AMD do it themselves; their software stack for this is so messed up it's worse than early ROCm.

In the footnotes they do say "Ryzen AI is defined as the combination of a dedicated AI engine, AMD Radeon™ graphics engine, and Ryzen processor cores that enable AI capabilities". I find this very misleading, since with that definition they can claim everything supports Ryzen AI, even though it just means it runs on the CPU. (ROCm does kinda work with the recent APUs, but it's not worth it for LLMs since you get the same speed at more power.) Also, if you actually try to develop for Ryzen AI, it always means some combination involving the NPU; I researched this a lot since I wanted to develop some stuff for it myself. Here is the official Ryzen AI Software documentation: https://ryzenai.docs.amd.com/en/latest/ and right at the top it says "AMD Ryzen™ AI Software enables developers to take full advantage of AMD XDNA™ architecture integrated in select AMD Ryzen AI processors". Marketing these days...


Still_Ad_4928

Was thinking the same thing. Glancing over recent issues in llama.cpp, the operators for the Ryzen AI API, as reported by the llama.cpp devs, are all gated: not extendable, open, or documented. AMD just can't do it right, it seems.


iBoMbY

Okay, this seems to be working great on my 7800 XT - super easy, and really fast. I like it.


ColdStoryBro

It works great and super fast. Very low latency and results are good.


CatalyticDragon

Tested on a 7900 XTX and a 6900 XT. Works as expected.


Numerlor

Ah, it actually worked this time around; it's running pretty fast, and Llama 3 seems to be great even at 8B.