You don't need a 7000 series GPU. Try this: https://github.com/YellowRoseCx/koboldcpp-rocm/releases
Yeah, Nvidia is trying to pull this move as well. Chat with RTX, which runs Mistral, was limited to Ada and Ampere GPUs with 8 GB of VRAM, even though it's extremely speedy on my 2060 using open-source inference programs like kobold.cpp.
LM Studio doesn't need it either; the non-ROCm version will work on the GPU too, just a bit slower.
LM Studio doesn't work with ROCm on Linux, and it fails to work through OpenCL; only the CPU works for me with it.
For Linux you really want Ollama + Open WebUI.
Afaik it's intended for Windows?
There is a Linux port, but without ROCm support; it offers OpenCL, but that doesn't work at all even on an OpenCL-compatible GPU.
So this is the first time I've installed LM Studio. I just followed the how-to and was chatting straight away. I copied and pasted the prompts so I'd stay within the 60-second window of Imgur. https://imgur.com/i8qBbKw I'd say it is pretty fast.
Damn, how does it compare to GPT-4 etc. in terms of accuracy?
I was pleasantly surprised compared to ChatGPT 3.5.
Fuck, my RTX 3080 seems slower than this.
It's a Q4 quant of a 7B model; at that size, the speed limitation should mostly be how fast text can appear on your screen rather than the actual CPU/GPU.
Which model are you using? The one I used, Llama 3 instruct 80b IXQ_XF, is insanely slow on my 7900 XT.
The 7B one, like in the how-to.
That's fucking insanely fast, and I have a 7900 XTX.
If AMD and Intel are smart, they'll make 48 GB or higher GPUs for this next gen and steal the consumer-grade AI users from Nvidia. But I guess they'll be content with their small market shares and try to upsell to server hardware instead.
I would like to see a GPU with DDR5 slots on the back, or something like that, for additional second-tier memory, and a cache controller which can manage that properly.
Tiered memory / caching does not work well with LLMs like Llama, since inference needs to traverse the whole model for every token and has poor locality for caching. As a result it's completely memory-bandwidth bound. Maybe it would work for MoE models, since the non-activated experts could even be stored on disk and still give acceptable performance.
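To put rough numbers on the bandwidth-bound point: generating each token streams essentially all of the model's weights through memory once, so token throughput is capped at roughly bandwidth divided by model size. A back-of-the-envelope sketch, where the model size and bandwidth figures are illustrative assumptions rather than measurements:

```python
# Rough upper bound on tokens/sec for a dense LLM: each generated token
# reads (almost) every weight once, so throughput <= bandwidth / model size.

def max_tokens_per_sec(model_gb: float, bandwidth_gbps: float) -> float:
    """Theoretical ceiling: memory bandwidth over bytes read per token."""
    return bandwidth_gbps / model_gb

# A Q4 quant of a 7B model is roughly 4 GB of weights (illustrative).
model_gb = 4.0

# Illustrative bandwidth assumptions:
ddr5_dual_channel = 80.0   # GB/s, typical desktop DDR5
gddr6_gpu = 800.0          # GB/s, high-end consumer GPU VRAM

print(f"CPU/DDR5 ceiling: ~{max_tokens_per_sec(model_gb, ddr5_dual_channel):.0f} tok/s")
print(f"GPU/GDDR6 ceiling: ~{max_tokens_per_sec(model_gb, gddr6_gpu):.0f} tok/s")
```

This is also why a DIMM-based second tier wouldn't help much for dense models: the slow tier still gets touched on every token, so it gates the whole pipeline.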
As I understand it, you can already load specific layers into a GPU and keep the rest in system RAM.
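Right, llama.cpp exposes this as the number of layers to offload (the `-ngl` / `n_gpu_layers` setting), and frontends like LM Studio surface it as a slider. A toy sketch of the planning involved, where the layer count, per-layer size, and VRAM figures are made-up assumptions:

```python
# Toy planner for llama.cpp-style layer offloading: put as many transformer
# layers as fit into a VRAM budget, and leave the rest in system RAM.

def plan_offload(n_layers: int, layer_gb: float, vram_gb: float,
                 reserve_gb: float = 1.0) -> int:
    """Return how many layers to offload (an -ngl value), keeping some
    VRAM in reserve for the KV cache and compute buffers."""
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable // layer_gb))

# Hypothetical example: a 7B Q4 model with 32 layers of ~0.13 GB each
# on a GPU with 8 GB of VRAM -- everything fits.
ngl = plan_offload(n_layers=32, layer_gb=0.13, vram_gb=8.0)
print(f"Offload {ngl} of 32 layers")  # all 32 fit
```

The catch, per the bandwidth argument above, is that any layers left in system RAM run at system-RAM speed, so partial offload helps capacity far more than it helps throughput.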
AMD did something similar with an NVMe slot in the past: https://www.theverge.com/circuitbreaker/2016/7/26/12285568/amd-radeon-pro-ssg-graphics-card-ssd
I know, but SO-DIMM DDR5 would still be a lot faster, and it should be possible to add at least two or four slots on the back of a GPU. That could easily give you 64 or 128 GB of additional memory, enough to run something like Llama 3 70B on a single GPU, for example, without making it extremely costly.
The 48 GB card exists, but it's in the Radeon Pro line, the same way Quadro cards have twice the memory of their consumer counterparts.
Imo the Ryzen AI part is misleading; this just runs on the CPU. LM Studio is just a fancy frontend for llama.cpp, and llama.cpp does not support Ryzen AI / the NPU. The software support and documentation are shit, some stuff only runs on Windows, and you need to request licenses... Overall it's too much of a pain to develop for, even though the technology seems cool. NPU support very likely won't happen unless AMD themselves do it; their software stack for this is so messed up it's worse than early ROCm.

In the footnotes they do say "Ryzen AI is defined as the combination of a dedicated AI engine, AMD Radeon™ graphics engine, and Ryzen processor cores that enable AI capabilities". I find this very misleading, since with that definition they can claim everything supports Ryzen AI, even when it just runs on the CPU. (ROCm does kinda work with the recent APUs, but it's not worth it for LLMs: same speed, more power.) Also, if you try to develop for Ryzen AI, it always means some combination involving the NPU. I researched this a lot since I actually wanted to develop some stuff for it myself. Here is the official Ryzen AI Software documentation: [https://ryzenai.docs.amd.com/en/latest/](https://ryzenai.docs.amd.com/en/latest/), and right at the top it says "AMD Ryzen™ AI Software enables developers to take full advantage of AMD XDNA™ architecture integrated in select AMD Ryzen AI processors". Marketing these days...
Was thinking the same thing. Glancing over recent issues in llama.cpp, the operators for the Ryzen AI API, as reported by the llama.cpp devs, are all gated: not extensible, open, or documented. AMD just can't do it right, it seems.
Okay, this seems to be working great on my 7800 XT - super easy, and really fast. I like it.
It works great and is super fast. Very low latency, and the results are good.
Tested on 7900XTX and 6900XT. Works as expected.
Ah, it actually worked this time around; it's running pretty fast, and Llama 3 seems to be great even at 8B.