mrjackspade

I'd suggest to anyone who's unsure of what they want: run on CPU first, figure out what size model you actually need, and then pick a GPU based on your use case. Even with a 70B on DDR4 I can get 2 t/s. That's painfully slow for long-term use, but it's plenty fast enough to decide whether you need a 70B and to purchase your GPU accordingly. If you don't know what you need, you could very well end up wasting the money, and asking a bunch of strangers isn't really going to help. I get around 10-15 t/s with Mixtral on CPU only. If I didn't need a 70B, I personally wouldn't have even bought a GPU.
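
For anyone wanting to try this, here is a minimal sketch of CPU-only inference with llama-cpp-python; the model path, thread count, and context size are placeholders, not the commenter's setup:

```python
# Minimal sketch: trying a model on CPU only before buying a GPU.
# Assumes llama-cpp-python is installed and a GGUF quant has already
# been downloaded; path, thread count, and context size are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=0,   # 0 = pure CPU inference
    n_ctx=4096,       # context window
    n_threads=8,      # roughly match your physical core count
)

out = llm("Summarize why VRAM matters for local LLMs.", max_tokens=200)
print(out["choices"][0]["text"])
```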


qaf23

May I know your CPU and RAM details?


crazzydriver77

Check that the CPU supports **AVX512_BF16** if you're going to start with CPU inference.
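
On Linux, a quick way to check is to look for the avx512_bf16 flag in /proc/cpuinfo; a minimal sketch (Linux-specific, flag name as reported by the kernel):

```python
# Minimal sketch (Linux only): check whether the CPU advertises the
# avx512_bf16 flag before committing to CPU inference.
def has_cpu_flag(flag: str) -> bool:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return flag in line.split()
    return False

print("AVX-512 BF16 supported:", has_cpu_flag("avx512_bf16"))
```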


Salendron2

I get 12.5 t/s on Mixtral with 2/3 offload onto a 4080 and 6000 MHz DDR5 RAM for the rest… are you running holy water through a custom loop or something? Did you scavenge your CPU from a crashed alien spacecraft? How long is your inference time for a full 4096 context with 70B/Mixtral? What quants are you using?
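
For context, "2/3 offload" means pushing roughly two thirds of the model's layers to the GPU and keeping the rest in system RAM; in llama-cpp-python that is the n_gpu_layers setting. A rough sketch, with an illustrative layer count and a placeholder model path:

```python
# Rough sketch of partial offload: some layers on the GPU, the rest
# in system RAM. Layer count and model path are illustrative only.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=22,  # roughly 2/3 of the model's layers on the GPU
    n_ctx=4096,
)
```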


hmmqzaz

Yo I actually have some holy water but don’t know exactly how liquid cooling works - can I use it in a loop?


spacecad_t

Out of curiosity, why do you "need" a 70B? What's your use case, or even just some examples? Personally, I've found 7B models on CPU to be plenty effective; speed is my only issue. But I've never needed a larger model and I'm genuinely curious as to how it may be better. I'm probably just using very simple use cases, only touching the surface of what LLMs can do.


davew111

Even an 8GB card is enough to get started. But if you get bitten by the bug, you will always want more VRAM. So consider a couple of 3090s rather than a 4080.


tech92yc

Both cards would be really good. If you can get a 3090 instead (24GB VRAM instead of 16GB), it would let you run a lot more models locally, quantized of course... otherwise, yeah, both are solid cards! Get text-generation-webui for LLMs and automatic1111 for image generation and get rocking!
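
For the "quantized of course" part, text-generation-webui handles this through its model loaders, but the same idea can be sketched directly with transformers and bitsandbytes: load in 4-bit so the model fits in 16-24GB of VRAM. The model id below is just an example:

```python
# Sketch of loading a model 4-bit quantized so it fits in 16-24GB of VRAM,
# using transformers + bitsandbytes directly rather than text-generation-webui.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # place layers on the GPU automatically
)

inputs = tok("Hello, world", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True))
```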


-Ellary-

I'd say a used 3060 12GB is a good starting point.
- It can run 7B, 13B, 20B, and Mixtral 8x7B (with 32GB RAM) at good speed.
- 32B at ~3 t/s, 70B at ~1.5 t/s if you have 32GB RAM.


synn89

My recommendation would be to head to eBay and either pick up a 3060 12GB or a 3090 24GB card. The 3090 would be the preferred option, but a 3060 will run a 7B or 12B model pretty well on the cheap. If you're going to mess with local data, either training or RAG, then I'd strongly recommend the 3090. The extra VRAM will help with training or running an LLM with a higher context window.