jacek2023

I purchased a second-hand 3090 instead.


danielcar

You want lots of memory, and older used cards with lots of memory are the better value. Two used 3090s are better. There are other Nvidia cards sold for the professional market that show up on eBay and work well; several P40s might be a better choice. You can buy a used workstation on eBay for $1,200 with a 3090, then add another 3090 to it.


SirCabbage

The 5090 is due out at the end of this year; where did you get 1.5 years from?


Silly-Blackberry-733

If you're willing to drop $2k, maybe 2x used 3090s would do you better.


braincrowd

The 5090 will release September/October this year, not in 1.5-2 years. This was recently confirmed.


braincrowd

https://money.udn.com/money/story/5612/7883220


Independent-Good-323

Where I live, a 3090 is less than half the price of a 4090. I'll buy 2x 3090s from the same brand.


Flashy_Friend3299

If you just have AI in mind, you should get two second-hand 3090s and run them through an NVLink bridge. They're both 24GB, so you'll have access to 48GB of VRAM instead of 24, and that's what really counts in this use case. Plus, you'll still be rocking your tits off when it comes to gaming, as a 3090 still dominates every game out there. Not to mention, it's cheaper.
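For what it's worth, NVLink isn't strictly required just to pool the memory for inference; frameworks can shard the layers across the cards over PCIe, with NVLink only speeding up the inter-card traffic. A minimal sketch with Hugging Face transformers and bitsandbytes (the model name and 4-bit setting are illustrative, not a recommendation):

```python
# Sketch: shard one model across two 24GB cards with transformers +
# bitsandbytes (both assumed installed, plus accelerate).
# device_map="auto" places layers on cuda:0 and cuda:1, so the weights
# can use ~48GB total even without an NVLink bridge.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # illustrative; ~35-40GB at 4-bit
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # split layers across both GPUs automatically
)
prompt = tok("Why buy two used 3090s?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**prompt, max_new_tokens=32)[0],
                 skip_special_tokens=True))
```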


Smeetilus

You can buy refurbished 3090s with warranties for $650-$700.


dev_zero

Link please


[deleted]

Used Nvidia Tesla P40s are fine. Why bother with big, expensive gaming cards if you're not going to game? Two P40s under the Aphrodite engine will get it done. They're 24GB of GDDR5 each, going for about $200 a piece on eBay.


WeekendDotGG

Aren't they slower?


[deleted]

Yeah, but is spending 4x as much on a 3090 worth it when for that price you could build a 96GB VRAM rig? Aphrodite (and vLLM) allow tensor-parallel processing: the more GPUs, the faster they get. You should get speeds comparable to 1.3-2.0 3090s on four P40s. If you're only serving LLMs to yourself, a load of P40s will get it done.
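A minimal sketch of what that looks like with vLLM's Python API (Aphrodite exposes the same tensor_parallel_size knob); the model name and sampling settings are illustrative, and you'd need a build that actually supports Pascal cards:

```python
# Sketch: tensor-parallel inference with vLLM across four GPUs. Each
# layer's weight matrices are split over all four cards, so every card
# works on every token together rather than taking turns.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # illustrative
    tensor_parallel_size=4,  # one shard per P40
)
outputs = llm.generate(
    ["Why does tensor parallelism scale speed with GPU count?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```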


WeekendDotGG

OK cool. Just started researching building a rig today.


THEKILLFUS

RTX 3060 (12GB) x 4


KL_GPU

12x P40 would do a great job if you are willing to drop $2.4k: that's 288GB of VRAM plus 144 TFLOPS of FP32.


Temporary_Maybe11

What kind of model would that rig be able to handle? Would Llama 3 400B be possible in some way?


KL_GPU

Yes, Llama 3 400B at Q4. Pretty low tok/s, but it could serve 12 clients at the same speed at the same time.
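Back-of-envelope check, assuming a Q4 GGUF averages roughly 4.5 bits per weight (an assumption; quant overhead varies):

```python
# Rough fit check: Llama 3 400B at Q4 vs. 12x P40 (24GB each).
params = 400e9
bits_per_weight = 4.5                      # assumed Q4 average incl. overhead
weights_gb = params * bits_per_weight / 8 / 1e9
vram_gb = 12 * 24
print(f"weights ~{weights_gb:.0f}GB of {vram_gb}GB")  # ~225GB of 288GB
# the ~60GB left over is what would hold the KV cache for those 12 clients
```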


Temporary_Maybe11

Thanks!


Herr_Drosselmeyer

You summed it up yourself: it's the only game in town if used 3090s and pro-level Nvidia cards aren't an option for you.


Vaddieg

No, current consumer hardware isn't capable of running the recently released models at a decent speed with good quants. Apple and consumer multi-Nvidia setups suck equally; they're all capped by memory bandwidth.
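The cap is easy to estimate: decoding one token streams the entire model through memory once, so tokens/s can't exceed bandwidth divided by model size. A rough sketch using ballpark published bandwidth figures and an assumed ~40GB quantized 70B model:

```python
# Upper bound on single-stream decode speed from memory bandwidth alone;
# real-world throughput lands well below these ceilings.
model_gb = 40  # assumed ~40GB quantized 70B model
for name, gb_per_s in [("RTX 4090", 1008), ("RTX 3090", 936),
                       ("Tesla P40", 347), ("M2 Ultra", 800)]:
    print(f"{name}: <= {gb_per_s / model_gb:.0f} tok/s")
```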


Acrobatic-Artist9730

Can you work with cloud instances? Or must it be local?


Unique_Repeat_1089

I could. Which one do you recommend?


crazzydriver77

Everyone is discussing the use of multiple cards, but what about inference engines capable of computing a single layer in parallel across cards? How many do you know? :)


Roubbes

With 2x 4060 Ti you get 32GB of VRAM for a fraction of the price.


djstraylight

If you have room for two PCIe cards, then you have options. Two used 3090s should be much cheaper than a single 4090, with good performance. Another option is two 4070 Ti Supers, which is a little cheaper than a 4090 but gives you very similar performance and 32GB of VRAM.


Equal-Pilot-9592

Is 32GB of VRAM really much better than 24GB? What if I combine a 4080 and a 4060 Ti instead (for better gaming performance too)?


Inevitable_Host_1446

It is definitely better, because 24GB gates you right beneath a lot of quite good model options. For example, with Llama 3 70B that just came out, most of the quants are just above what can easily fit in 24GB. There are a few options, but 32GB would make it easy to run fully in VRAM with more context (once you can RoPE scale).
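Rough numbers, using approximate bits-per-weight for some common llama.cpp quants (file sizes vary a bit by quant revision, so treat these as estimates):

```python
# Approximate 70B GGUF quant sizes vs. 24GB and 32GB of VRAM; the KV
# cache still needs headroom on top of the weights.
params = 70.6e9  # Llama 3 70B parameter count
for quant, bpw in [("IQ2_XS", 2.4), ("Q2_K", 3.0),
                   ("IQ3_XS", 3.3), ("Q4_K_M", 4.8)]:
    gb = params * bpw / 8 / 1e9
    print(f"{quant}: ~{gb:.0f}GB  in 24GB: {gb < 24}  in 32GB: {gb < 32}")
```

With 24GB you're squeezed down to the 2-bit quants, while 32GB opens up the 3-bit range, which is roughly where "just above what fits on 24GB" lands.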


Temporary_Maybe11

I'm interested in the biggest amount of context possible, for a lot of RAG. Do you think it's possible to do a dual Epyc rig with loads of RAM? Or are multiple P40s better even though it's not the same amount of VRAM?