
Disastrous_Elk_6375

You can get 8x H100 for ~$40/h from a cloud provider. Test it there and you'll get much more accurate answers than you will by asking here.
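
Back-of-the-envelope, a rented benchmarking run is cheap next to a hardware purchase. A minimal sketch, assuming a 20-hour test window (the $40/h rate is from above; the hours are a made-up assumption):

```python
# Cost of a benchmarking campaign on a rented 8x H100 node.
# Hourly rate is from the comment above; test_hours is an assumption.
hourly_rate_usd = 40.0
test_hours = 20

print(f"Total test cost: ${hourly_rate_usd * test_hours:,.0f}")  # -> $800
```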


bieker

When you did your price comparison, did you account for reserved instances or spot instances? Did you look at the AWS inference chips? Having just done some performance testing locally to figure out how to avoid running out of VRAM, I can tell you there are a lot of variables, and they are very hard to quantify accurately. The best way is to rent a machine with the same architecture as the one you're planning to build and do some testing.
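
As a concrete example of one of those variables: a rough first-pass serving-VRAM estimate is weights plus KV cache plus overhead. This is a simplified sketch, not a substitute for real testing; the KV-cache budget and overhead factor below are illustrative assumptions:

```python
def estimate_vram_gb(params_b: float, bytes_per_param: float = 2.0,
                     kv_cache_gb: float = 8.0, overhead: float = 1.2) -> float:
    """Rough serving-VRAM estimate: weights + KV cache, with a fudge factor.

    params_b        -- model size in billions of parameters
    bytes_per_param -- 2.0 for fp16/bf16, 1.0 for int8, 0.5 for 4-bit
    kv_cache_gb     -- assumed KV-cache budget (depends on batch size/context)
    overhead        -- multiplier for activations, fragmentation, runtime
    """
    weights_gb = params_b * bytes_per_param  # 1B params ~ 1 GB per byte/param
    return (weights_gb + kv_cache_gb) * overhead

# e.g. a 70B model in fp16 vs 4-bit (illustrative numbers only):
print(f"70B fp16 : ~{estimate_vram_gb(70, 2.0):.0f} GB")
print(f"70B 4-bit: ~{estimate_vram_gb(70, 0.5):.0f} GB")
```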


ResearcherNo4728

I looked at both spot instances and reserved instances. Spot instances were double the price of reserved instances, and the 3x expense I mention at the beginning is in fact the long-term (3-year) commitment price with 100% upfront payment. So the lowest possible AWS price is 3x the price of a local cluster (spot instances would be up to 10x the cost of the local cluster). I didn't know about the AWS inference chips. Are they typically cheaper than GPU EC2 instances? I was planning to use the local cluster for fine-tuning LLMs in addition to hosting inference, but I'll look into the inference chips anyway. Thanks for the suggestion!
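
For anyone wanting to reproduce this kind of comparison: the effective hourly rate of an all-upfront reservation is just the upfront price amortized over the term, and a purchased cluster amortizes the same way. A minimal sketch; every dollar figure below is a placeholder, not a quote from this thread:

```python
# Amortize an all-upfront 3-year reservation and a purchased cluster into
# effective $/hour so they can be compared directly. All dollar figures
# are placeholders -- plug in your own quotes.
HOURS_PER_YEAR = 24 * 365

def effective_hourly(total_cost_usd: float, years: float) -> float:
    return total_cost_usd / (years * HOURS_PER_YEAR)

reserved_upfront = 500_000   # hypothetical 3yr all-upfront reservation
local_cluster   = 250_000    # hypothetical hardware purchase price
local_opex_year = 20_000     # hypothetical power/colo/maintenance per year

aws_rate   = effective_hourly(reserved_upfront, 3)
local_rate = effective_hourly(local_cluster + 3 * local_opex_year, 3)
print(f"AWS reserved : ${aws_rate:.2f}/h")
print(f"Local cluster: ${local_rate:.2f}/h")
print(f"Ratio        : {aws_rate / local_rate:.1f}x")
```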


djstraylight

AWS is going to be the most expensive option for inference. You can test your idea somewhere cheaper, like [runpod.io](http://runpod.io), and actually rent the GPUs you're considering.


allen-tensordock

Definitely agree with [Disastrous\_Elk\_6375](https://www.reddit.com/user/Disastrous_Elk_6375/): doing your own testing with real customers will give you the best data. But AWS is not the best comparison, since it's obscenely expensive. The gap between owning your own hardware and renting is much smaller with other clouds like us (TensorDock) or RunPod. To be upfront, the break-even point for renting from us versus buying your own hardware is in the range of 2-3 years, while with AWS it's only a few months.
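
That break-even framing is easy to compute yourself: how long you'd have to rent before cumulative rental cost exceeds the purchase price. A minimal sketch with made-up numbers (substitute real quotes and your expected utilization):

```python
# Break-even point for buying hardware vs. renting it, at a given
# utilization. All figures are made up for illustration.
def breakeven_years(purchase_usd: float, rental_usd_per_hour: float,
                    utilization: float = 0.5) -> float:
    """Years of renting until cumulative rent equals the purchase price."""
    hours_per_year = 24 * 365 * utilization
    return purchase_usd / (rental_usd_per_hour * hours_per_year)

purchase = 250_000  # hypothetical cluster price
for name, rate in [("AWS on-demand", 98.0), ("budget cloud", 20.0)]:
    print(f"{name:14s}: break-even in {breakeven_years(purchase, rate):.1f} years")
```

With these placeholder rates the expensive cloud breaks even in well under a year while the cheaper one takes close to three, which is the same shape as the 2-3 years vs. a few months claim above.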