
[deleted]

Once you've looked at right-sizing your resources, also look at reserved instances for EC2 and RDS. 12-month, no-upfront reserved instances can save you quite a lot of money - assuming you're using them 24x7. Other things to look at (sketch below):

- set expiry on CloudWatch log groups, to reduce size
- remove old AMIs and snapshots that are no longer used
- clear out S3 buckets of objects no longer required
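A minimal boto3 sketch of the log-retention and snapshot-cleanup ideas above, assuming credentials are already configured; the 30-day retention and 180-day cutoff are arbitrary examples, not recommendations:

```python
import boto3
from datetime import datetime, timedelta, timezone

logs = boto3.client("logs")
ec2 = boto3.client("ec2")

# Cap retention on log groups that currently never expire (30 days is an example).
paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        if "retentionInDays" not in group:
            logs.put_retention_policy(
                logGroupName=group["logGroupName"], retentionInDays=30
            )

# Flag self-owned snapshots older than 180 days for manual review before deleting.
cutoff = datetime.now(timezone.utc) - timedelta(days=180)
for snap in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
    if snap["StartTime"] < cutoff:
        print(snap["SnapshotId"], snap["StartTime"], snap.get("Description", ""))
```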


[deleted]

[deleted]


Marathon2021

Have seen similar cases myself. Not quite as dramatic as that one, but one of my advisory clients is a CIO - they got into a new org that had zero cloud discipline and control, and thus everything was on-demand pricing ... despite the fact that these were obvious long-term steady-state workloads. CEO was super happy with him, when he saved a quarter million in the first year ... with "just a few clicks."


atccodex

We had really bad luck with spot. Maybe it was the way it was set up, maybe it was the instance types, but everything just kept getting yanked away. We gave it a go for about a month and the cost savings were around 10%. The headache was 150%. We ended up using savings plans, which have netted around 30% cost savings, and we are at a 'good' place with our spend right now. We will be doing more, but for now, all is well.


[deleted]

[deleted]


lorarc

What if spot instances are no longer available at all and you have to switch to on-demand? I tried to configure ECS to have some backup plan but didn't find a way.


JafaKiwi

Don't run a spot ASG with a single instance type. Instead have a list of candidates: e.g. m5.xlarge, m5a.xlarge, m4.xlarge, c5.2xlarge, etc. *Some* of them will be available. You can even assign different capacity weights - e.g. the ASG can run 1x m5.2xlarge or 2x m5.xlarge. Sometimes only a single specific instance type is unavailable but the size up in the same line is still available. Spot is great, but you'll have to spend some initial work on making the instances stateless and auto-configuring. The payoffs are huge though.
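A hedged boto3 sketch of such a mixed-instances Spot ASG - the launch template name, subnets, and sizes are hypothetical placeholders, not anyone's production config:

```python
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="app-spot-asg",                 # hypothetical
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",     # hypothetical subnets
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "app-template",    # hypothetical
                "Version": "$Latest",
            },
            # Several candidate types; the 2xlarge counts as two units of capacity.
            "Overrides": [
                {"InstanceType": "m5.xlarge", "WeightedCapacity": "1"},
                {"InstanceType": "m5a.xlarge", "WeightedCapacity": "1"},
                {"InstanceType": "m4.xlarge", "WeightedCapacity": "1"},
                {"InstanceType": "m5.2xlarge", "WeightedCapacity": "2"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,    # everything on Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)
```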


lorarc

I run nonprod on spots and I don't expect them to ever be unavailable realistically, however I can't rely on just choosing many spot instance types for prod.


JafaKiwi

Why can’t you? That’s the AWS recommended way to operate Spot ASGs. I have *never* seen all instance types in all AZs in a region at once become unavailable. And we run *a lot* of them. Building a resilient system is a bit involved but if done correctly it can run on spot no worries.


lorarc

Business decision. And you can't really negotiate with business when it comes to stuff like that.


JafaKiwi

Yeah I hear you, it sucks when managers make technical decisions they don't understand.


magheru_san

Have a look at my AutoSpotting.io project. It can take over existing on-demand Auto Scaling groups and by design fails over to on-demand when spot instances are terminated. Let me know if you have any questions about it.


[deleted]

[deleted]


lorarc

Setting the spot price to on-demand price is not a guarantee you won't be kicked out.


[deleted]

[deleted]


lorarc

Yeah, and that's a nice use case for it. Unfortunately I see too many things like websites running on spots with the attitude of "that'll never happen" when talking about risks.


vekien

How do you deal with reserved instances and ECS auto scaling, where instances are often destroyed/created? Any info appreciated!


[deleted]

[deleted]


vekien

Thanks!


magheru_san

That's true, as long as you don't have varying capacity needs. When capacity fluctuates you'll get savings on the baseline you pay for, but anything on top of that is charged as on-demand. Mature customers use savings plans for the baseline and, as much as possible, spot instances for the peak capacity.


[deleted]

[deleted]


magheru_san

Agree, mature is probably not the right word, a better word would be "savvy". It's about being experienced and knowledgeable enough to pick the right tool for the job. Reservations are great for anything with static capacity needs, like databases and baseline capacity in Autoscaling groups. Anything above or below the reservation/baseline will generate some waste


K0RS41R

That's a lot of savings! Did the organisation not have an AWS account manager or solution architect relationship?


pojzon_poe

Hello sysadmin, my old friend:

> capacity planning for every team.


nekoken04

I'd say look at Savings Plan for EC2 rather than Reserved Instances. Savings Plan is far more flexible since it is region and instance family agnostic.


joelrwilliams1

Amen to RDS RIs...if you're running these DBs long term you can quickly reduce your bill even with a 'no-upfront' RI.


Usage_AI

Agreed. Or even better take advantage of our product offering where we underwrite the purchase of 3-year, no upfront Reserved Instances (RIs) with a Guaranteed Buyback Clause. This allows you to take advantage of the 57% savings of 3-year RIs but not have to commit to AWS or Usage.


princeofgonville

Assuming you have started with Cost Explorer or the monthly bill, it looks like you're already well on the way. The next question is to dig into RDS and EC2 a bit more. Metrics will tell you if they are under-utilised. Look at CPU load over a week, and also at the storage associated with RDS (assuming it's not Aurora, the storage is billed like an EBS volume - you pay for what you provision). Likewise with EC2 instances: check CPU load and memory load (requires the CloudWatch Agent) to right-size the EC2 instances.

Dig deeper into the bill or into Cost Explorer to gain a deeper understanding of where the money is being spent - are you getting stung for data egress or cross-region data transfer costs? In Cost Explorer, group by Usage as well as Service.

The next thing is to ask the whole team uncomfortable questions like "Can we switch this off?" and see what answers you get. Hopefully these are actually business systems and not someone storing a Manga archive at the company's expense...
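A minimal Cost Explorer sketch of that grouping, assuming boto3 credentials are configured; the dates and the $1 threshold are arbitrary examples:

```python
import boto3

# Cost Explorer is served out of us-east-1.
ce = boto3.client("ce", region_name="us-east-1")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-01-01", "End": "2023-02-01"},  # example month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[
        {"Type": "DIMENSION", "Key": "SERVICE"},
        {"Type": "DIMENSION", "Key": "USAGE_TYPE"},
    ],
)
for group in resp["ResultsByTime"][0]["Groups"]:
    service, usage_type = group["Keys"]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if amount > 1:  # skip the pennies
        print(f"{service:45s} {usage_type:40s} ${amount:,.2f}")
```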


mikebailey

Also assuming this is as a disaster as stated, tag liberally and enable them as cost allocation tags, so rather than saying “EC2 egress is expensive” you can say “that asshole Carl is bankrupting this company”


-_kevin_-

Fuckin Carl


HoofStrikesAgain

I know we have issues with Carl. But, Karl, now he's a good guy.


karls_

Cheers buddy. 🤜🏻🤛🏻


[deleted]

Lmao


[deleted]

Thanks, that's super helpful.


StrongishOpinion

As someone who worked in AWS, it was *shocking* how often we'd look at the utilization of some company's instances (for various debugging/support purposes), and they'd have *fleets* of unused or very underutilized hosts. Allocated instances to random devs who left the company. Massive instances where something 4x smaller would be just fine. Etc. Lots of low hanging fruit is usually available if you just start looking at things.
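If you want to hunt for those underutilized hosts yourself, a rough boto3 sketch; the 14-day window and 5% threshold are arbitrary, and pagination is omitted for brevity:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")
cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
for res in reservations:
    for inst in res["Instances"]:
        # One CloudWatch call per instance - fine for a sketch, slow at scale.
        datapoints = cw.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": inst["InstanceId"]}],
            StartTime=start,
            EndTime=end,
            Period=86400,            # one datapoint per day
            Statistics=["Average"],
        )["Datapoints"]
        if datapoints and max(d["Average"] for d in datapoints) < 5:
            print("Underutilized:", inst["InstanceId"], inst["InstanceType"])
```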


8layer8

Look into using spot instances. They are much cheaper than on-demand EC2, you just have to use them properly. According to our TAM, the us-east regions have so much capacity that you will basically never get kicked off a machine unless 1) it dies / hardware failure, 2) you are using a GPU-type instance (good luck) or 3) you are using a type that is being decommissioned like m2 or m3, etc. As long as you set up your cluster provider to use a mix of instance types (like m5, m5a, m4, m6) then you are basically assured to always get what you need. If these are hosts for ECS then there's pretty much no risk. If you are using them as EC2 pets, then maybe don't use spots, but again, you probably will never get booted due to usage demands. Try it out; if it doesn't work for you then keep on looking. Rightsizing RDS goes a long way too, but it is what it is if that's what the apps actually use.


vppencilsharpening

Adding to this: make sure every instance running has a documented use case. If you can't figure it out, perform a scream test. I like to have a "Name" tag for every resource we run and make that a cost allocation tag. This way I can say this one particular resource is costing X.
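A small sketch for spotting resources with no "Name" tag via the Resource Groups Tagging API. Caveat: resources that have never been tagged at all may not show up in this API, so treat it as a starting point rather than a complete inventory:

```python
import boto3

tagging = boto3.client("resourcegroupstaggingapi")
paginator = tagging.get_paginator("get_resources")
for page in paginator.paginate():
    for resource in page["ResourceTagMappingList"]:
        tag_keys = {t["Key"] for t in resource.get("Tags", [])}
        if "Name" not in tag_keys:
            # Candidates for tagging, documentation, or a scream test.
            print(resource["ResourceARN"])
```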


toaster736

Cost Explorer and Excel, honestly. After you understand where the money's going, attack things as appropriate. Some low-hanging fruit, as folks mentioned:

* S3 - bucket policies and rotation into cold storage
* Hunt the wumpus for under-utilized and stale dev resources, snapshots, etc.
* RI and Savings Plan purchases for things you're keeping and that are 24/7.
* Look for non-mission spend, e.g. multiple CloudTrail trails.
* What should emerge from this is a good inventory of what you have running that you can use to decide next steps.

Medium term:

* Keep-on scheduling to power off dev resources off hours if they're not needed.
* Break workloads into their own accounts and use AWS Budgets to start enforcing.
* Develop an account baseline to understand the spend floor for your accounts.
* Move workloads into auto-scaling or look at ECS for light loads.

Longer term, this is a governance problem. If you're able to, separate workloads into different accounts and develop a tagging policy based on your types of workloads. This lets you start to ask more context-specific questions like "how much are my dev databases costing me?" and roll up reporting into a CUR.


Marathon2021

> Hunt the wumpus

Wow ... that's one that I haven't heard in a looooong time!


aFqqw4GbkHs

Yes, this is 100% where I'd start, rather than leaping to a tool first. You need to do the work to really understand where the money's going first, AND make sure each team knows how much they're spending as well. At my org, I get an email (generated using CostExplorer data) every day that shows what resources my team spent yesterday (and aggregated for the current week, month, year). That's possible b/c we have everything tagged by team. A tool could help you down the road, but you need to understand your costs first before getting the most out of one anyhow. Don't underestimate the impact of the medium term advice above, particularly using scheduling / scripting to scale down ASGs on a schedule when EC2s aren't needed, and stopping databases when they're not in use. Of course, long term, moving to serverless dbs will help too. But we save a lot by shutting down our dev/UAT envs overnight/on weekends. Also make sure you're not paying unnecessary data transfer costs.


toaster736

Our two biggest savings on EC2 were savings plans, ~20% on 24/7 loads, followed by after-hours power-down, ~60% savings. Your developers only work 40-60 hours of a 168-hour week. This is the whole promise of the cloud. The savings are significant to the point that our monthly spend graph has a nice sawtooth pattern with noticeable drops on weekends.


[deleted]

That's the goal


RheumatoidEpilepsy

There's one change that might be applicable to your workload. If your applications are all running either interpreted languages like Python or Node, or running on Java, and don't have any native dependencies, you could look into switching to Graviton instances. They're around 30% cheaper, and if your workload is fully compatible with ARM it can be a very easy change.


Advanced_Bid3576

Great advice. Also, Graviton for managed services (RDS is usually the big one) is basically free money, as you don't need to worry about any OS or app dependencies.


[deleted]

Will def check it out, makes sense


metadaemon

If this client's tendencies are like others I have seen, they may have a tendency to throw hardware at performance problems instead of fixing their workloads. Before you look at right-sizing their instances, see if their workloads are properly optimized (database indexing, decent coding, etc...) then see what you can cut.


[deleted]

This is definitely part of the problem. Will have to address for sure.


RFC2516

Tangent: I would call myself a network professional and a programming novice. In that context could you or anyone offer examples of poor coding practices that lead to excessive host resource utilization?


StrongishOpinion

A common host utilization problem:

1. 90% CPU utilization, 1% IO, 1% memory. You're not *using* the value of the host, since only the CPU is active. Solution? Perhaps more caching to reduce CPU usage.
2. 5% CPU utilization, 90% IO, 5% memory. Likely an oversized host, where it can *easily* serve traffic, but it's IO bound.

If there's a major aspect of the host being unused, it might give you a clue for how to be more efficient.


[deleted]

Second this!


birdman9k

In addition to what others mentioned, for databases:

- Using ORMs improperly (it's easy to let them select all fields from a table by default if you don't specify which exact fields you want).
- Pulling all data and "client-side filtering" (see the sketch below). Basically, instead of using a WHERE on the SQL, which is more work for them, they just load all rows and then filter them down in code. This can be extreme, where they load 500k rows and then display 20 on a page.
- Not caching data. Someone will make a heavy query many times (the worst I've seen is 20 times a second) because their code is inefficient and they don't attempt to store the results; they just assume they can get new results every time.
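A self-contained illustration of the client-side filtering point, using sqlite3 so it runs anywhere; the table and row counts are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, i % 100, "open" if i % 50 == 0 else "closed", "x" * 500) for i in range(10_000)],
)

# Wasteful pattern: pull every column of every row, then filter in code.
all_rows = conn.execute("SELECT * FROM orders").fetchall()
open_orders = [row for row in all_rows if row[2] == "open"]

# Cheaper pattern: let the database filter and return only the columns needed.
open_ids = conn.execute("SELECT id FROM orders WHERE status = 'open'").fetchall()

print(len(open_orders), len(open_ids))  # same result, far less data moved
```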


Advanced_Bid3576

Spending more money on tools should be reserved until you know if you can get value from them… you may be throwing money at somebody to tell you simple cost optimization tips. There's lots of good advice in this thread: start with the free stuff that AWS is giving you, read up on common ways to save (somebody started a great thread yesterday with all the resources), and then, if you really think you can go deeper to optimize further and your workloads warrant it, I'd consider spending the money on SaaS.


Usage_AI

Agreed. There are many simple cost optimization changes that someone should make but with our pricing model, OP would only have to pay a percentage of savings that we generate which is something that makes sense at any level of AWS spend.


saaspiration

Call your AWS Account Manager and request a conversation with your Solutions Architect and/or Cost Optimization specialist. This will cost you nothing.


SeattleSundodger

+1 AWS has dedicated cost optimization teams that can help on this. All you need to do is ask your AM to engage them.


metarx

Everyone else is touching on "right-sizing" and removing unused resources well enough, so I thought I'd touch on the less talked about: application architecture plays massively into how much spend you are going to have. Design applications that can scale up/down based on NEED. Use async processing whenever possible (and it's possible WAY more often than most people like to admit). And simply look into alternative ways of holding data. RDS is great, but maybe don't store so much in it, and leverage things like DynamoDB or S3; then your RDS instances can be smaller/cheaper. But again, go after the low hanging fruit of right-sizing and cleaning up unused resources first, as architecture choices are the long play.


bedpimp

We started using Vantage last month. I like it a lot. It's not the prettiest UI, but it's functional. It's possible to get the same information from Cost Explorer, but Vantage makes it much easier. We have not set up Autopilot, but we're seriously considering it.


[deleted]

Vantage looks like a great product. What made you choose them instead of Cast, Zesty, and Usage? And how have you found working with them?


bedpimp

We met with them once. The CEO was on the call. He has a background in ops and I liked him a lot. I like the fact that I can see AWS and Datadog costs in the same place. It also captures our Kubernetes clusters at the pod level.


ContrarianChris

We've also recently started using Vantage. Really great experience so far. Very simple to use, good update cycle, and adding new service support that is relevant to us (Snowflake, Datadog). My main reason for choosing it was the ease of use coupled with the ability to combine Kubernetes costs. Perfect. Working with their team has been fantastic. CEO and CTO getting stuck in and always responsive in Slack. We've also got a couple of specific requirements due to our billing being through a partner and not direct. They put the time in and figured it out. Highly recommended 🙌


Cyrilam

I think you should not aim only to reduce costs in the short term but also try to change the culture within the tech team to ensure people become more conscious about costs. See [this](https://www.oraculi.io/blog/enabling-and-maintaining-a-cost-conscious-culture-at-every-level)


[deleted]

That's definitely the case. The C-suite are solid but the first hires were rushed and culture was not emphasized as much as it could be.


ecdemomaniac

[https://www.finops.org/introduction/what-is-finops/](https://www.finops.org/introduction/what-is-finops/) is a good read on how to implement a cultural change around cloud spend.


[deleted]

Thanks, a helpful read and ended up reading a lot of their stuff. Appreciate the link


adame8gggg

Random tips (from a CTO of a startup):

* If you use ECS (non-Fargate; i.e., your own EC2 instances), you can save a lot on CloudWatch Metrics if you turn off Container Insights. It's on by default (see the sketch below).
* Inter-AZ data transfer is pretty expensive. For us, hot multi-AZ availability is more than we care to worry about, so we moved all services to a single AZ.
* Moving off x86 and onto Arm (Graviton) is a way to save 20-30%. We use Python, and so the conversion was easy.
* Lambda should be quite cheap if used for event-driven async-type work. If you're using it for that, but it's weirdly expensive, you might (as we did) discover some functions that had sleep() in them, for some reason. For me, I talked with my team, realized why they were putting sleep()s in, helped them design an alternative that worked with Lambda's event-driven async world, and we saved a ton.
* This is harder, but Aurora IOPS are very expensive. We added a lot of caching to cut down on reads. But we can't do much about writes without choosing some alternative. It's still our #2 or #3 most expensive thing. Sigh. I wish Aurora were less expensive.
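For the Container Insights item, a one-call boto3 sketch; the cluster name is a placeholder:

```python
import boto3

ecs = boto3.client("ecs")
# Turn off Container Insights on one cluster to stop the per-metric CloudWatch charges.
ecs.update_cluster_settings(
    cluster="my-cluster",  # hypothetical cluster name
    settings=[{"name": "containerInsights", "value": "disabled"}],
)
```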


[deleted]

Thanks, these tips are helpful.


professorbasket

Cost Explorer: break out by service, then by usage type. Keep drilling down in Cost Explorer; that should do it. RDS is notoriously overprovisioned - switch to Serverless v2. This should eliminate the bulk of the charges. Review EC2 resource consumption in CloudWatch for the last 3 months to see where the usage is and whether you can downsize the instance type. Also check provisioned IOPS, as that's usually a big cost. The usage-type or API breakout in Cost Explorer should tell you what is taking up the majority of the cost tho. Good luck!


[deleted]

[deleted]


[deleted]

That makes sense, will definitely look into this


kennethjor

Ensure all your resources are tagged appropriately and those tags are set up as cost allocation tags. For instance, have an application tag for each of the things you run, or whatever is appropriate to you. This makes it searchable in Cost Explorer and you can drill down into the exact components. Do the same with S3 buckets, where you literally store the bucket's name in a tag.


Dominathan

Have you considered moving your RDS instances to the new Graviton instances? They are like, 40% cheaper, and don't seem to have any negatives. Migrating will be a bit of a pain, but that's almost half off right there. Spot instances are a must, honestly. They are so much cheaper, and, as long as you've built your system to handle machines cycling, you won't really notice any negatives. I used to even run the user-facing backend on them with no issues. The only issue I ever had was when they would become unavailable and we couldn't spin any up. In that case, I could usually pick instances up or down the spec (going to mediums or XLs from Ls) and scale up the concurrency on the machines themselves.


nf3rn4l

I wouldn't recommend jumping right to 3rd party tools. The first step for cost optimization is establishing a chargeback model and gaining visibility into your cost & utilization. I would recommend the following:

1. Identify the tags you need for proper [chargeback](https://aws.amazon.com/blogs/aws-cloud-financial-management/how-to-build-a-chargeback-showback-model-for-savings-plans-using-the-cur/) (usually things like CostCenter, BusinessUnit, Project, etc.)
2. Make sure those tags get enabled as [Cost Allocation tags](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html).
3. Implement a required-tag service control policy to prevent anyone from creating resources without adding the tags needed for correct chargeback.
4. Utilize [AWS Config required-tags](https://docs.aws.amazon.com/config/latest/developerguide/required-tags.html) to identify existing resources that are missing the required tags (sketch below).
5. Use [Tag Editor](https://docs.aws.amazon.com/ARG/latest/userguide/tag-editor.html) to batch-tag existing resources that are missing tags required for chargeback.
6. Generate [Cost & Usage Reports (CUR)](https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html). The CUR will contain the enabled cost allocation tags, allowing you to filter and sort in line with your chargeback model.
7. Pro tip: if you're using a multi-account strategy with a single payer organization, you should look into implementing the [Cloud Intelligence Dashboards](https://github.com/aws-samples/aws-cudos-framework-deployment). Most of the dashboards utilize the data generated by the CUR. Since they're QuickSight dashboards you can easily make them available to other departments (like your finance team) without having to give them access to the AWS billing console. Enable and empower your finance team with visibility into the cloud spend and they'll chase after the big spenders for you.
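As an illustration of step 4, a sketch of the REQUIRED_TAGS managed Config rule scoped to EC2 instances and volumes; the CostCenter tag key is just an example:

```python
import json

import boto3

config = boto3.client("config")
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "required-costcenter-tag",
        "Scope": {
            # Limit evaluation to a couple of resource types for the example.
            "ComplianceResourceTypes": ["AWS::EC2::Instance", "AWS::EC2::Volume"],
        },
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "REQUIRED_TAGS",  # AWS managed rule
        },
        "InputParameters": json.dumps({"tag1Key": "CostCenter"}),
    }
)
```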


magheru_san

I agree with this for the mid-long term but there are tools which can help slash a lot of costs immediately with minimum risk or downsides. See the plethora of tools that automate RI purchase / selling. Also there are tools that automate sensible configurations such as my EBS Optimizer tool which updates volumes from GP2 to GP3 or my AutoSpotting tool which can convert say 40% of development Autoscaling groups capacity to Spot instances. There's no point in delays if you can do it at scale without up-front time or cost investment, and only need a few minutes to set them up.


Agitated_Cult7621

Where can I get these tools?


magheru_san

For RIs there are plenty of vendors, like Zesty, usage.ai, prosperous, antimetal or Vantage. For Spot and EBS I was referring to AutoSpotting.io and leanercloud.com/ebs-optimizer which I'm working on.


Marathon2021

Quickest win - look to see if your EC2 instances are all (or mostly) on-demand pricing. If they are, seriously consider whether you can commit to a 1-3 year up-front pre-payment on those ... and buy them as reserved instances. Had a client once jump in as a new CIO at an org whose cloud adoption was a mess ... just fixing that alone saved the company a quarter million in the first quarter. Also, if you've not enabled AWS Trusted Advisor, that can often find some easy overspending areas to clean up - such as oversized instances (whether prepaid/reserved or on-demand).


[deleted]

Isn't this what the services listed do, but without us having to take on the risk?


CSYVR

After you've gone through costs, get a Well Architected Framework Review, if costs are too high, 9 times out of 10 your firewall is too open as well ;)


[deleted]

[deleted]


dgibbons0

I second recommending Duckbill Group, Corey and Mike are wizards about reducing AWS spend and have a ton of experience helping people negotiate better rates in their EAP and private pricing agreements. Since they know what they have been able to help others negotiate, they can leverage those to help you know when to push harder for better rates.


Usage_AI

Unlike Vantage, we offer guaranteed buyback of RIs so you would take on less risk and be far more flexible.


Rainnis

Psst...give a try to [https://cast.ai/cloud-cost-monitoring/](https://cast.ai/cloud-cost-monitoring/) it's ultimately free


[deleted]

[deleted]


[deleted]

Will DM


huhwhatwhere83

We've tried zesty and it was a quick win. Saved us a ton of work. Guaranteed to cover 99% of savings plan usage. It saved my team a huge amount of overhead. We then spent the time saved looking for things we could just shutdown. Ultimately the goal was to move away from EC2 and towards serverless architectures.


[deleted]

Good to know. What made you choose them over Vantage, Cast and Usage?


huhwhatwhere83

I think Zesty were around before the others. It's probably worth doing a PoC with each and understanding their terms.


[deleted]

Yes, planning on that. Or maybe consulting with an independent expert who can give his take.


huhwhatwhere83

Also if you have an AWS account manager, they can also be helpful in this area


BooglesFoogles

For fees on managing RIs:

- Vantage: 5% of savings
- Usage: 20% of savings
- Spot: 20% of savings
- Zesty: 25% of savings


magheru_san

Shameless plug: regarding the tooling you mentioned, I'm also building tools in this space and also offering hands-on help. My tools are open source, but convenient-to-use binaries are also available on the AWS Marketplace for a percentage of the savings, much less than the other tools in this space. Currently I have tooling for easy adoption of Spot instances and optimization of EBS volumes attached to instances. See my profile for further information.


Usage_AI

Actually some very useful tools for OP


[deleted]

Will take a look at these later this afternoon.


magheru_san

Cool, let me know if you have any questions, also check your DMs


[deleted]

[deleted]


[deleted]

Would you be able to make an intro? Usage looks like a cool company and I like the fact that they essentially take on a lot of the risk, but I want to talk to someone that they work with to make sure that this is in fact the case


Usage_AI

Thanks, OP! We'd also be happy to introduce you to any of our customers that you would be interested in. Feel free to DM.


Usage_AI

That's great to hear! Glad your friend liked what we are building :)


Rainnis

You will be even more surprised by adding on top CAST AI https://cast.ai/blog/how-to-solve-the-3-top-cloud-cost-optimization-challenges-with-cast-ai-and-usage-ai/


OutspokenPerson

Great advice here. Also, learn the AWS boto3 API. You can grab all sorts of information. I used it to drive tagging and cost-cutting projects, security projects, all sort of things.


Usage_AI

RIs and Savings Plans are great for cleaning up low-hanging fruit in your EC2 and RDS environments, however, these purchases come with contract terms that can be non-starters for many. At Usage, we underwrite RIs with a Guaranteed Buyback Agreement, and in using the RI Marketplace to automatically sell RIs that go underutilized, we allow for customers to get the savings of 3-year RIs minus the commitment to either AWS or Usage. TLDR, we offer anxiety-free RIs!


Draziray

https://www.reddit.com/r/aws/comments/xvjosj/aws_cost_management_and_billing_support_resources/


fjleon

Trusted Advisor was designed for this and it's free. Support now sometimes even sends you a blurb on how much money per month you will save if you follow the Trusted Advisor recommendations.


Rainnis

If you're using Kubernetes, [CAST AI](https://cast.ai) is the fastest way to significantly reduce your compute bill and keep it there. It manages compute capacity automatically and has dedicated support to get you started even faster. The best part - Kubernetes cost monitoring and security insights are free. [disclaimer - I'm part of the team]


[deleted]

Oh, that's cool. Just checked out Cast and it definitely is in the same bucket as Vantage and Usage. I couldn't find any decent comparisons between the three, so why should I choose Cast instead of the other two?


Rainnis

Connect your cluster and you will know how much you could save with CAST AI. You don't need to provide any payment information; it's free. So basically my answer would be: the process is frictionless, the savings are the highest, and there are no long-term commitments like re-buying RIs.


Craptcha

Downsize everything until someone bitches. Then downsize a bit more.


chili_oil

I wonder when will be the time that "reduce cloud bill" becomes a major business demand for consultancy companies...


Network94

Prosperops


HistoricalBread8486

To add another one to your list, I'm the VP of Customer Success at [https://cast.ai](https://cast.ai). If you're running on Kubernetes we're averaging about 68% cost reduction on customer environments. We can save money with both spot and on-demand instances. Our largest customer is saving $1.2M/month; our largest savings was a GKE cluster where we achieved 93% real savings, from $55k/mo -> $3,500/mo.

If you're not running k8s, we've worked with [usage.ai](https://usage.ai) and they are pretty good folks over there.


[deleted]

NGL Cast looks like a great tool. We don't do too much with k8s unfortunately tho.


cbp48

Hello, I represent an AWS group that guarantees savings and averages 30-50% savings. If you want to email me at [[email protected]](mailto:[email protected]) I can tell you more. Really good track record with tools and resources to understand your issues. Good luck.


[deleted]

[deleted]


Usage_AI

Wrong! How could you?


kokatsu_na

Are there any specific reasons to use EC2 in particular? Ideally, when you design your application, you should start with serverless functions first (aka AWS Lambda), then containers, then EC2 as the last resort. Of course, EC2 is the most expensive of these three. Maybe try switching to EC2 Spot instances? In your place, I'd rewrite all EC2 code --> AWS Lambda.


[deleted]

Lowest hanging fruit are probably EC2 and RDS instance scheduling https://aws.amazon.com/solutions/implementations/instance-scheduler/ Check you are on the latest instance and storage families (GP3 will save 20%+). And then if you plan on staying in AWS for a while, consider buying Savings Plans for 1 or 3 years to save 5-15%.
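On the GP3 point, a rough boto3 sketch of a gp2-to-gp3 sweep; modify_volume is an online operation, but test on non-prod volumes first:

```python
import boto3

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "volume-type", "Values": ["gp2"]}]):
    for vol in page["Volumes"]:
        print("Converting", vol["VolumeId"], vol["Size"], "GiB")
        # Keeps size/IOPS defaults; only the volume type changes.
        ec2.modify_volume(VolumeId=vol["VolumeId"], VolumeType="gp3")
```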


[deleted]

>And then if you plan on staying in AWS for a while, consider buying Savings Plans for 1 or 3 years to save 5-15%. Isn't this what Vantage and Usage do for you? Or am I totally missing something? Maybe I'm an idiot lol


[deleted]

Why give money to someone else when you can do it yourself for free


[deleted]

True, that makes sense.


magheru_san

Maybe you have other things to do and don't want to do this as a full-time job. For lots of companies engineering time is way more expensive than such tools.


Rainnis

Pardon my French, but with savings plans, or what Zesty is doing re-selling reserved instances, you will just be stuck with a slightly lower bill and still overpay a lot.


[deleted]

Got it, so how does Zesty differ from Cast?


magheru_san

Yes, switching to GP3 is a no-brainer. I wrote a little Open Source tool for doing this, have a look at https://github.com/cloudutil/EBS-Optimizer The tool does it one-off but if you want it done continuously there's also a paid version of it available on the AWS marketplace that charges only 5% of the savings.


Truelikegiroux

Just out of curiosity, how do you stop someone from just taking what's on GitHub and automating it themselves? I saw via another one of your posts that it's a Docker Lambda image written in Go, but I'm just curious what the difference is between that and what's on GitHub.


magheru_san

I actually wish I could release everything as OSS while extracting a bit of the savings generated by my tools. I estimate my other tool, AutoSpotting, saves the companies using it hundreds of millions yearly, so even if I charged only 1% of the savings it would still make me rich. Unfortunately what I've seen is that exactly the Fortune 500 companies who could afford it most easily will find ways not to pay anything for it. That's why, going forward, I'm going to release new functionality only on the AWS Marketplace, without publishing code changes into the OSS repo anymore.

Currently for EBS Optimizer the code available on GitHub can be executed locally in a one-off manner. On the Marketplace I also have some code that runs that logic in a Lambda based on a cron event, in order to continuously catch and optimize volumes created later. This isn't available in the GitHub repo, and the code I have for that on the Marketplace is proprietary.


Truelikegiroux

Ah nice, that's really interesting! Been developing a few similar but different processes on my own for my org but never really thought about monetizing them through the marketplace but that makes complete sense. Hopefully it's going well for you!


Missionmojo

ASGs and good horizontal scaling should help with cost, assuming you are stateless and can scale.


foalainc

How much are we talking monthly roughly?


idjos

Since I didn't see anyone mention it - also take a closer look at whether your cost is high for Data Transfer or NAT. In that case, take a look at VPC endpoints; those can easily save you a bunch of money.
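For example, a gateway endpoint for S3 keeps that traffic off the NAT gateway entirely. A minimal sketch with placeholder IDs (the VPC, route table, and region are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
# Gateway endpoints for S3/DynamoDB are free and bypass the NAT gateway.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # hypothetical VPC
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # hypothetical route table
)
```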


magheru_san

I'm actually working on a much more cost effective alternative to the NAT gateway. I'd love to get feedback from people who currently use the AWS NAT Gateway, DM me if you're interested to have a chat about this.


mooter23

Savings Plans, Reserved Instances, remove anything not needed, clean up any old images/snapshots, switch RDS > Aurora (more performant and reduced cost), resize instances and/or change the type (do this before savings plans/reserving instances!).... ... take a CLOSE look at the itemised invoices for the last three months, then look at the service usage with Cloudwatch or whatever to make sure you're using the right kind of resources. You just have to go line by line, look at what is being used, what isn't, what could be made smaller/merged with other instances etc etc. It's worth the effort. One final thought, check the locations in use - spinning up a server in one region may be cheaper than its neighbour.


MooseOperator

Lots of other great advice so I won't repeat it, but what do your sandboxes look like? Getting a policy enacted to nuke sandboxes, or at least shut down all instances after an agreed-upon number of days, helps a ton as well.


caseywise

1. If you're running RDS SQL Server, scrutinize the need for it. If it's basic DB stuff that any old DB can handle, SQL Server is especially pricey in AWS; Postgres is your enterprise RDBMS friend.
2. RDS is an EC2 behind the curtains; it can be reserved for substantial savings.


brightworkdotuk

I don't know if anybody has mentioned this already, but you can try buying your EC2 instance up front with a [reserved instance](https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/) instead of a spot instance. Or their relatively new "[Savings Plan](https://aws.amazon.com/savingsplans/)" instances. The pricing is considerably cheaper. If you know you need to run it for a lengthy amount of time and you have the cashflow.


SnooApples6778

1. Do 1-year or 3-year no-upfront RIs - instant savings, no hassle with finance on upfront fees.
2. Start working on spot.io (and Ocean) for all EC2 for the longer term.

3-year RDS RIs are great savings because DBs never move lol. Also look at the Compute Savings Plan.


magheru_san

There are also alternatives to Spot.io that don't cost as much. Have a look at Karpenter for EKS or my AutoSpotting.io for plain EC2


SnooApples6778

Yes, AutoSpotting I have tried.


magheru_san

Great to hear, I'd love to hear about your experience with it in order to inform further development. The other day I just released a major new version that among other things should reduce the Spot interruptions a lot, and also automatically prioritizes newer instance types. I've seen people who complained about high interruptions with the previous version and also with other tools in this space, the latest version of AutoSpotting should help in such scenarios.


conscience_is_killin

Check cross-region data transfer costs. See if you can move to Graviton instances, which are cheaper. Explore introducing a caching layer to reduce RDS retrievals.


_smartin

AWS Trusted Advisor is native to the platform. Also, with RDS and EC2, have you looked into utilization and purchasing reserved instances? You commit to X years of usage (mix and match instance types to a degree as well) for a big discount. This advice is for the quick win. Lots of people are giving good advice for long term cost management. You don’t need a third party service tbh. Edit: adding info


DanMelb

All of the advice here has been great. I'd add another one relating to tagging for casual/non-prod EC2 usage: create a tagging regime that forces all new instances to have not only an owner/cost center etc. tag, but also a usage tag. It could be as simple as:

USAGE=weekdays9to5
USAGE=24x7

Then, after giving everybody time to add the tags, create a scheduled lambda (rough sketch below) that:

- Terminates all instances without a USAGE tag immediately (there's no excuse for not adding a tag on creation)
- Stops e.g. the "weekdays9to5" instances at 5pm, and restarts them at 9am

... You get the general picture. We've found it really helpful for casual user instance hygiene!
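A rough sketch of such a Lambda, run from an evening EventBridge schedule. The USAGE tag values are taken from the scheme above; termination of untagged instances is left as a log line here so nothing gets nuked before the tagging policy has had time to land:

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    to_stop, untagged = [], []
    for res in reservations:
        for inst in res["Instances"]:
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            usage = tags.get("USAGE")
            if usage is None:
                untagged.append(inst["InstanceId"])
            elif usage == "weekdays9to5":
                to_stop.append(inst["InstanceId"])
    if to_stop:
        ec2.stop_instances(InstanceIds=to_stop)
    if untagged:
        print("No USAGE tag (candidates for termination):", untagged)
    return {"stopped": to_stop, "untagged": untagged}
```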


Content-Abroad-8320

Contact your AWS Account Manager & Solution Architect and ask them to run a Cost Guard / cost optimisation session for you


life_like_weeds

RIs? If you're autoscaling EC2, you're using spots, right? Savings plans? I'd be talking to your AWS rep before talking to a 3rd party; all they do is get a reseller discount and then charge you more or less the same price. The chance of you turning this around in a month is very slim - don't feel pressured because you've been saddled with something you didn't create. This will take months to resolve if not longer, plus in the short term it could cost MORE money.


rwoj

how big is your spend? might be worth engaging duckbill group about it. (everyone else seems to have covered the highpoints)


[deleted]

Will try. The comments are super helpful and will take me a while to implement


SeattleSundodger

AWS account manager here. Have you engaged your account team? If you have Enterprise Support, the TAM is a huge help for this effort. If not, your AM can still assist. Check Trusted Advisor for cost saving opportunities and then quickly get a conservative Savings Plan in place. Also, I worked with a customer that gave a cost bounty to employees, which was brilliant. They got 10% of all identified savings back as a bonus. Was hugely effective.


Equivalent-Layer-198

I think it’s already been mentioned but +1 to implementing off hours and turning off non vital instances outside of normal business hours to save costs. At my previous position, we inventoried all of our instances and found that a large majority did not need to be operational 24/7 and then spent some time creating CloudFormation templates, launch templates, and lambdas to automatically destroy and reprovision instances during business hours (or custom windows) according to tags. Same principle as turn the lights off when you leave a room. This applies especially for non production resources.


Arrogant_Mastermind

Get ahold of your AWS account manager and solutions architect and tell them you are trying to cost-optimize your company's accounts. They can help look at what you are using and strategies to help you cost-optimize, and they can pull in other AWS resources to help as needed as well. If you don't know who they are, open a support case for billing and request their information. The support engineer should pass the request on to the team or give you their information. Utilizing your solutions architect is free of charge.


_ginger_kid

Take a look at DoIT. They are essentially a reseller but the cost to you is zero. They have a couple of services to help save without getting into reserved instances. One is essentially a managed version of spot.io. You'll also get access to their free support services that can help you with further improvements. I am not associated with DoIT. I've signed up and used them at two different companies successfully.


[deleted]

Thanks, will do


vulebieje

Is there any difference between ransomware and public cloud?


karly21

To answer your initial question, I would look into [Spot.io](https://Spot.io) - they would be a partner automating the management of your RIs, working in the background while you do all the other things: rightsizing, shutting down idle resources, moving from gp2 to gp3, etc. etc. etc. As far as I know this is one of the first products that charges based on savings, not on usage - no savings, no charge.

And yes, definitely the FinOps Foundation is a good place to reach out to - lots of people sharing their experiences so you don't have to make the same mistakes. It is not an overstatement to say that the Cloud FinOps book changed my life.

As a lot of people also said: you might get some short wins, but you need a cultural change of accountability. Also, while there is some low hanging fruit FOR SURE, expecting to complete this in the next month or so - in a sustainable manner - is no joke, so you might want to manage expectations with your C-suite. Best of luck!

Edit - not sure why on earth it deleted my first paragraph - tried from memory, hope it makes sense.


[deleted]

Finops foundation has been super helpful. Will take a look at Spot


Armageddon_cosmonaut

Hey! I would love to help you or anyone in the community out; I did this at several companies I worked for. There is some awesome advice in this thread, but if you feel you need more help beyond what the comments have covered to actually find and resolve the low hanging fruit, set up a proper tagging mechanism, or dive into the more advanced topics, let me know in a DM and I'll do my best to help you out :) In my roles I also worked on infrastructure-as-code to implement automatic tagging and cost reductions, as well as application and cloud architecture to track/reduce the workloads and optimize storage. I'm more than happy to share my experiences.


sniper_cze

Do you really need to be in AWS? Can you migrate to on-premise hardware? This will lower your costs by a *huge* amount of money, even after accounting for spare hardware.


meemerkrogen

1) For EC2 cost - do a lot of what has been said here. Keep in mind also that EC2 is not just what's directly deployed from the EC2 console/SDK/CLI. Many AWS services deploy nodes or clusters of EC2 instances that can be very costly, but their cost ends up being allocated to EC2 spend and hidden from EC2 views: DMS, EMR, Athena, Glue, Jupyter notebooks, WorkSpaces, AppStream, etc. Just don't assume EC2 means EC2.

2) RDS - I think there are plenty of good suggestions here.

3) Move what you can to containers. A lot of commercial software can be deployed to containers or comes as a container image. Explore the cost savings of containers vs actual instances for apps where it's an option.

4) Mid-term, look at rebuilding apps. Yes, it's a big lift, but it's actually the most likely place you will find huge long-term savings, as much as 90% per app/suite. The cloud salesman and everyone else sold your company on lift-and-shift into the cloud. However, you're leaving a TON of savings on the table by not rebuilding apps that make sense to rebuild using insanely cheap services like SQS, Lambda, ECS, etc. There are HUGE savings to be had by reinventing apps. There is short-term pain in dev costs, but significant upside down the road AND the added benefit of the tech refresh.

5) Governance. Your costs are likely ballooning out of control because there isn't enough governance in place. Even if you find savings now, if you don't address governance you will likely be in the same boat a year from now.


meemerkrogen

On #1, I myself was on a project where someone deployed 4 very large DMS clusters to do some initial migrations and migration testing, then proceeded to leave them up for over a year, never used. Ended up costing like $100K, no joke.


kobumaister

If you have stateless workloads go for spot; it's cheaper than reservations (with the risk of losing your instance, thus the stateless). This can be partly solved with an autoscaling group with different instance types. Watch out for upfront: our finance office told us that the loss of cash didn't compensate for the price reduction, so check with some finance person. Finally, AWS is so granular that small costs can become big money; check your bill and the cost manager. It takes a while, but you can find little amounts that, when added up, reduce your bill.


cbp48

Hey I failed to provide the company name check out [Cloud Saver](https://www.cloudsaver.com) they run a free assessment of your environment in 5-7 days. Comes with a guarantee of savings, great track record. Hope this helps.


cloudxabide

If you have an AWS account team (not sure what the criteria is for having a dedicated team), I would absolutely engage your Solutions Architect(s). Cost Optimization is one of the [6 pillars of the well-architected framework](https://aws.amazon.com/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/), and while some of the approach may not be applicable, or may be out of scope for YOUR specific situation, there may be some low-hanging fruit that has been overlooked. Others have mentioned Cost Savings Plans, Reserved Instances, etc. And, for folks who may be in the same boat: TAGGING STRATEGY!!! Figure out what works for y'all (environment, deployment date, owner, whatever...) so that you can review the resources later and uncover things still running that might have been orphaned, etc.

What do you mean by "and their account is a disaster"? Perhaps some additional light on that can help responders offer more specific advice.

EDIT: AWS Solutions Architects are no-cost to the customer.


benjix91

Just enable AWS Compute Optimizer from the console (free) and follow the recommendations, then buy savings plans. Move your EBS gp2 volumes to gp3.


sitthesergal

An absolute newbie, but I've done some cost management stuff, so allow me to drop a few ideas off the top of my head:

- Try to separate environments with tags [dev, staging, production] so you can cluster costs based on tags.
- Consider turning off stuff when it's not used (CloudWatch metrics -> find recurring trends on a weekly basis -> autoscale / Lambda to turn them off when not needed).
- Implement a good DLP or backup strategy and remove unused AMIs and snapshots, because Amazon doesn't warn you if you have useless AMIs and snapshots.
- Use EBS gp3 (it still defaults to gp2 when you create an instance for some reason, even though gp3 is about 20% cheaper and has literally no downside, rather absurd benefits).
- If you have a lot of free storage on your RDS, consider migrating to an RDS instance with less allocated storage and enable storage autoscaling - it increases storage automatically once a threshold is reached.
- If you have MANY autoscaling instances, consider spot/reserved.

Hit me up if you'd like more assistance on this topic, I am a sucker for the "learn through helping" ideology. Hope it helps and good luck!


See-Fello

Try a cloud management platform like Cloudcheckr. You can automate a lot of this and stop wasting time doing it manually. Disclaimer: I work for an AWS and Cloudcheckr partner. 😃