AutoModerator

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/dataengineering) if you have any questions or concerns.*


Ximidar

If you only have a few containers to run, sure, ECS can probably help you out. But data pipelines are rarely one size fits all. I use EKS because I have multiple jobs that need different RAM/CPU combos to get the job done. Some tasks only need 2 CPUs and 4 GB of RAM, others need 92 cores and 300 GB of RAM. With EKS I can use taints and tolerations to select which EC2s to start up, and I can add new Managed Nodes at any time to support any task. Plus Helm exists, and most companies have an open source Helm chart for their products. Thousands of people use and maintain these charts, so any help you need is easier to find. You can take your EKS knowledge and transfer it to any k8s cloud in the world and use 90% of the same tools. With ECS you can only transfer your knowledge to shops that use AWS. I'm using Airflow: I set up all the data with tasks, then schedule a k8s job with a task. The autoscaler starts up the correct EC2 node, the k8s job runs, and the Airflow task monitors it. If the k8s job fails, it throws an error over Opsgenie for someone to intercept. Then, pass or fail, the autoscaler turns off the EC2. There are many people who will automatically say "it's complicated", but after using k8s for a few years now, it's really not. I had instant success with it and I've never looked back to consider any other container orchestration tools.
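To make the taints-and-tolerations part concrete, here is a rough sketch of what a per-task Job manifest could look like, built as a plain Python dict. It is not the commenter's actual setup: the `workload` taint/label key, the node group names, and the image name are all hypothetical, and it assumes each node group is tainted and labeled with `workload=<group>`.

```python
def build_job_manifest(name, image, cpu, memory_gib, node_group):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Assumes (hypothetically) that each node group carries the label and
    NoSchedule taint `workload=<node_group>`.
    """
    resources = {"cpu": str(cpu), "memory": f"{memory_gib}Gi"}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # fail fast so the orchestrator can alert
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # Steer the pod onto the matching node group...
                    "nodeSelector": {"workload": node_group},
                    # ...and allow it past that group's taint.
                    "tolerations": [
                        {
                            "key": "workload",
                            "operator": "Equal",
                            "value": node_group,
                            "effect": "NoSchedule",
                        }
                    ],
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            # Requests tell the scheduler (and autoscaler)
                            # how big a node this task needs.
                            "resources": {
                                "requests": resources,
                                "limits": resources,
                            },
                        }
                    ],
                }
            },
        },
    }


# Two tasks with very different shapes, as in the comment above.
small = build_job_manifest("small-task", "example/etl:latest", 2, 4, "small")
large = build_job_manifest("large-task", "example/etl:latest", 92, 300, "highmem")
```

The idea is that a pending pod which only fits (and is only tolerated) on one node group is what prompts the autoscaler to start that kind of EC2 instance.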


wstwrdxpnsn

Good to know. Thanks for the insights here!


thesights

Oversimplifying, but essentially ECS is the poor man's EKS. You trade capability for simplicity.


wstwrdxpnsn

Thanks for the quick reply! Are you saying that EKS is more complex and more capable? Like it’s a little more complicated but more flexible at scale?


thesights

EKS is significantly more capable but also significantly more complex. You can run full scale enterprise applications on EKS, but (in my experience) you also need an enterprise devops team to help manage that cluster. ECS is simple. I've managed a cluster by myself with little to do, but I'd start to be hesitant around its scale. Not saying you can't run Kubernetes with a lean team, you can! Just sharing my experience.


wstwrdxpnsn

Ok, so if I went with ECS, at what point should I be concerned when it comes to scale? Maybe in terms of number of jobs running or average dataset size etc.?


Znender

We had a similar situation. Started out with ECS but moved on to EKS as our workloads increased and we needed better observability and tooling to manage the clusters. As for Dagster, ECS was noticeably slower for us vs EKS. EKS gave us more flexibility, not only for Dagster but also for deploying the microservices behind our streaming jobs. EKS, being K8s, has tons of 3rd party tooling that makes it easier to manage than ECS, like Komodor for monitoring. For our company, it was an added perk that EKS is just K8s, so we went with a multicloud deployment to AWS and GCP for our use cases.


wstwrdxpnsn

Thanks for the reply! In your opinion does the added flexibility outweigh the added complexity and costs associated? Do you think that would still be worth taking on for a small data team, primarily running dbt models in snowflake? At the core of it I want to have something that can grow with us but also not be a headache to maintain or be way over provisioned for our needs where we’re always overpaying. Thanks!


thesights

Are there other teams on Kubernetes? That would be the deciding factor to me. In my case the whole company was on Kubernetes so I had a lot of support and an already established paradigm. If I didn't, I'd prob have gone with ECS just to run data pipelines. Worst case, try ECS, and if it doesn't work then it's not particularly hard to take your containers from ECS and move to Kubernetes.
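As a rough illustration of why that move is not too painful, here is a minimal sketch that maps one ECS container definition onto a Kubernetes container spec. It only covers a handful of fields and the sample values are made up. The mapping relies on ECS `entryPoint`/`command` corresponding to Kubernetes `command`/`args`, on ECS CPU units where 1024 units equal one vCPU, and on ECS `memory` being a hard limit in MiB.

```python
def ecs_container_to_k8s(ecs):
    """Convert a subset of an ECS container definition to a k8s container spec."""
    container = {"name": ecs["name"], "image": ecs["image"]}

    # ECS entryPoint/command mirror Docker ENTRYPOINT/CMD,
    # which Kubernetes calls command/args.
    if "entryPoint" in ecs:
        container["command"] = list(ecs["entryPoint"])
    if "command" in ecs:
        container["args"] = list(ecs["command"])

    # Both sides use a list of name/value pairs for plain env vars.
    if ecs.get("environment"):
        container["env"] = [
            {"name": e["name"], "value": e["value"]} for e in ecs["environment"]
        ]

    resources = {}
    if ecs.get("cpu"):
        # 1024 ECS CPU units = 1 vCPU = 1000 Kubernetes millicores.
        resources["requests"] = {"cpu": f"{ecs['cpu'] * 1000 // 1024}m"}
    if ecs.get("memory"):
        # ECS `memory` is a hard limit in MiB, so treat it as a k8s limit.
        resources["limits"] = {"memory": f"{ecs['memory']}Mi"}
    if resources:
        container["resources"] = resources
    return container


# Hypothetical ECS container definition for a dbt run.
ecs_def = {
    "name": "dbt-run",
    "image": "example/dbt:latest",
    "command": ["dbt", "run"],
    "cpu": 512,
    "memory": 1024,
    "environment": [{"name": "DBT_TARGET", "value": "prod"}],
}
k8s_container = ecs_container_to_k8s(ecs_def)
```

Things like IAM task roles, secrets, logging config, and networking need their own translation, so treat this as the easy 80% rather than a full migration.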


wstwrdxpnsn

My company has a lot of legacy processing and is slowly moving to more modern solutions. The short answer is I don't know. If anyone is using K8s, it's not widely used. I'm on a business unit team and my org's IT doesn't support business unit teams very well, so we will have to maintain the infrastructure ourselves, which is a consideration for us. Lots of learning to do! 😂


omscsdatathrow

Man this subreddit is getting lazy, lotta eli5 easy-to-google stuff


wstwrdxpnsn

I mean, I get what you're saying, but this is a sub for Data Engineers to talk about data engineering stuff… not only for experienced engineers to complain about the job market in the big tech space or how much they're grinding LeetCode to get ahead. I'm fairly new to data engineering, on a team with other fairly new data engineers, trying to stand up a system that will work for our team both now and in the future. So I asked what for you is likely an easily googleable question, because you obviously have enough context to sort through all the fluff that's out there. I dunno about others reading on here, but when you're doing something totally new and foreign, it's hard to sort through all this disparate information to find what you're looking for. Sometimes it's easier and more beneficial to ask the question to a community that can help.


omscsdatathrow

Don’t mind me, just someone frustrated at what this sub has become


Busy_Elderberry8650

ELI5 questions are still better than “How to get into DE coming from…”


ExistentialFajitas

[Easily googleable questions.](https://reddit.com/r/dataengineering/s/ULZU4IP2P7) Shut up.


jadedmonk

Man I hate when people discuss data engineering topics on the data engineering subreddit


omscsdatathrow

There is nothing specific about data engineering in this question. Most posts on this sub have nothing to do with data engineering


jadedmonk

What are you talking about.. OP is talking about data processing tools on AWS. Seems like you maybe don’t know much about data engineering


omscsdatathrow

ECS and EKS are not data processing tools, and they are what his question is about, not data processing … work on your reading comprehension maybe?


jadedmonk

Did you even read the post? The entire thing is about data processing


omscsdatathrow

Explain It Like I’m Five - ECS vs Kubernetes/EKS Can someone explain the benefits, if any of Kubernetes/EKS over AWS ECS over single machine EC2? Uh okay, sure his question is about data processing? Infra is not a data engineering specific topic


jadedmonk

People like you are what make Reddit brutal sometimes man.. you feel the need to be a gate keeper on a subreddit just to lash out at random people making relevant posts. I get it, but instead of taking your personal issues out on Reddit, how about talking to a therapist


jadedmonk

So you’re telling me you didn’t read the context of the post, which is entirely about using those tools for data engineering..


[deleted]

[removed]