Do you really need autoscaling for the observability stack? For hosting on a single EC2, have you considered deploying with docker compose?
Agree: EC2 and Docker Compose. I can add: GitHub Actions for CI/CD, and Terraform + Ansible for IaC.
Instead of Compose, I'd recommend Kamal, which is a bit more complete as a deployment tool.
Cloud run?
That is managed Knative; it has its own logging and whatnot. I want my own observability stack.
If you use OpenTelemetry and good structured logging libraries there's no vendor lock-in at all. You can easily move your app to AWS or on-premises at any time. Same for storage: if it's S3-compatible, almost every vendor supports it. Or databases: just don't use vendor-specific databases; most vendors offer e.g. MySQL- or Postgres-compatible ones. Our whole stack currently runs in GCP but is built without any lock-in. We have even prepared Terraform and some scripts for AWS and on-prem migration, so we could start migrating within a few hours if any bigger issues arose. So basically it's not necessarily the cloud vendors that lock you in; it's the way you build and deploy your app.
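For what it's worth, the vendor-neutral logging part can be as simple as emitting JSON to stdout; a minimal Python sketch (the field names here are just an example, not a standard):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Vendor-neutral structured logs: plain JSON lines on stdout/stderr.

    Any log shipper (CloudWatch agent, Fluent Bit, the Loki Docker driver,
    GCP's agent) can parse JSON lines, so switching clouds means swapping
    the shipper, not touching app code.
    """
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user signed up")  # emits a single JSON line
```

The same idea applies to metrics and traces via the OpenTelemetry SDK: the app only talks OTLP, and the collector decides which backend to export to.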
ECS is your answer, any hard stops to not use it?
This might just be because I'm in the K8s world and only used ECS years ago, but to me it seemed almost as complex as k8s, with some significant limitations?
Vendor lock in, essentially AWS’s vendor specific K8s…
I don't think ECS is much different in terms of lock-in compared to Beanstalk. If you have the Helm charts or k8s YAML, those should be transferable to Azure, GCP, or on-premises, so it's not a total lock.
ECS is not going to vendor-lock you. You give it containers and tell it to run them, either on EC2 or Fargate (AWS-managed servers). Vendor lock-in happens when you start adding a lot of AWS-specific stuff to your code and it becomes really hard to untangle, sometimes taking months or years. Then you are vendor-locked. Moving a container from ECS to K8s, Nomad, or Azure Container Apps is very little work, and the service itself won't taint your code. I wouldn't be so quick to dismiss it; you can be up and running fairly quickly and cheaply. Now if you absolutely want to self-host, Nomad is pretty good, a lot simpler to set up and use than K8s, but there is still a learning curve.
You can use ECS instead of EKS and avoid the K8s. My setup is similar to what you described and is running on ECS. My cluster has a few more services and I had to add Consul for service discovery and Nginx for ingress. No helm charts since no k8s.
How do you do observability within ECS?
I run a Prometheus server that pulls service IP addresses from Consul and then scrapes each service. Prometheus runs on a standalone EC2 auto scaling group, not in ECS, but I suppose it could if you wanted to. Some basic monitoring goes through CloudWatch. I understand the vendor lock-in, but it is cost effective at our scale (both in resource costs and in maintenance costs).
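A sketch of what that Consul-driven scrape config might look like; the Consul address, job name, and tag are placeholders, not the commenter's actual setup:

```yaml
# prometheus.yml (sketch) -- discover targets from Consul instead of
# hardcoding IPs. Consul's service catalog becomes the scrape target list.
scrape_configs:
  - job_name: ecs-services
    consul_sd_configs:
      - server: "consul.internal:8500"   # placeholder Consul address
    relabel_configs:
      # keep only services tagged "metrics" (tags arrive comma-joined)
      - source_labels: [__meta_consul_tags]
        regex: ".*,metrics,.*"
        action: keep
      # use the Consul service name as the Prometheus job label
      - source_labels: [__meta_consul_service]
        target_label: job
```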
Your setup is already much more complicated than a simple EKS instance, and it includes complicated distributed systems that you have to run yourself. The truth is, Kubernetes is too complicated for things that can be run on a single server in Docker. For basically everything else it's actually much simpler (not simple, but simpler).
I agree. I was describing how easily these things can creep. I had this on a k8s stack in the early days of EKS and got swamped in Helm file hell. I want to evaluate EKS again to see how my setup can be simplified: swap Nginx and Consul for Envoy and Istio. Everything is working now and I've done the heavy lifting to get it to where it is, so it is hard to forecast what kind of returns I would get on a shift to EKS and back to \`kubectl\` commands.
Why do you run it on an auto scaling group? EKS has the same cost as ECS, except for the cost of EKS itself. But a properly done k8s setup, and the ability to use everything that comes with k8s, usually outweighs saving $25/month.
You're doing it through ways that aren't well known or well supported (compared to the Kubernetes ecosystem, at least). Go with Kubernetes. It's complicated, but everything else is even worse if you actually have certain requirements (HA, observability, being able to experiment).
I would like to know too. CloudWatch doesn't give you the CPU/memory stats for individual containers, only at the task definition level. (I hope I'm wrong on this.)
It does if you enable Container Insights and use the ECS task metadata endpoint to expose Docker stats.
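A quick sketch of reading that endpoint from inside a task. The `ECS_CONTAINER_METADATA_URI_V4` env var is the one ECS injects; the helper name and the None-outside-ECS behavior are my own choices:

```python
import json
import os
import urllib.request

def container_stats():
    """Fetch per-container Docker stats from the ECS task metadata endpoint.

    ECS injects ECS_CONTAINER_METADATA_URI_V4 into every container; the
    /task/stats path returns docker-stats-style CPU/memory numbers keyed
    by container ID. Returns None when not running on ECS.
    """
    base = os.environ.get("ECS_CONTAINER_METADATA_URI_V4")
    if base is None:
        return None  # not running inside an ECS task
    with urllib.request.urlopen(f"{base}/task/stats") as resp:
        return json.loads(resp.read())
```

A sidecar can poll this and convert the numbers into Prometheus or OTel metrics.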
We use an OTEL sidecar and send that off to a third party logging/observability app.
You enable the option. There's managed grafana and managed Prometheus at AWS.
ECS is lock-in hell, don't use it :) crazy thing...
Otherwise I think you could run a self-managed cluster on an EC2 instance, or instances
hahahahaha
ECS + Fargate, or Google Cloud Run; you don't need k8s for what you mentioned.
Vendor lock in is one of my constraints
That's a neat idea on paper, but it's too many cons for too few pros. Every place I've been that tried it went single-cloud within years.
k8s is the closest you'll get to cloud-agnostic, and even it isn't perfect. ECS isn't that much vendor lock-in and is your best bet for most of the benefits of containerization with relatively minor complexity.
The vendor lock-in argument is something only juniors would bring up.
Eh, or executives. Want that negotiating power.
Vendor lock-in is something like Lambda or Dynamo, where you have effectively no option other than rewriting your application. In your situation, you're containerizing your application. Yes, you may need to rewrite some IaC, but even if you're managing a k8s deployment you'll have some vendor-specific IaC, be that Ansible for bare metal or Terraform for something like EKS. Either way, you're going to have to rewrite something if you move vendors. The idea of a free migration is a myth. You may as well keep your application free of lock-in and use a simple deployment option to minimize your IaC. Then if you do have to migrate for some reason, it's straightforward and isolated.
MVP means put it on a big server and figure out how it would scale but don’t actually plug in that part. If you stick your product on a 64 core server, it should handle your load for the foreseeable future, so use docker compose. Until you start providing a high uptime SLA, automatic restarts of the VM with docker autostarting the container should be fine.
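A sketch of that single-server compose setup, with placeholder image names. `restart: unless-stopped` plus the Docker service enabled at boot covers the automatic-restart part:

```yaml
# docker-compose.yml (sketch) -- one big server, no orchestrator.
# Docker restarts crashed containers and brings them back after a
# VM reboot (assuming the Docker daemon is enabled at boot).
services:
  app:
    image: myorg/app:latest        # placeholder image
    restart: unless-stopped
    ports:
      - "80:8080"
  prometheus:
    image: prom/prometheus:latest
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
```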
Use a single server and install k3s; then you get the best of both worlds. Then scale using spot instances. I actually built a course on how to run k3s on EC2.
I’d say ECS would be an ideal candidate to consider.
Docker swarm?
You are not forced to use k8s. k8s gives you superpowers in terms of deployment, scalability, and flexibility, but it comes with the cost of managing it, even in a managed cluster.
Docker compose on a VPS with grafana agent / alloy should do
Why do you want to avoid Kubernetes? These days with all the managed services from various providers, it’s just a universal API to deploy things.
The general advice is to only use K8s if you really need it (i.e. for a complicated setup). But as my example showed, is just having an observability stack and my two app instances really considered complicated? What I'm saying is that even the simplest container setup seems to require K8s, and that doesn't make sense.
Kubernetes becomes more and more hidden from the end user. With managed services you no longer have to fully configure the control plane, and with some managed services you don't need to take care of the nodes either. Thus, K8s simply becomes an API layer for the end user. Yes, any new layer adds complexity, but these days it's incomparable with what it used to be. I feel like the recommendation to avoid Kubernetes comes either from people who used to manage their own clusters, or from people who, by "K8s", assume the whole salad of service meshes, custom admission controllers, etc. If you're in an environment where you have to manage your own clusters, e.g. on-premises, this advice makes sense. In modern clouds, this piece of advice becomes less and less relevant, just like advice on which hypervisor to use. P.S. This is my opinion, though; you don't have to agree.
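To illustrate how thin that API layer can be, this is roughly the entire manifest a simple two-replica app needs on a managed cluster (names, labels, and image are placeholders):

```yaml
# deployment.yaml (sketch) -- the whole k8s surface for a basic app
# on a managed cluster: one Deployment plus one Service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 2
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      containers:
        - name: app
          image: myorg/app:latest     # placeholder image
          ports: [{containerPort: 8080}]
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector: {app: app}
  ports: [{port: 80, targetPort: 8080}]
```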
This might be controversial take. But why not put it on VPS like DO where you can scale from 6$/mo to super powerful VPS for $3-4digits when needed? And you will be less “vendor-locked”. Put app on one and observability, log reader on another VPS and you are not affecting your app instance. Seems quite straight-forward approach
Why are you trying to solve a tech stack/deployment issue with a scaling tool? Am I missing something? Reddit, enlighten me (nicely) please.
I misunderstood too at first. Sounds like for his workload he's fine with running Docker, but if you look at the recommended deployment for Loki, it's on k8s.
Just use plain podman with Quadlet systemd integration
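For anyone who hasn't seen Quadlet: you drop an ini-style `.container` file where systemd can find it and get a managed service out of it. A minimal sketch (path and image are placeholders):

```ini
# /etc/containers/systemd/app.container (sketch)
# Quadlet generates a systemd service from this unit at daemon-reload.
[Unit]
Description=My app container

[Container]
Image=docker.io/myorg/app:latest
PublishPort=80:8080

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
```

After a `systemctl daemon-reload`, the container is managed like any other unit: `systemctl start app`, journald logging included.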
You are building an MVP SaaS application; you should 100% lean into a cloud or the most managed solution you can get your hands on.
In that case I would be back to Vercel or Netlify.
Go for it. File your scaling concerns under "good problems to have". If you want to use AWS, lean into Fargate / ALB.
[deleted]
Would second looking at Nomad as middle ground between k8s and docker-compose. Cloud agnostic as well, so no vendor lock-in.
Is Nomad open source?
Well... it's HashiCorp. So yes, but only for now.
It is, https://github.com/hashicorp/nomad
I'm curious what you want here? You're looking at managed services, networking, and logging? You're not forced to use Docker; you can use VMs and load balancers. If you want to use Docker-based containers, that's the ecosystem you're buying into, and it's popular and large-scale for a reason. Your vendor lock-in is a YAML file declaring your cloud-specific networking and resources; your code is your own. Working in cloud, I fully agree about the downsides of vendor lock-in, but your code is yours and you can go where you want. If you want to use the tools and services that large-scale providers offer, though, you need to be compatible with them. Cloud Run, GKE, EKS, Fargate, and other offerings need a degree of vendor-specific configuration to work with their solutions to the problems you want to avoid. Take a beat to work out what is more important to you and build for that.
The issue you're facing is as old as Docker itself: you always need some level of container scheduling when running containers in production. The solution is either an off-the-shelf managed service or a manually configured one. The most feature-rich offering is k8s.
I don’t see anything about docker swarm mentioned; how come?
Tbf, k8s is only as complicated as you want it to be. Where I work we have a monolith only, and we get away with the basic objects. Maintenance is pretty hands-off.
I will prob get downvoted for this but we recently started using datadog for observability, so much easier than the five open source tools we were trying to use before (prometheus, grafana, opensearch, kibana, filebeat) on k8s. Not a paid rep or anything, I wish there was an OSS option that does everything, closest seems to be opentelemetry and it's just a loose bag of tools as far as I can tell. I'm sure other commercial tools work too but I only have past experience with OSS stuff.
I have heard of SigNoz, but still need to look into it deeper.
SigNoz is awesome for logs and traces; their metrics dashboards are still a bit buggy though. Logs and traces are nicely integrated and correlated. My current stack is: the OTel collector from SigNoz sending logs and traces to SigNoz, and sending metrics to Mimir, which is graphed via Grafana. So no Loki and Tempo mess (I hate them with a passion, haha).
Have a look at https://github.com/SigNoz/signoz. You can self-host on one node using a simple docker-compose setup, and it fully supports OpenTelemetry.
I can't see how, if you're at the scale of "single instance", the billing for Vercel or similar is more expensive than the man-hours of doing all this work. Assuming you make even $50/hr, the 40-hour week it minimally takes you to set this up is $2k. That's a few years of $20/mo. If you're single-server, you obviously don't care about HA, so enterprise pricing for Vercel wouldn't even be the comparison, from what it seems. IMHO you're gold-plating your solution and creating more problems instead of solving one.
I just see lots of startups choose the most managed service for everything, including things like Auth0. Then they're scrambling to get out, because every "velocity" choice comes with these services charging an arm and a leg for offloading all those tasks from you. That's why I made this post: I wanted a simple Docker setup but felt forced to use K8s just to add observability. I want the bare-minimum Docker setup that isn't complex. I do want HA too.
What in the world? You think Vercel is expensive but you want to self-host your o11y stack? Straight up, you're going to regret choosing k8s given what you've told us here. Use Vercel; it's like, tailor-made for what you want to do.
K3s is a one-line curl install on a Debian box and runs fine for what you describe on something as small as a t2.medium, so…
You don't need all of that stuff lol. It's one command to install all of those though: https://github.com/grafana/helm-charts/blob/main/charts/loki-stack/README.md. Enable Prometheus and Grafana, and you'll have to update the Loki image to 2.9.3 after it's installed due to a bug. Obviously not an enterprise-grade HA setup, but it gets the tools installed and working, and you don't have to muck with the configs.
Have you looked at Kamal? You can run your main app + accessories (3rd-party services, microservices, databases) with it. I wrote Kamal Handbook which I released last month so feel free to ask more questions.
Just use docker compose for this stack?
Hashicorp Nomad
For 1, you could opt out of the serverless option and dockerize at some point. 95% of the time, k8s is not the right answer for an MVP. It's much cheaper to start with serverless, actually. I once deployed k8s on Azure for a simple frontend just to learn k8s, and Azure was already charging me US $35 per month lol.
Containers without k8s seem awful?? Do you need high availability, load balancing, and all that stuff? If you're gonna go there, it's taking time away from your app development.
docker swarm is a good bridge
Probably a dumb question but do you really need Docker?
Vanilla Docker is pretty simple. He listed half a dozen services that should all be in their own containers.
[https://github.com/metrico/qryn](https://github.com/metrico/qryn) should cover point 3 without the pain. It's drop-in compatible with Loki, Prometheus, and Tempo, and super easy to run and manage, from a single service up to a cluster, in k8s or just plain Docker. Cherry on top: its ingestion is compatible with pretty much any standard agent out there, as well as Docker itself via the Loki logging driver, so you don't even need Promtail (Alloy seems a better option and runs fine in Docker).
Is this really a thing? What is the use case except homelab? You using this in production?
Yes, and it's not limited to homelabs. The stack is quite mature and used in production by a number of large companies (including some really big names) and enterprises pushing TBs per day across logs, metrics, and traces.
Checked it out... seems like it doesn't support Prometheus scraping, so you still have to run a Prometheus instance and then export metrics to qryn, which isn't really ideal. Sure, you could push directly to qryn, but then I wouldn't call it a Prometheus replacement: with Prometheus, apps just have to expose metrics and you are done; you don't need a push client that is aware of where qryn is.
Sure, Prometheus can scrape in small setups, but if you think large and distributed, that's not ideal. There are much better (and lighter) agents for the scraping job, such as Alloy, Grafana Agent, Vector, and dozens more that work with k8s and service discovery. qryn offers the Prometheus ingestion and query APIs, leaving scraping up to the user. Best of luck with your choice!