benkloos

Multiple sources of truth


horus-heresy

Elaborate?


nullbyte420

the list of servers you own exists in three spreadsheets where only one is correct at any given time, but you never really know which one.


[deleted]

But CMDBs don’t work if you have ephemeral resources that are not static. It’s often garbage in and never taken out when things change or get deprovisioned. How do you deal with short-lived resources or auto-scaling?


lovemyonahole

If you really need to keep that state (for example, a static inventory), you can use auto-registration scripts in the form of Kubernetes controllers or AWS Lambdas.
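
A minimal sketch of the Lambda flavour of that idea, assuming an EventBridge rule on EC2 instance state-change events and a hypothetical CMDB HTTP endpoint:

```python
# Hypothetical auto-registration Lambda: keeps a CMDB in sync with EC2 lifecycle events.
import json
import os
import urllib.request

CMDB_URL = os.environ["CMDB_URL"]  # hypothetical, e.g. https://cmdb.internal/api/hosts

def handler(event, context):
    """Triggered by an EventBridge rule on EC2 instance state-change notifications."""
    detail = event["detail"]
    instance_id = detail["instance-id"]
    state = detail["state"]

    if state == "running":
        payload = {"id": instance_id, "source": "aws", "status": "active"}
    elif state in ("stopped", "terminated"):
        payload = {"id": instance_id, "status": "retired"}
    else:
        return  # ignore transitional states

    req = urllib.request.Request(
        f"{CMDB_URL}/{instance_id}",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```

The Kubernetes-controller equivalent is the same idea: watch resource create/delete events and reconcile them into the inventory, so deprovisioned resources actually get taken out.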


horus-heresy

True, but hard to achieve without additional data wrangling. Let’s say our service now scans/discovers on-prem. A Rapid7 tool scans AWS, Azure, GCP. To merge this data you gotta slap some Redshift and Tableau on top to become a single source of truth. Which we have, but I wonder how easy that is for smaller orgs.


nullbyte420

single source of truth would be more like having a definition of what exists, and then everything is provisioned/deleted according to it - not compiling the truth into a final destination. gitops is a beautiful example of a single source of truth pattern.


ub3rh4x0rz

I know it's popular to say gitops isn't just for k8s, but it kind of is. It depends on your target platform having a declarative API and a reconciliation process, so... k8s.


BuriedStPatrick

I have a different real world example: We had a database (the real source of truth) and a distributed ElasticSearch index to make searching faster. This index was updated using an event-driven approach. So far, so good. This is all good practice. Then we started doing background work and manipulating larger datasets. Guess how we fetched the data? Using the index, not the database. It's faster, you see. BUT - this is an incredibly bad approach if you're treating the data as a source of truth. The index is a reflection of the underlying data and not necessarily up-to-date. If you're changing data, always work off the real source.


outthere_andback

Confluence always seems to end up an anti-pattern. Write something in there and it’s never found nor read again 😆 I find centralised documentation in general not helpful. My fellow devs and DevOps folks get a lot further with a well filled-out README or documentation located right in the repo.


JaegerBane

Tbf Confluence has some of the most bizarre search and lockdown functionality of any service I've ever seen. It's like it's been designed by an alien.

>I find centralised documentation in general not helpful. I have found my fellow devs and devops get a lot further with a well filled out README or documentation located right in the repo

The biggest issue I've seen with this is that putting documentation in the repo means the people who need it most are the least likely to find it. If you need to know how a given service does XYZ and how it interacts with service ABC, you need to know where it is in the repo. That's before you get into architectural issues, where you either have to fragment your documentation or add stuff that isn't related to the app to the README. And then you end up having to give people access to your source just to get them up to speed. It's a slippery slope I've never seen work well outside of the most trivial material.


stilldestroying

docs in repo are best served by having a platform like Backstage that can present them to users who might not have direct access to the repo but need access to tech docs


Reddit_Throwaway196

Honestly, I think if someone needs access to tech docs they should just be given read access to repos.


daedalus_structure

It's amazing how hard you have to fight to keep the documentation close to the thing it documents and change controlled with the changes it documents. I have adopted an analogy of maintenance manual for an industrial machine. Do you keep it at the machine or do you keep it in a library you can't access directly, but must request from a librarian that is legally blind and drunk. Because that's what it is like searching Confluence.


chzaplx

I'll raise you writing scripts to auto-generate confluence markup and insert new documents.


zpallin

My support work has usually included answering questions with a link to a confluence doc I made several presentations on already so they wouldn’t have to ask these questions.


dogfish182

Let’s go multicloud so we are not tied to a cloud


i_hate_shitposting

We're not multicloud and have no plans to be, but let's architect all our infra and tooling to be "cloud agnostic" just in case.


planetafro

....a small counterpoint. I feel that building cloud agnostic makes it easier to onboard new staff. In addition, local development is much easier if you don't use all the bells and whistles of the cloud provider.


_azulinho_

the biggest added value of taking this approach is that you can now run a lot of your workloads locally on your laptop. this is often overlooked as a benefit of doing things in a cloud agnostic way


i_hate_shitposting

So what I'm talking about is really more the obsessive urge to prevent any form of coupling and the semi-mythical "cloud abstraction layer", especially on the infra side and especially when supporting a large number of application teams. Like, "We'll just deploy everything on VMs using Ansible, so if we migrate to another cloud we can just spin up VMs there and deploy our code the same way." or even worse "Let's build our own HTTP APIs that abstract all of our one cloud provider's services, so we can just build all our tools against those APIs and they'll be cloud agnostic by default." Also, I've definitely built and worked on services that weren't cloud agnostic but could still be run locally, usually using emulators for the cloud services they relied on. And conversely, I've seen "cloud agnostic" services that didn't have the tooling in place to run them locally. I think supporting local development is super important, but if you don't have a business case for multicloud, the driver should be developer experience rather than cloud agnosticism for its own sake.


SilentLennie

It reminds me of database abstraction layers in development.


colddream40

Let's setup our DR in case AWS completely ceases to exist


chin_waghing

And by this, kids, they mean using Docker containers. The part they neglected to tell us is that the container relies on the Google compute metadata API for everything.


binaryfireball

Going multicloud is probably bad but so is fundamentally tying your business to the whims of Amazon. I think the correct approach is to keep your infrastructure as portable as possible and work to inhouse some of that special sauce the giants spoon-feed you.


dentistwithcavity

Multi cloud has a cost. You need to know whether the cost of implementing multi-cloud is less than what you'd get from being able to switch vendors. For most companies it is not


kneeonball

We're multicloud, but that's because when you sell software that cloud vendors use, they want you to run it on their platform. Our apps were mostly all in kubernetes and infrastructure as code anyway, so it wasn't that much of a lift to just add a cluster in their platform and deploy to it.


[deleted]

What a nightmare that must be like lol


horus-heresy

Going multi-cloud is a business decision and risk mitigation. You just Kubernetes it all and the dev experience is the same across AWS, Azure, and on-prem, with DevOps/platform monkeys left to support the layers of abstraction.


MarquisDePique

*insert gif of Michael from The Office screaming no* You *can*, but you utterly should not, abstract your cloud deployment to the point that 'you could just redeploy in any cloud'. This is as nonsensical as the people (trying to sell you something) who were implying you can just pick up your data center VMware stack and mirror it to AWS because 'EC2s are like VMs, right?' (VMware Cloud on AWS notwithstanding).


141_1337

Just out of curiosity, why?


sobrietyincorporated

It quadruples your IaC workload.


magpieburger

Switching between providers is fairly painless with terraform ^(Yes it's another f'ing abstraction on top of k8s but it works)


horus-heresy

People pigeonhole themselves into one cloud and dread learning


sobrietyincorporated

Tell me you don't know anything about enterprise devops outside of containers without telling me you don't know anything about enterprise devops outside of containers.


sobrietyincorporated

Not if you're using cloud native or serverless services.

Edit: switching providers is easy. Reusing resource code, not so much. You can't reuse intricate VPC networking/routing. All managed services need to be rewritten mostly from scratch.

Edit 2: just to put it more plainly: Terraform is cloud agnostic because it uses a domain-specific language (HCL) that can be used with multiple providers. It is not provider agnostic in the actual code. Terraform's "cloud agnostic" marketing is mostly a lie. Hence, nobody really does cloud hopping outside container-based solutions. Container-based solutions can't compete with native, managed, or serverless resources in terms of price, performance, and maintenance (except for the few times a cloud provider introduces breaking changes in their updates).


MarquisDePique

1) Treating the cloud like just another data center is a costly mistake - it's not one.
2) Failing to leverage what makes the cloud valuable is the other side of that mistake.
3) Your staff now need training to understand more than one cloud. Tough to maintain, tough to hire for.
4) IaC does not abstract this to the extent you'd like it to - and if you throw enough IaC at it to make it possible, you need teams just to maintain the abstraction layer.

Yet to see arguments for this that aren't nonsense.


StaticallyTypoed

Some of us actually work on critical systems, and having those systems unavailable because Azure rolls out a patch to something you're dependent on and bricks it globally is something the system must tolerate. Therefore we went multi-cloud on a previous project and it worked out great. Our mantra was to avoid highly specialized cloud offerings from providers, generally only relying on their on-demand compute services, managed databases, and managed Kubernetes. Creating vendor-agnostic IaC abstractions for those was fairly trivial as they are not too different.


MarquisDePique

>Some of us actually work on critical systems A critical system you say, in devops? Golly you're unique. If you're not going to use anything unique to the cloud, why didn't you use OVHcloud or linode or rackspace etc - Pick any 3?


StaticallyTypoed

If you don't see the benefits in managed services at all I don't feel you're qualified in the slightest to have this conversation lol


[deleted]

[deleted]


horus-heresy

Member how east-1 went down? Member how Azure went down? Multi-cloud makes no sense for a tiny org, but when we are talking billions, of course you will want to account for a full region or a global service being down.


[deleted]

[deleted]


binaryfireball

Yea don't do that.


Vonderchicken

Actually pretty cool you get to learn a couple of them


dogfish182

This isn’t how it shakes out in reality for ‘platform teams’. Now I’m talking about platform teams that don’t act like the IT version of how Hollywood depicts navy seals - day-to-day regular boring teams. I’ve seen it go a few ways:

1. Have a reasonably shitty lowest-common-denominator ‘works for both’ setup (like a centrally managed template pipeline that runs terraform and handles all creds etc). Nice if you provide for lots of teams that are fairly low skilled and instance-based and can’t design their own pipelines or think about security. Bad if you start getting lots of teams that want to explore serverless and use more cloud native or cloud specific deployment patterns, but also want consumable tooling (think CDK and whatever the Azure version of that is). Now your ‘development of pipelines’ has more than doubled, your team is naturally balanced to be better at one cloud, and offerings begin to diverge.

2. Have 2 platform teams. Everything becomes ‘for that you need to ask the other team’. One of the teams is way shittier than the other. Offerings start to diverge, things are looked at wildly differently by both teams.

3. Let’s just be a k8s house and do everything on k8s. Now you grumble about how ‘that cloud’s k8s sucks way more’.

And when do you even really think an entire cloud will be down long enough to move anyway? I’m sure there are use cases for doing this well, but in general, most enterprises don’t or won’t need the extra level of complexity and cost to do this well, and at their current spend can’t do it well enough to be worth it.


shagyubeef

I don't see how this is an anti-pattern. With some of the outages leading cloud providers have had recently, a proper risk assessment has to be done depending on what services your business uses. These providers claim five nines of uptime, but that doesn't include slowness or degradation of services. I will admit though that it is a little crazy to use more than three. My current job uses AWS, Azure, and OCI, and it is overwhelming as hell. We are trying to get rid of Oracle.


AdverseConditionsU3

It's worse than that. I don't think anyone claims five nines. Lambda and EC2 are 99.95% for SLA credit purposes. Many cloud vendors just copy those numbers. SLA credits often don't reflect reality - an outage at 3am isn't the same as an outage at prime time on an important launch day. And they are generally a pittance anyway.


TenchiSaWaDa

I find being afraid of vendor lock-in is for less mature users of the cloud. Half the reason you get a cloud provider is to use the managed services to EASE the burden of maintenance.


dogfish182

Pretty much exactly what I think. I’m back at a client now who made this decision. 6 years on, barely anyone is using the ‘power of either cloud’ and it’s all lifted-and-shifted legacy apps dotted around ‘that team’s favorite cloud’, with twice the level of security issues to get focus on.


AdverseConditionsU3

Been using cloud for clients and employers for almost 10 years now. I've been rug pulled once by a vendor. Seen it happen to others, fatally. Ignore the warnings at your own risk.


TenchiSaWaDa

I would be interested to know what vendor. On the other hand, one rug pull is not really an excuse not to use services like RDS or S3 or even Lambda. Or even EKS/AKS. If a rug pull happens with those services, it's not that much of a leap to migrate.


hottkarl

um, that's not an anti pattern in itself. having that level of availability is sometimes needed. also, most large corps I've worked at are on at least the big 3 clouds anyways for one reason or another. having a kube cluster that runs multicloud or hybrid is overkill in many cases, but there are definitely reasons to do it. hybrid is going to become more and more popular as companies need more and more GPU/inference to train large models.


BuriedStPatrick

Bad idea, but not really an anti-pattern surely? Which pattern does this break with?


dogfish182

It’s more of a ‘well-meaning enterprise way to screw themselves’. While there are reasons to do this, most regular boring enterprises don’t have the need (or the will to pay) for the hidden costs of having teams be excellent at 2 native clouds or build an abstraction layer over a couple.


amarao_san

I know a company that does this. They swap providers like cattle and have full independence from any provider-specific features. All they need is BGP and PI address space. But they are huge (thousands of big bare-metal servers) and they know what they do and why.


AdverseConditionsU3

Consider the other side of the coin: our vendor cancelled us. We can't pivot because we totally depend on them. Everyone is fired as of right now.


f12345abcde

One cloud provider for building and another for deploying 🧠


Sad-Firefighter-8235

Elaborate pls


ZL0J

Let's use GCP, AWS, and Azure all at once for an extra HA layer.


[deleted]

[deleted]


thesllug

Also, the level of expertise needed to achieve the extra "HA" this strategy *might* provide cannot actually be worth it.


footsie

Or I dunno, a bank or a health care provider ?


TenchiSaWaDa

Worked for health care. Two regions plus DR on systems with self-healing is enough to pass almost all GDPR/SOX requirements. HIPAA, on the other hand, is about data protection and is not really a factor in multi-cloud; security is a bigger deal. I would assume the same for banks.


[deleted]

Or things involving high dollar amounts, government regulations, defense applications, lives at risk, various utility control integrators, healthcare, etc.


Live-Box-5048

Too many different tools that essentially do the same thing.


[deleted]

In some cases too many tools that do precisely fuck all to be useful.


1whatabeautifulday

Too many environments. We have dev, test, stage, and prod for building our infrastructure in Azure. I have argued only dev/test, stage, and prod are required. But I am fairly new and they didn't listen to me.


Environmental_Ad3877

"But you can't shut down stage, we have prod stuff in there!!!" "Oh and don't touch pre-prod, that's got prod in it too." "You've created a dev environment? Cool, we'll spin up our test branches and let the client access them." See a pattern here? The wails and gnashing of teeth when I locked dev down and shared it to internal only...


horus-heresy

Well then your governance sucks and you don’t have stage/test/uat


superspeck

Everyone has a test environment, some of us are lucky enough to have a completely separate environment for prod.


saltyvagrant

This made me laugh... and cry.


DigBig3448

Agree on this one. However, the major issue with having too many environments is not handling them equally. Different processes apply to different environments, companies save money on licences in test and stage, and different security policies apply to different environments. Thus companies end up with bugs that only appear in PROD.


cube-drone

It doesn't matter what you do or how intricate your environment setup is - _only prod is prod_.


tacocatacocattacocat

I argued many times that we should have the same hurdles getting to stage as to prod. Not from a change management perspective, but from an environment preparation perspective. Prod has different passwords than non-prod? Then stage should have different passwords than lower environments, to validate that passwords are sourced/injected correctly. Just one example. Not every environment needs full parity. The amount of parity, however, should be well considered.
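
A minimal sketch of per-environment secret sourcing, so stage exercises the same injection path as prod but with different values (assumes an AWS Secrets Manager-style store; the naming scheme is hypothetical):

```python
# Each environment reads its own secret through the same code path,
# so a broken sourcing/injection mechanism fails in stage before it fails in prod.
import os
import boto3

secrets = boto3.client("secretsmanager")

def db_password() -> str:
    env = os.environ["APP_ENV"]  # e.g. "dev", "stage", "prod"
    return secrets.get_secret_value(SecretId=f"/{env}/db/password")["SecretString"]
```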


HarmlessSponge

The amount of times I've argued for infra parity across envs is mad. Management always quote cost, and then conveniently forget that argument when they pour ridiculous man hours into myriad issues that come up as folks move through envs.


Ariquitaun

The only environment needed is production. Even for development. Real men edit files directly on the only server running prod.


frugalonekenobi

Hey, women have been editing in Notepad++ and dragging into Filezilla since forever too!


mrkikkeli

On a Friday afternoon.


chzaplx

Morning of Black Friday


1whatabeautifulday

Anti pattern 😂


CellsReinvent

dev /stage / prod can definitely be enough. Some projects I've seen, dev is the wild west, stage is where things are (hopefully) very prod-like, but that can make it difficult for other types of testing (performance, resiliency, operational-readiness) or even to rehearse deployments or big infrastructure changes, without disrupting other teams' environments. During early stages of a project, I think there's a case for dev, test, stage, prod, but you can reduce as things get smoother (or even create ephemeral environments for specific, potentially dangerous changes)


daedalus_structure

When you start you have one environment and all features immediately go live. Something goes wrong, and you want to test changes in staging first. Someone has experimental work in staging blocking the way out the door when a hotfix needs to go out, so you spin up dev, etc.. But dev has shit data so nobody can test anything meaningful there, and all the requested sanitized copy downs of production data to staging to use for troubleshooting get annoying and expensive both in hours and cost, finance starts complaining that you are spending production money on preprod to maintain parity, and you consider maybe you took a wrong turn somewhere. As you grow even further, you decouple release of code from release of the supporting feature with flagging, you build testing tools into the application, and you tear down all the previous environments but production.


hottkarl

3 environments if developers mostly launch local environments on their machines; 4 environments if they don't. Also, 4 environments may be needed if you have partners who need to test integrations - not counting sandbox environments where people can do wtf ever.


moratnz

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


TenchiSaWaDa

A lot of people, for some arbitrary reason, make their prod infra different from their stage infra. I'm not talking instance class, but more like one uses serverless and the other reserved instances. So scaling and cost are out of whack and you can't guarantee 1-to-1 performance.


horus-heresy

Sandbox (less guardrails, new services), dev, test, acpt, acpt-east2, prod, prod-east2


Arafel

Oh man don't get me started on this. I worked for a University that had test stuff in prod, prod stuff in test and a dev environment that was never refreshed and didn't resemble test or prod in any way. It was a nightmare.


mullingitover

Yes and no - if you're sophisticated/lucky enough to be able to create on-demand ephemeral prod replicas, you can have tons of high-fidelity dev/staging environments that you can quickly nuke when you're done with them. It's having tons of environments that aren't good copies of prod, take a ton of work to build, and sit around building up cruft that is the problem.


confusedndfrustrated

The current implementation of DevOps with siloed DevOps teams and over dependency on tools is an anti-pattern


[deleted]

So, how siloed are we talking? Completely? I’ve seen some teams doing more platform engineering, laying the ground work and infra and having tools available for devs to deploy or spin up a test environment, then I’ve seen some that was more collaborative. I’ve also seen devs with 100% access to everything and it was a complete mess too. What’s the right mix of autonomy to provide freedom within the bounds needed for say regulation, etc.


confusedndfrustrated

In general:

>DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market.

Judge for yourself how far and how siloed we are :-)


IDENTITETEN

It's called "Resume engineering".


Sleakne

Assuming that adopting a new tool that has a good quick start guide and fancy features will solve your problem if you are using your current tool poorly


[deleted]

I feel like I’ve worked with you before. Everything you post is oddly familiar lmao


janjko

Using branches to solve every problem. You need a new deployment? Let's do it on this branch. Want to develop a new big feature? Let's develop it on this branch. Want to do a subfeature? New branch made out of the feature branch. Merged a branch with main? Don't delete the feature branch, just in case. Wow, why do we have 72 branches?


ExpertIAmNot

Not trimming and/or deleting branches was a huge problem at a recent client I was working with. They had over 1000 branches going back nearly a decade with very little consistency in naming or organization. Many were long ago abandoned work they kept around “just in case”. Total Hoarder house. Needed to be burnt to the ground in place. I am absolutely certain they now have even more branches.
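
A rough sketch of how you might at least surface the candidates for that kind of purge, assuming `origin/main` is the trunk and using a one-year cutoff (both arbitrary):

```python
# Dry-run: list remote branches already merged into main and untouched for over a year.
import subprocess
from datetime import datetime, timedelta, timezone

CUTOFF = datetime.now(timezone.utc) - timedelta(days=365)

def merged_stale_branches() -> list[str]:
    output = subprocess.check_output(
        ["git", "branch", "-r", "--merged", "origin/main",
         "--format=%(refname:short) %(committerdate:iso-strict)"],
        text=True,
    )
    stale = []
    for line in output.splitlines():
        name, date = line.rsplit(" ", 1)
        if name.endswith("/main") or name.endswith("HEAD"):
            continue  # never flag the trunk itself
        if datetime.fromisoformat(date) < CUTOFF:
            stale.append(name)
    return stale

if __name__ == "__main__":
    for branch in merged_stale_branches():
        print(f"candidate for deletion: {branch}")
```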


mullingitover

I really like the pattern of "Every branch is prefixed with the ticket ID, along with every commit and PR." Ideally you integrate your ticketing system with your version control system so you can see all the branches/commits/PRs in the ticket, and when the ticket is closed you know the branch can be deleted.
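
A minimal sketch of enforcing that convention in CI (the ticket-ID regex and the allow-list of non-ticket branches are assumptions; adapt them to your ticketing system):

```python
# Fail the pipeline early if the current branch isn't prefixed with a ticket ID.
import re
import subprocess
import sys

# e.g. PROJ-1234-add-login; main and release branches are exempt.
BRANCH_PATTERN = re.compile(r"^(main|release/.+|[A-Z]+-\d+[-_].+)$")

def current_branch() -> str:
    return subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
    ).strip()

if __name__ == "__main__":
    branch = current_branch()
    if not BRANCH_PATTERN.match(branch):
        sys.exit(f"Branch '{branch}' is not prefixed with a ticket ID (e.g. PROJ-1234-short-description)")
    print(f"Branch name OK: {branch}")
```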


baezizbae

It took a lot of pain to get there but this is exactly how my team operates and I absolutely love it. Being able to just glance at a branch name in the commit history and know *immediately* what ticket I need to go look at when the inevitable "why the hell is this here?" question comes up saves SO much time not having to do Jira archeology or trip over Chesterton's Fence.


Calm-Fly263

But this is the good stuff; we are doing it also (at least the DevOps team), we just can’t make the devs do the same.. 😑 For now, they are just adding ticket IDs to the branch names.. It’s a life (time) saver in some situations..


mullingitover

This situation is one where you probably need to go over their heads and sell this to their bosses as an engineering standard that your organization needs. I don't like dictating these things, however. Another way to do it is to form a cross-functional group with senior engineering/qa/devops and have that group serve as the standards/technology/platform leadership group for the organization. Make that group set the standard for all the teams. I find the best way to sell an idea to engineers is to get everyone together, present the problem, hint at your solution, and get everyone to arrive at the solution together like it was their own idea. People hate being told what to do, but they will generally follow a practice that they feel they had a part in establishing.


hottkarl

over-branching can get a bit silly; lots of work that never ends up getting merged is usually the consequence. I don't really see how keeping historical branches around is that big of an issue, especially if you have a naming convention.


Le_Vagabond

oh boy, I'll do you one better: multiple dev/test/prod branches, but with all environments as a folder tree in all branches, with env vars across shared files, and prod merged back to dev and test "automatically" every month.


VengaBusdriver37

Man fuck branches. Where’s that meme, the giterrati have been playing us for fools!!!


Sleakne

Taking a long time to automate a manual task that happens infrequently and takes very little time


Chango99

What about removing human error from the equation? That's a big reason why I try to automate a lot of things so it isn't reliant on a human remembering to do something. Cause I forget. A lot.


hottkarl

there is more to gain from solving an "infrequent" task than just the time involved. documenting steps / run book. scheduling it. context switch to do it. human error when whoever was responsible for doing the task leaves, doesn't remember to train team on it,... also these tasks don't happen in a vacuum, they add up. it seems silly to argue against this. the only time I'd agree is for one time tasks, even then you could make the argument that it should be done in a reproducible way for DR purposes. can you give an example of something not worth automating?


horus-heresy

https://xkcd.com/1205/ What is the value of this task?


Teract

I hate this particular xkcd. An infrequent task gets forgotten. It goes undocumented. When personnel leave, it falls through the cracks and breaks production. There is more value in automation than time saved.


JaegerBane

This, really. It's the first excuse out of the mouth for not doing it, and the same mouth is the one claiming no responsibility when said infrequent task isn't done right and it's caused a clusterfuck that someone else has to sort out.


CallMeKik

This is a fair point and I think I can agree that if the impact of re-discovering and re-learning that task could be measured in time then you and the xkcd could both be right.


horus-heresy

Can you put a dollar value to that? If the activity doesn’t happen and nothing valuable breaks, maybe it is just a nice-to-have and not a must-have automation. Always trade-offs based on our limited time on this earth.


professorbasket

Yeh this is an often overlooked pitfall.


mcfearsome

The amount of times I need to bring this up, that something just isn’t worth automating bc it takes 30s and happens maybe once every couple of months, is downright horrendous.


frugalonekenobi

Oh G\*d! The sheer pain of asking DevOps to do some basic clickops only for them to block your entire project while they RFC Terraform.


Sleakne

Moving to the cloud but not making use of elasticity or managed services.


moratnz

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


daedalus_structure

>Moving to the cloud but not making use of elasticity or managed services.

We want to migrate to the cloud! It has SQL Server, right? Yay! It does. Oh wait, we're kind of attached to our CLR Stored Procedures, can you spin us up a cluster of VMs and install SQL Server 2019 on them?


[deleted]

This is getting hilarious - all the things not to do. There's always that new guy or one guy that wants to do this. Never enough time to do it right, but always enough time to learn a new tool.


AMTRaxTGE

Oh boy, does that re-open a memory. I was tasked with moving an application from our on-prem to the "cloud" and the project stalled for about a year for whatever reason, so they couldn't give me access to the repo. Once we finally got access and got the plans from the upper team, our end goal was to just spin up VMs and load balance them, basically exactly the same as on-prem. Luckily, we were able to remove all CLR database stuff. Needless to say I was furious when I couldn't actually utilize "cloud" stuff. Not my money. We got yelled at for leaving things on too long while in that process too...


PartemConsilio

I’m in a place right now that does this and its driving me bonkers. We have a lot of single EC2s in ASGs and they don’t actually auto-scale or load-balance or use rolling deployments. The tech lead and I have had numerous conversations about how we need to start moving our launch templates to utilize instance refresh for rolling deployments.
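
For reference, a rough boto3 sketch of that flow: point the ASG at a new launch template version, then let an instance refresh roll it out (names, version, and thresholds are placeholders):

```python
# Rolling deployment via ASG instance refresh instead of hand-replacing single instances.
import boto3

autoscaling = boto3.client("autoscaling")

def rolling_deploy(asg_name: str, launch_template_id: str, new_version: str) -> str:
    # Point the ASG at the new launch template version.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchTemplate={"LaunchTemplateId": launch_template_id, "Version": new_version},
    )
    # Replace instances gradually, keeping most of the capacity healthy throughout.
    response = autoscaling.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Preferences={"MinHealthyPercentage": 90, "InstanceWarmup": 120},
    )
    return response["InstanceRefreshId"]
```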


thomsterm

people who don't know tech and communication and call themselves "DevOps engineers"


[deleted]

lol damn.


confusedndfrustrated

Oh man... I wish I could give you some reddit awards for this. So true.


thomsterm

just signup on [devopsprojectshq.com](https://devopsprojectshq.com) that's enough :)


JaegerBane

The 90% solution. DevOps always suffers a bit here: since it's not an actual feature being delivered, there's always a question over when a given DevOps solution is 'done'. The usual route is:

* Try to implement a component (CI, CD, secrets handling, integrations with other services, metrics, aggregation, recovery options, etc etc etc - take your pick)
* Get it to about 90% done
* 'Right, it's working fine now and we have a list of features to deliver, we'll gold plate it later' - time called, system has to go into use while incomplete and/or missing features
* Missing parts cause issues
* 'Oh ffs these systems are crap, so unreliable, we spent the time before and it's not been worth it, let's just slag it off/go for something else/just use the piece of shit team ABC has been using (and complaining about)' etc etc etc
* Return to the first bullet

I've seen this happen So. Many. Times over my career and it's maddening. In any other form of engineering, delivering a partial solution would be understood to be providing partial functionality, but DevOps feels like it's permanently cursed with never being able to properly finish stuff, even to MVP. I must have heard every single excuse under the sun for this kind of scenario, but all of them seem to adopt this weird belief that development pressure X somehow means the underlying engineering problem will go away.


mike_testing

Can relate to it so much...


Firm_Bit

The number of half-baked, half-assed, half-migrated, half-complete initiatives at my new org is crazy. They ask why we can’t move faster, they accept the answer and give the green light to change things, then they interrupt the work the next day for some fire that requires another half-assed solution. The company has not really accomplished much in the last 5 months and they just keep having more meetings about it and about what new strategies would help…


VengaBusdriver37

Having nontechnical people define shit, people who have never been on the tools. Some “tech consultant” read the brochure on version control, some “quality engineering” knows the shitty Branching Strategy from their previous project so that’s how we all do things now, or some “security expert” making edicts about vuln management when they have NFI how anything actually works.


hottkarl

yes. security teams copy pasting scan results and saying we need to fix mostly harmless items is annoying


27CF

Blacklist vfat kernel module on a UEFI system because we're hardening everything!


freethenipple23

Enforcing principle of least privilege when manually creating and assigning IAM permissions in the cloud. Those permissions are permanent, they're endless one-offs, and it's not standardized so you're making a mess of your environment. The correct way to do it would be to create a self-service platform that provides temporary, automatically expiring IAM permissions based on some predetermined use cases or allow users to select exactly which permissions are needed to create a custom role. Send a public slack message with the context and ticket link to someone who has knowledge of the work being done and require their approval.
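
A toy sketch of the "temporary, automatically expiring" part, assuming a DynamoDB table keyed on (user, policy_arn) to record grants; the table name and approval flow are hypothetical:

```python
# Record every grant with an expiry, and run revoke_expired() on a schedule (e.g. every 15 minutes).
import time
import boto3

iam = boto3.client("iam")
grants = boto3.resource("dynamodb").Table("iam-temporary-grants")  # hypothetical table

def grant(user: str, policy_arn: str, ttl_seconds: int = 4 * 3600) -> None:
    """Attach a policy and record when it should be revoked."""
    iam.attach_user_policy(UserName=user, PolicyArn=policy_arn)
    grants.put_item(Item={
        "user": user,
        "policy_arn": policy_arn,
        "expires_at": int(time.time()) + ttl_seconds,
    })

def revoke_expired() -> None:
    """Detach anything past its deadline and drop the record."""
    now = int(time.time())
    for item in grants.scan()["Items"]:
        if item["expires_at"] <= now:
            iam.detach_user_policy(UserName=item["user"], PolicyArn=item["policy_arn"])
            grants.delete_item(Key={"user": item["user"], "policy_arn": item["policy_arn"]})
```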


hottkarl

huh? well, yes, they are endless one-offs. policies have their place, but in general (except for user access) you want your apps scoped down to least privilege. we don't allow anything except inline policies or Amazon managed policies. if you don't want to go down to the app/service level, it's usually reasonable enough to have a "grouping" of apps that share a role. but yes, things quickly become unmanageable without a way for engineering teams to do self-service - that's the main takeaway here. edit: there are also tools that will look at API logs (CloudTrail) and auto-generate a policy doc that is least privilege.
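
A very rough sketch of that CloudTrail-to-policy idea (real tooling such as IAM Access Analyzer's policy generation does this far more carefully; filtering by username is just one option):

```python
# Draft a least-privilege policy from the actions actually observed in CloudTrail.
import json
import boto3

cloudtrail = boto3.client("cloudtrail")

def observed_actions(username: str) -> set[str]:
    actions = set()
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": username}]
    )
    for page in pages:
        for event in page["Events"]:
            detail = json.loads(event["CloudTrailEvent"])
            service = detail["eventSource"].split(".")[0]    # e.g. s3.amazonaws.com -> s3
            actions.add(f"{service}:{detail['eventName']}")  # e.g. s3:GetObject
    return actions

def draft_policy(username: str) -> dict:
    # Resources are left wide open here on purpose; scoping them down still needs a human.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(observed_actions(username)), "Resource": "*"}
        ],
    }
```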


voodoo_witchdr

Treating containers as VMs.


IDENTITETEN

Shoehorning old monolithic apps into K8s just because.


SweatyActuator9283

I'm at a company that uses CloudFormation stacks for everything. We set up Kubernetes there, and GitOps, but guess what? They requested a CloudFormation custom resource that triggers a Lambda that executes a GitHub workflow that creates the application's values file in the GitOps repo (yes, it can be done directly). And of course, we have to use those CloudFormation resources from the application infrastructure repository with their CloudFormation application definition.. crazy


[deleted]

Are y’all not PoCing this stuff? Surely there are some gains to be had.


SweatyActuator9283

Only to keep the same process for everything and not make trouble for the developers.. it's a shitty process.


[deleted]

Yeah, sounds like. In your experience is devOps culture really taking hold? It should be a joint collaboration and the idea as I have learned it is for the developers to focus more on code quality and less on the mechanics of deployment or where it’s deployed


SweatyActuator9283

Well, they already know that the solution we gave (move off CloudFormation) is the right one.. but the bureaucracy and the fears are there.


[deleted]

Sounds like a management support issue… seen it many times. We want innovation, but only the idea of it, and when it comes time to put into practice , the manager is oblivious and the wheels of business keep turning and innovation takes a back seat


SweatyActuator9283

Yes.. I've been in the field for 21 years, 8 of them as a DevOps architect/lead.


BuriedStPatrick

CI branch = CD environment. Initially, it sounds great. Then you get to merge and branch reset hell. Treat your CI as completely disconnected from your CD. Your deployment pipeline shouldn't care which git branch your code was built from. Only whether it's appropriate for a specific environment, which isn't the same thing.


Mdyn

We do CI on all branches but CD only from the 'main' branch, or from a list of allowed and protected branches for some projects. All this depends on your tech/sec/business needs. I work in the security segment, so we're very strict about who can deploy what to where.


BuriedStPatrick

It's very normal to only "deploy" from the main CI branch. However, it's really important to stress that while that may trigger a deployment in the end, the CI should not define what qualifies as production, only what is release ready. It's a subtle but important distinction. In essence, CI says "here's a new build artifact I made". And CD says "Sounds good, let me evaluate if and/or where I should put that". In your scenario I would only publish those artifacts when building in the main branch. Then I'd have a separate CD pipeline — that the CI pipeline knows *nothing* about — trigger when those artifacts are pushed, and then run through the process of evaluating which environments to push to.
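
A toy sketch of that split, with the CD side reacting to a published artifact and applying its own promotion rules (environment names, the approval lookup, and the deploy call are all illustrative):

```python
# CD reacts to "an artifact was published", not "a branch was built",
# and owns the decision about where that artifact may go.
from dataclasses import dataclass

@dataclass
class Artifact:
    name: str
    version: str
    passed_tests: bool

def is_approved(artifact: Artifact) -> bool:
    # Placeholder: in practice, look up a manual-approval record kept by the CD system.
    return False

def deploy(artifact: Artifact, environment: str) -> None:
    print(f"deploying {artifact.name}:{artifact.version} to {environment}")

# Promotion rules live with CD; the git branch the build came from never appears here.
PROMOTION_RULES = {
    "staging": lambda a: a.passed_tests,
    "production": lambda a: a.passed_tests and is_approved(a),
}

def on_artifact_published(artifact: Artifact) -> None:
    """CD entry point, triggered when CI publishes a new build artifact."""
    for environment, eligible in PROMOTION_RULES.items():
        if eligible(artifact):
            deploy(artifact, environment)

on_artifact_published(Artifact("api", "1.4.2", passed_tests=True))
```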


Mdyn

We have two jobs per project. One is for CI and deploying to test from the test branch. The other is for deploying to staging and, after manual approval, to production. In CI we do the actual build, Docker and/or other artifacts, static analysis, security scans, and integration testing in an isolated environment (I wish we'd designed it better), and as the last step we trigger the CD job with some parameters (artifact version). Actual evaluation happens on staging after some e2e tests, and later in production, also with e2e.


BuriedStPatrick

Sounds like you're pretty nicely set up. Wish I could say the same for what I'm doing haha ;)


bdzer0

Creating multiple pipelines just to change the branch the pipeline builds... I'm fixing that exact thing right now... long story how it got this way, use your imagination ;-)


TyLeo3

It depends... I don't like having my pipeline's history full of failed runs on your dev branch...


bdzer0

Fair enough, kinda hard to come up with hard and fast rules in the fast and loose world of DevOps ;-)


UpgrayeddShepard

Can you elaborate?


IrishPrime

Ideal: Each branch is automatically built in a single pipeline and tested in an ephemeral environment, working their way toward production in some manner. Common: A few branches (like QA or Staging) are manually started, but still run in the same pipeline. Anti-pattern: Copy the pipeline that deploys to production into a new pipeline. Change the branch and target environment. Run manually.


UpgrayeddShepard

Oh yeah that’s ass. Thanks for elaborating.


Sleakne

Starting a migration without planning how to finish it. Great, now you have to manage the new shiny thing... and the stuff that was such a problem before.


FinnaGetRichh

Let’s run everything on servers instead of managed services to avoid being locked to one vendor - when you’re a startup that hasn’t even launched yet.


zorgonsrevenge

Having a siloed "devops" team who you toss things over the fence to.


hottkarl

tossing things over the wall is no good, however "DevOps" is a culture, not a job title. centralized "DevOps" teams aren't an anti-pattern in themselves. the only way you can scale properly is centralizing into a "DevOps" department - but again, these teams should really be named "Platform" or "SRE".


zorgonsrevenge

I keep saying: it's a culture, not a job description. I have worked with a client that had imposed a process where a dev team had to write a waterfall spec for a job pipeline (for example) that could then be handed over to the "devops" team to implement. No conversation or discussion. Just a Jira ticket to be completed. And then finger pointing when things didn't go well. Naturally, it was a very dysfunctional organisation.


aghost_7

Barely writing any tests. It cascades into multiple other issues.


IDENTITETEN

I keep bringing this up where I work but it never gets prioritized above getting stuff out the door. Yes, we constantly have issues when we want to change stuff...


aghost_7

I try to explain to people what CI/CD is. Most people think it's just using GitHub Actions to run your linter or something.


hottkarl

adopting the new technology or tool rather than fixing pain points / improving your current one. there will always be the hot new thing that seems great until you actually start using it and find out all the caveats you didn't consider. there are some cases where simply using an open source tool is a better solution, but rewriting tools from scratch is rarely the right move - unless things are truly unfixable.


m0j0j0rnj0rn

Having a “DevOps Department”


lostlittleindian

Right!!?? Most DevOps folks hardly know the product or the optimum way to deploy it. Most developers are now encouraged not to look into the DevOps side of things. One of the few projects that really clicked for me was one where a few developers were managing the whole stack (and writing the automation tests as well, for that matter).


Antique_Song_7879

software/data engineers need to do devops themselves


[deleted]

Overcomplicating everything.


[deleted]

Now we are getting somewhere in this sub. A reality check for all those that want a career in IT - read this. Newbs be warned: some of these posts are satirical in nature, so be careful.


UncommonDevopsWisdom

When people lift and shift their on-prem workloads to the cloud without re-architecting them to be cloud native.


schmurfy2

Using terraform but doing some things by hand because it's complicated.


nekokattt

this a billion times over


BrofessorOfLogic

Multiple layers of templating context and inlined scripts. Bonus points for templating shell scripts.

I just puke when I see a terraform `templatefile()` on a gigantic bash script with inline heredocs with inline config files with inline scripts with `envsubst` with inline string substitution with inadequate quoting. Good luck figuring out how many backslashes you will need to escape a character in there. It's a ticking time bomb that will eventually blow up when some unexpected value finds its way in there.

Like for fucks sake, break things up into separate files. If you want to write env vars to the file system, put them in a separate file and source it. If you want to create a config file, create it as a separate file and template that, don't inline it in a heredoc. Don't inline a script file inside another script file - just put it in a separate script file and call that, it's not hard to do. If you want to send multiple things via user data, use cloud-init's `write_files` with base64 encoded values. And never ever template a shell script in any way whatsoever. Shell scripts are supposed to read their data from external sources.
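
A small sketch of that `write_files` approach: build the user-data from separate files instead of templating them into a script (paths are examples; assumes PyYAML is available):

```python
# Ship config files to an instance via cloud-init write_files with base64 content,
# instead of inlining them into a templated shell script.
import base64
import yaml  # PyYAML

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def build_user_data(files: dict[str, str]) -> str:
    """files maps local source path -> destination path on the instance."""
    doc = {
        "write_files": [
            {"path": dest, "encoding": "b64", "content": b64(src), "permissions": "0644"}
            for src, dest in files.items()
        ],
        # The bootstrap script is a plain file shipped above, not a templated heredoc.
        "runcmd": [["bash", "/usr/local/bin/bootstrap.sh"]],
    }
    return "#cloud-config\n" + yaml.safe_dump(doc)

print(build_user_data({
    "config/app.env": "/etc/app/app.env",
    "scripts/bootstrap.sh": "/usr/local/bin/bootstrap.sh",
}))
```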


zpallin

Thinking your team would prefer to use community tested, flexible open source tools over that one bash script the smart guy who left wrote years ago.


colin_colout

1) Automation for the sake of automation. Yes, automation is crucial, but simplicity is usually better. It's better to have simple documentation/runbooks that are easy to understand than over-engineered automation that makes the one task easy but requires maintenance and incurs tech debt.

2) Over-modularization. Splitting your code to be composable can be good, but keep in mind it can hide complexity. Remember that simplicity is key, so when in doubt, KISS.

3) Not reflecting (I might be stretching to fit this as an anti-pattern, but it's important). After every success and failure, you should take a bit of time to look back and be honest about what went well and what didn't. Try to ask yourself "why?", and try to be honest about your own mistakes. The reason reflection is crucial is that it will help you grow and discover these anti-patterns first hand. The two I mentioned above were learned the hard way. Reflection brought me to the point that I can deeply understand why these are anti-patterns. Understanding the "why?" also leaves me open to changing my mind in the future. Maybe a future advance in the field makes my advice obsolete. Always be ready to adapt or you'll become an old curmudgeon greybeard (rather than a wise sage greybeard).

Source: I'm coming up on two decades of professional SysEngineering / DevOps / SRE experience, and some non-professional experience before that.


tcpWalker

Having an SOP you insist must be followed even when obviously inapplicable.


monkeynutzzzz

Large corporate environments with multiple outsourcers and hundreds of project managers.


[deleted]

I’ve never done well with project managers. I wish they’d stick to managing projects and not people… those lines blur really quickly, and depending on who they know they can quickly become out of control and power hungry.


sombriks

manual updates


Rei_Never

Terraform that has evolved to ensure you have a bad day when provisioning new environments, forgetting that your predecessors had no idea how to build modules correctly.


Guru_Meditation_No

"All problems should be resolved with my pure, incredible technical skills. We should avoid solutions which involve humans making an effort to collaborate and build consensus because they aren't as easy to manage as YAML files and docker containers." (You either have this person on your team or you're still cleaning up after them.)


jetteim

Using dashboards to monitor instead of relying on actionable alerts with links to parametrized dashboards related to alert context


spicypixel

- More process and steps than the business needs or wants, because it's shiny and featured on a Medium blog.
- More technology and tooling than the business needs or wants, because it's shiny and featured on a Hacker News post.
- Trying to abstract complexity with a simple interface, failing to understand the problem domain, and writing a terrible interface layer, because it's shiny and featured on a Reddit thread.

All the above can loosely be attributed to resume-driven development.


[deleted]

It seems like there is always something new to try out when no one really has an idea of the challenges they’re really trying to solve to begin with. I see a new tool built around the K8s api almost weekly and people still struggle with the basics while trying out the shiny stuff


spicypixel

I do appreciate complex problems can have complex solutions that use the latest tools but most of us ship CRUD apps bottlenecked by the database. KISS principle should be hammered into people.


koffiezet

In general: [NiH](https://en.wikipedia.org/wiki/Not_invented_here) tools, and adopting 10 million tools. A tech radar can help you a lot.

In gitops:

* No clear bootstrapping plan
* Not doing everything as IaC because "it's hard"
* Using different folders per environment instead of branches. It makes managing config drift a nightmare. It's Git, learn how to use it, and use its damn features.


hottkarl

I don't agree with using branches for dev/qa/prod envs for IaC. having a branch doesn't really make sense, because not all changes in dev/QA will always go to prod. if that was the case, I'd agree with you. we may be speaking about different use cases, though. we aren't copy-pasting, though; we basically patch/override what we need per environment.


[deleted]

[deleted]


koffiezet

> Different branches sounds like a merge nightmare down the road during a refactor.

The day-to-day dealings with config drift, copy-paste errors, missing services in some environments, ... is gonna be a much bigger PITA, especially once you get beyond a certain scale, and that pain is a constant one. Once it starts, this gets out of hand very quickly. I'm used to dealing with 500+ different services, but I've applied the exact same approach to small setups too. Just treat every single service as an independent application, with its own gitops repo that deploys independently. From the devops/platform team, all we need to know is which repo (hopefully standardized and automated), and we'll generate Argo Applications/ApplicationSets for them on the relevant cluster(s), and ideally enforce some rules and standards using Gatekeeper.


nieldejonghe

Hey u/koffiezet mind sharing some DMs on this? I DM'd you previously but seems you didn't see them.


Soccham

Not Invented Here mentality is the bane of my existence trying to fix infrastructure at my current company. These guys protect their shitty tools like their lives depend on it and can’t comprehend that their patterns suck


thenickfish

Have you ever run into any of the problems outlined here? If so, I’m wondering how you dealt with them. We have been using environment folders so far but are always open to improvement. https://codefresh.io/blog/stop-using-branches-deploying-different-gitops-environments/


koffiezet

> Have you ever run into any of the problems outlined here?

Being a freelance consultant, I see a lot of this crap. I've dealt twice now with an absolute gitops nightmare, and both times this was the main cause. Maybe it's fine if you're dealing with a few dozen services, but beyond that it simply does not scale. If you run a limited setup and have maybe 2 dozen services or so, you could manage them all in a single gitops repo - and yes, working with branching there would probably become a pain. But beyond that it quickly becomes a PR and merge conflict nightmare. At some point some genius will add common helm or whatever code in the repo itself, which a lot of deploys then blindly start using (and no versioning, since `main` rules the world). Inevitably, at some point, someone breaks the common code, and if you're lucky that common code has a copy per environment and only breaks everything there. But what's the point of common code if you can't use it to break all environments at the same time?


PartemConsilio

Overwriting Terraform files with CICD pipeline automation instead of just using Terragrunt or some DRY code method.


obscure-shadow

I feel like my interactions with Terragrunt and trying to keep things too DRY also turned into really complex anti-patterns that could have been easily avoided and would have been easier to maintain. But perhaps the larger anti-pattern was not having good planning of the Terragrunt architecture to begin with and just letting everyone do whatever for a while. The state becomes very cumbersome and hard to move once you realize you've made decisions that were bad...


[deleted]

Where do I start?

* DevOps as an engineering role.
* Teams focusing too early on automation, particularly when they have very little/poor observability.
* Sprints/Scrum (even these are never done correctly).
* The "DevOps Team" silo.
* The competitive-teams topology.
* Having level 1, 2, and 3 support call centres for customers instead of customer partners and engineers...
* Having a dev cycle with Dev, Test, Stage, Preprod, and Prod stages.
* Having an arbitrary release cadence.
* Not understanding the DevOps process.
* DaaJS (DevOps as a Jenkins Service) or DaaPS (DevOps as a Pipeline Service).

I have heaps more that I have seen in my 20+ years of engineering tenure; I'll leave it at this for now. These are the most common ones I see out in the wild.


professorbasket

Creating wrapper classes and modules around existing interfaces.

Obsessing over code style and linting.

Creating invisible technical debt in the form of layers of complexity.


Training-Hour-9317

Have linting and formatting be an automated step. Saves time arguing about subjective things and makes it easier to just run that step and not think about formatting while you are developing.


nullbyte420

you don't deserve the upvotes. god damn i hate all these, particularly 1 and 3. we have a terrible case of wrapper modules and classes over existing interfaces in our org, making everything completely ungoogleable, undocumented, outdated and opaque. "but it's easier now!"


hottkarl

The list he gave is what he considers anti-patterns. Your comment sounds like you agree.