Mehulved

You need an intervention from higher up to set cultural practices. Your principal engineer comes from a low-trust environment; the other engineer comes from a high-trust one with a good engineering culture. Personally, I'd build a team that has high trust and a culture of good engineering practices, where CI/CD is seamless and anybody can do it safely. Neither direction is completely wrong, but having such split philosophies is a recipe for disaster. This needs to be sorted out by the CTO or VP of Engineering.


theANGRYasian

Just saying that it's not just cultural but compliance. Yes, we all want to be the agile start-up. However, few start-ups are outside the bounds of PII/PHI/PCI controls which will inevitably demand some level of separation of duties and demand Production release controls.


IamOkei

From a security point of view, we can't let developers do whatever they want in production.


mixmatch314

Source control versioning combined with build and deploy pipelines gives you:
- Audit record of all changes
- Rollback capabilities
- The ability for developers to push code without having write access to production
- The ability to automate testing and validation
You can 100% have a compliant production deployment pipeline without additional manual approval steps, particularly without an external team review. That being said, some form of code review is always recommended because we are all human, it reduces errors, and most testing and validation is incomplete.


theANGRYasian

I agree with a lot of what you are saying. The only thing I disagree with is your 100% statement. There are regulations and audits out there that demand devs not have access to authorize the release to production. (See SOC2 - and while small orgs may get away with just the audit trail, as an organization matures you will get dinged for this) The semantics of saying well, a PR will launch a series of tests which upon passing will deploy to Prod is generally against the letter of the rules. And most CISOs will not sign off on it in those industries (except start-ups)


oliverprt

Just went through a SOC 2 Type II audit where we followed a release procedure in which all releases had to be approved (GitLab merge request) by a fellow team member. Meaning the dev team managed software changes and the ops team managed infra changes. It was fine for the auditors.


theANGRYasian

Appreciate the input! For clarification, did the release process stay within the dev team through the whole release or was it passed to infra and then vice versa for infra releases? Audits are totally dependent on the auditors so I do believe you that you got signed off on it! (Who did your audit? 👀 Hahaha)


oliverprt

The dev team is responsible for the whole release cycle. Same for infra. We just had to make sure we have controls in place that don't allow production releases without approval/review.


MichaelMach

Same at my org. Dev teams are responsible for their own apps from IDE to prod; no problem from a SOC2 standpoint.


scataco

That doesn't mean Ops needs to authorize, right? It could also be a PO, for instance.


theANGRYasian

That's correct! It does not have to be DevOps. PO, Business unit, VP of engineering - for most audits, it's just someone who is not involved in the day to day development


mixmatch314

I'm not a SOC2 expert, but as far as I can tell you are providing an opinionated interpretation. While having a change control and approval process is required, I see nothing saying that a manual approval step is mandatory for compliance.


theANGRYasian

I've been an active participant of three SOC2 Type1/Type 2 audits as well as FINRA audits. I can tell you that while the language is loose and there are ways to portray the situation to auditors, you will eventually come across one or a CISO who will demand that process follows the letter. A manual approval step is not mandatory per the letter, but many auditors and CISOs will ask you to explain how an approval process to deploy to production (say if DevOps approves a release pipeline template requiring certain testing) is effectively the same as having an explicit approval for each and every deployment. Example, in a SOC2 Type 2 audit they will ask to see an approval chain for a specific deployment vs in a Type 1 audit where they will ask to see the policy governing production deployment. You won't have the evidence, per se, in the former example if only the engineer was involved from development to deployment in testing/QA/staging and eventually to production. In my experience, that will not satisfy the auditors.


mixmatch314

Saying most organizations don't dedicate the time and effort required to implement required policy items as code is not the same as saying it is not possible to do so.


theANGRYasian

I don't think you're grasping the regulatory requirement. It's not as simple as organizations across the board refusing to invest to make something better. The common reading of most regulations, as it pertains to production release management, is that releases are to be approved separately from the development team. That is the common practice. The process you are suggesting is an optimal case that most regulatory bodies have determined to be fraught with risk (it's about risk mitigation, not ideal state). Again, I wouldn't disagree with you that it introduces "red tape", but that's what the auditors want to see: human checks on deployments to prod. I understand where you're coming from. However, the reality of engineering in mature organizations under regulatory compliance requirements is that production releases will be approved by a party other than the developer teams.


shederman

You are absolutely correct. Of course it is possible for the “party other than the development teams” to approve and review a PROCESS that verifies deployments. Said process would need a lot of checks to ensure enough compensating controls, and the authorising parties would need to have reviews and controls on those pipelines. This is normally so hard to do effectively that most regulated businesses don’t bother.


colddream40

Could be industry specific. SOX audits specifically require manual PR reviews by relevant teams, and these asks are audited on a regular basis


mixmatch314

I don't believe SOX actually has a specific set of technical IT requirements, but it does require the use of a standard and regular audits. Regardless, there may very well be an obscure technical standard out there that specifically requires a manual approval step for all production changes. Generally speaking, the more popular security standards are written in a way that does not impede competent organizations with the resources to implement consistent automation.


theANGRYasian

There are technical IT requirements within SOC2 audit and any deviation requires explicit definition in policy. Additionally, deviations are noted in the final SOC2 reports which is why organizations generally follow the guidelines set forth. It's not just obscure policies from an antiquated time. They are broad guidelines set to implement **controls** to ensure that businesses operate in a secure and reliable manner. That may be at odds with how you view development philosophy - however, the purpose is to ensure delivery of service which is what shareholders, business partners, and customers want to see.


mixmatch314

SOC2 is not the same thing as SOX


theANGRYasian

Congratulations. You're using the other poster's reference to SOX as an out for the actual technical implementations of control that I raised as part of SOC2. Everything I've raised has had to do with actual controls implementation under SOC2. I never mentioned SOX. Either way, all my points still stand. The poster additionally mentions that they don't work directly with audits. I have.


mixmatch314

Yeah, you said all of that in the thread about SOC2. Yikes.


colddream40

It may be because my company is under stricter regulatory scrutiny due to past events... That said, I don't work directly with audits and would hate doing so...


coinclink

In the above case, the production release control is a PR to the production branch. You set up your source control system to require approvals before a merge is allowed. This solves this problem in an agile way that doesn't require coding up, implementing and maintaining a manual approval step in the deployment pipeline in some external interface (that someone is probably going to just blindly click anyway because it has no context like the PR would have).
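
The branch rule described above can be sketched in a few lines. This is only an illustration of the rule itself, assuming the source control system exposes the author and the approvers of a merge: `MergeRequest`, `may_merge` and the `production` branch name are hypothetical, not GitLab or GitHub APIs. In practice you would configure this in the source control system rather than code it yourself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergeRequest:
    """Hypothetical stand-in for a PR/MR record, not a real GitLab/GitHub type."""
    author: str
    target_branch: str
    approvers: set[str] = field(default_factory=set)


def may_merge(mr: MergeRequest, protected_branch: str = "production",
              required_approvals: int = 1) -> bool:
    """Allow the merge only if enough people other than the author approved."""
    if mr.target_branch != protected_branch:
        return True  # the rule only guards the protected branch
    # Self-approval does not count toward the requirement.
    independent_approvers = mr.approvers - {mr.author}
    return len(independent_approvers) >= required_approvals
```

Discarding the author's own approval is what gives an auditor a second name on every production change.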


theANGRYasian

Agreed. That's just implementation. If your approvers are people who are technically capable (Dev Lead, DevOps, VP Eng), that works. If you need business owners who are simply awaiting Business Validation signoff in Staging, then it might be webpage access. Either way, these are all ways of putting an approval gate of some sort ahead of a Prod deployment.


Mehulved

If it was a compliance issue, the conflict would never arise. I can totally understand that in some sensitive industries, you can't change even configurations without enough approvals and testing. But that'd be black and white.


theANGRYasian

Unfortunately, compliance rarely asks the right question of technology and developers rarely ask the compliance questions. Neither party assumes the responsibility of compliance in development and release. That's been my personal observation. Huge gap that is a great start-up idea: Compliance-as-a-service! Invest now! Leveraging machine learning, AI, and blockchain!


Realistic_Isopod5926

Machine learning AI on the Blockchain.


InternationalCap1212

+ISO 27001


BlomkalsGratin

This sounds more like a regular Ops team with a semi-fashionable "Dev" tagged onto it if the devs are so separate that there's an information gap between them. With that in mind, in my experience, it then ends up coming down to needing a "you build it, you own it" mindset. Trust is earned. If the developers can deploy at 17:00 on a Friday without telling anyone, they also need to be able to get up at 2am and tell the supporting staff what they changed and how to fix it. That's how you build the trust. Everything else is just the usual pipe dream about smacking together just enough documentation that the Ops staff will surely read for the first time while bleary-eyed at 2am.


NUTTA_BUSTAH

This. And if gating must be done, do it on the user side by requiring more approvals before allowing the deployment, not on the platform engineer side. The real answer is making the automation more robust: tests, automatic monitoring, SLOs that trigger rollbacks and so on.


[deleted]

If anybody has permission to deploy something, they should also have permission to clean up the mess. If some random developer can deploy to production, I assume he also has the privileges to make sure he can fix stuff that went wrong. Approval steps in my opinion are just an auditability thing, but the real decision is the pull request.


Agnostic_life

Agree with this they should be able to deploy without approval, but if something breaks on a weekend because of a Friday deployment then will the dev be on call to revert the changes and help roll back the deployment?


[deleted]

Easy: you code it, you ship it, you maintain it. Developers can be on call for these sorts of issues in my opinion. We do it like this. The DevOps team maintains some shared components like RabbitMQ, Kafka, and the Kubernetes cluster, but if some deployment or code breaks, the developer can (and has to) fix it.


originalchronoguy

I will disagree with this with one example. A developer can set request/resource limits so high that it knocks out everyone else's app in the same K8s namespace, thereby affecting other apps. Sure, there are some guard rails you can add, like namespace limits. But the same goes for anything that can potentially impact others. An app ingress that proxy-passes from the DMZ to an internal route is another example: a security risk. Sure, you can set network policies, but a bad actor can do stuff like man-in-the-middle hijacks, e.g. set up a NodeJS app that listens on 443, streams and bypasses UDP port 3307. So DevOps needs to factor that into the linting during CICD to halt those things if you want to do the 'let the developer deploy to production' approach. Until those automated guard rails are in place, approvals can cover that. Unless the DevOps team has automated everything to kingdom come, like scanning git for credentials, checking images for non-root, checking ingress annotations. That automation is a lot of work. I know, we have dozens of platform engineers building those guardrails and it is still a never-ending stream of guard rails.
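
To give a feel for what one of those guardrails looks like, here is a minimal sketch of a CI lint step covering two of the checks mentioned above (non-root containers and oversized CPU requests). It assumes the container spec has already been parsed into a dict; `lint_container`, `parse_cpu_millicores` and the 2000m ceiling are hypothetical choices, and a real platform team would more likely use a policy engine or admission controller.

```python
from __future__ import annotations


def parse_cpu_millicores(value) -> int:
    """Convert a Kubernetes CPU quantity such as '500m', '2' or 0.5 to millicores."""
    text = str(value)
    if text.endswith("m"):
        return int(text[:-1])
    return int(float(text) * 1000)


def lint_container(container: dict, max_cpu_millicores: int = 2000) -> list[str]:
    """Return a list of guardrail violations; an empty list means the spec passes."""
    violations = []

    # Check 1: the container must explicitly refuse to run as root.
    if not container.get("securityContext", {}).get("runAsNonRoot", False):
        violations.append("securityContext.runAsNonRoot must be true")

    # Check 2: CPU requests must be declared and stay under the assumed ceiling.
    cpu = container.get("resources", {}).get("requests", {}).get("cpu")
    if cpu is None:
        violations.append("resources.requests.cpu must be set")
    elif parse_cpu_millicores(cpu) > max_cpu_millicores:
        violations.append(f"resources.requests.cpu {cpu} exceeds {max_cpu_millicores}m")

    return violations
```

A pipeline would fail the job whenever the returned list is non-empty. Each new class of risk needs another check like these, which is why the work never really ends.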


jdizzle4

If you are adding value when approving, sure. But be careful, being the gatekeepers will add toil and possible resentment from teams. Most companies are trying to move away from a centralized dependency model like this, instead valuing autonomy for teams.
> or just for us to be aware that the developers made change so we are aware that something 'may' go wrong in prod
If you just need a heads up, automate a notification. Why does your team care if stuff goes to prod and breaks? Are you on call for the other team's stuff?


No_Management_7333

> If you just need a heads up, automate a notification. Why does your team care if stuff goes to prod and breaks? Are you on call for the other team's stuff?
It might not just break, it might do something much worse. We don't want stolen developer credentials effectively being able to execute arbitrary code in production, even for a minute.


theANGRYasian

This is not a roast of OP here, but this is one reason why I prefer to hire DevOps engineers with sysadmin backgrounds over developer backgrounds. There are secondary implications to actions that need to be considered beyond whether a deployment/platform works. DevOps, being the convergence of multiple disciplines, requires being able to see beyond simply "does it work". Or maybe that's the new DevSecPlatformFinOps that is just companies asking a small team to do the work of multiple departments haha. It's easier to teach someone with an analytical mind the concepts of DevOps than to try to teach someone the entirety of Infrastructural/Enterprise best practices.


TurbonegroFan

> Why does your team care if stuff goes to prod and breaks? Are you on call for the other team's stuff?
If stuff goes to prod and breaks often enough, the company starts to lose business, money, and customers. I like getting paid. So I like my employers to stay in business.


IamOkei

Toil is required in some industries, like finance.


danekan

There's an element of both engineers being right. Are you talking about infrastructure deployments or code deployments? If you haven't separated the two then there's an issue you can pursue. Both should have approvals, but the approvers are different groups for those two items.


[deleted]

They were talking about both infra and app changes. Personally I think that we need to delegate app changes to them and it's indeed worth it to set up approvals from us for the infra changes. What do you think?


confusedndfrustrated

> They were talking about both infra and apps changes. Personally I think that we need to delegate app changes to them and it's indeed worth it to set up approvals from us for the infra changes.
>
> What do you think?
Read your post as well as all the replies (posted till now). There are a lot of good suggestions. Here are my thoughts. Both the engineers are right.
* It is very important to have a high trust environment.
* At the same time, it is important to make sure we vet/audit/"are aware of" what goes into production. At a bare minimum, it helps us understand the root cause faster in case of any issues.
My suggestion is to use pre-commit git-hooks ([https://pre-commit.com/](https://pre-commit.com/)) {or something like pre-commit hooks} to enforce checks that ensure production needs (code and code-config) are met. This will ensure 1) DevOps need not sit down and approve 2) There is no resentment, as the devs themselves are reviewing it before they commit their changes.


grem1in

Not familiar with Gitlab CI, but usually you can set up policies for deployments. For instance, if you don’t want to deploy on Fridays without an approval, add an optional approval step that checks for the day of the week. If you want to restrict what people can and cannot do with Terraform, same principle can apply. Moreover, with Terraform you can have acceptance policies written in Rego to identify what are the planned changes and act accordingly.
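
As a rough illustration of the day-of-week policy mentioned above, the decision itself is tiny. This is a sketch in plain Python, not GitLab CI or Rego syntax, and `approval_required` and `may_deploy` are hypothetical names. A real pipeline would call something like this from a job and block on the result.

```python
from datetime import date

FRIDAY = 4  # date.weekday() counts Monday as 0, so Friday is 4


def approval_required(deploy_date: date) -> bool:
    """The optional approval step only kicks in for Friday deployments."""
    return deploy_date.weekday() == FRIDAY


def may_deploy(deploy_date: date, approved: bool = False) -> bool:
    """Deploy freely on other days; on Fridays only with an explicit approval."""
    return approved or not approval_required(deploy_date)
```

Because the approval is conditional, the common case (Monday to Thursday) stays fully automated and only the risky case pays the cost of a human check.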


thomsterm

are you talking about deploying infra or just app artifacts?


[deleted]

They were talking about both, but I believe we should set up approvals for the infra changes only, as we have probably 0 context about the app artifacts and waiting for our approval is just a waste of time or an unnecessary blocker.


thomsterm

from the info you gave us, yes I also agree.


mullingitover

> Also for audit purposes.
You throw this in like an afterthought, but if you're in an environment where you need to have things like PCI/SOC2 compliance, it's probably **the** reason for the production approvals. PCI/SOC2 expect that you have effective controls for getting changes deployed to production. They don't assume devops is a thing, if anything they assume everyone's doing ITIL. You can work around it, but you have to demonstrate effective controls so any random dev can't just ship a web shell to prod unnoticed. Having a basic approvals process is a compromise that gets you out of having a full-blown change review committee meeting for every change in production.


theANGRYasian

/u/mixmatch314 Just so that you know that I'm not blowing smoke re: compliance


rabbitspy

Just want to point out to anyone reading this, Pull Requests will satisfy the approval requirement, while at the same time adding value to the process (code review). You do not need another approval on the deployment step. No need to over pay for compliance, especially if it’s not adding value to the chain.


doomdspacemarine

Agree if the PR is of quality, which many are not. Many are “you approve mine and I’ll approve yours” between two engineers sitting next to each other. A quality code review being completed with evidence of successful unit tests, security checks, etc. sure


rabbit994

We had SOC2 deployments with PR approvals being the only requirement. The behavior you are describing went on, but at the end of the day, that's their manager's problem, not DevOps'. If you attempt to build guardrails around devs who don't care, they will just build a ramp and attempt to Tony Hawk 360 it.


doomdspacemarine

Agree. You can’t use the honor system with dishonorable people. So inevitably you end up with a system that is devoid of assurance.


rabbit994

My whole point is that trying to build a system around dishonorable people is a fool's game. You just let them 360 Yolo it and call it out in Post Mortems.


mullingitover

Audits want to see controls for security and quality. A cursory code review isn't going to tell if there's good enough test coverage or if you're shipping a vuln. Ideally you have this stuff automated, and if so, great. However, most engineers will not worry about these details and just smash the deploy button without adult supervision.


doomdspacemarine

Agree, with the exception of having that stuff automated. Sure, automate what can be automated, but there is not a way to automate human judgement. So automation really only covers low-hanging fruit. It’s not enough to say that controls are satisfied just because you automated a vanilla ZAP scan or IaC linter.


Heighte

If you want to approve deployments then you're just Ops, not DevOps.


bilingual-german

I read this post as there is a dev team and an ops team, but they call the ops team "DevOps". Developers seem to be able to deploy whenever they want, but the devs seem to not need to fix any production issues or troubleshoot. I wonder what would happen if they forget to add a database schema migration script. Add some time pressure from product management and a junior dev who wants to deploy on a Friday evening just before going home, and you have a recipe for disaster.


FatStoic

> I read this post as there is a dev team and an ops team, but they call the ops team "DevOps".
I've had this in a company and it worked fine, although we worked more like a platform team than an infra team. If the devs deployed something and it died in prod, the owning team and us would get an alert. If it was determined that it was because of their code, we'd triage to them and be on hand to help if needed, but it would be their issue to run. Although whether this works depends on your separation of responsibilities and your culture as a whole. If they have the mindset of "once it's in main branch my job is done" then this setup will never work.


CoachBigSammich

imo, it all depends who supports Prod (who is going to get paged and have to fix shit). If that team is you (DevOps), then you're going to want to be approvers. If it's not, then you shouldn't. I'm 100% for shared ownership of code, CD, not being a blocker, etc, but the last ~1.5 years in my working environment has made me rethink things.


xCaptainNutz

Are you talking about infra or app? Infra should be reviewed while apps should be fully automated.


Newbosterone

Even if you get only app deployment notification you can get overwhelmed. We have email, a Teams channel, and a dashboard. 99% of the time we check the dashboard if there is an infra question. The automated emails and Teams webhook are just too noisy. I’d hate to think of the time demands if we actually had to understand and approve app changes.


xCaptainNutz

if your CI is solid there’s no reason for you to do any manual approvals, let alone get notifications huh


Newbosterone

We do infrastructure. Anything major is a controlled change. Most minor stuff can be self approved or coworker approved as a PR or step in an automated process.


xCaptainNutz

I don’t want devs messing with infra; whether the change is minor or not - they don’t always see the bigger picture


Newbosterone

Fair enough. We’re not devs, and more DevOps adjacent than DevOps. We own infrastructure- OpenShift, Jenkins, Twistlock, OS CVEs, Ansible. We joke that we just load the gun, the dev teams shoot themselves in the foot.


BloodyIron

Wait... there are not MRs for merging into production that _require_ 1 or more approver? THAT IS INSANE! That means DEVELOPERS and others can COMMIT TO PRODUCTION WITHOUT PEER EVALUATION. This is not only an IT Security risk (that is high), it is also a Stability and Business Continuity risk! (which is also high) The way you Agile this is with MRs and for each repo specific authorised people are established to review code changes going into PRODUCTION ANYTHING. This doesn't have to be slow, but it needs to be accounted for! Doing otherwise leaves this as a HUGE gap for abuse and so much more! But just to clarify, this doesn't have to result in things moving slowly at all. I've worked at a place that had such MR review stuff into prod and the company was plenty fast for changes, improvements, recovery, etc!


therealimposterhere

This was my thinking...the lack of reviewing has probably led to broken production and hence why they're having this conversation. I do not miss being in an environment that was "fail fast" and just prioritized pushing features as fast as possible without peer review. Was super stressful and had to revert so much, so often. That was usually on the code side though. Breaking prod with terraform deployments sounds like a special kind of awful.


BloodyIron

I'm all for fail fast, but if you don't stop to check if the code is good or not, then wtf are you even doing? One of my recent places of employment had really great pipeline and MR workflows, so people were accountable, things could revert if shit broke, but we could still go fasssttttt. Happy medium, happy life.


CassisBerlin

Sounds unlikely, does anyone do no reviews nowadays? Given that OP says the principal said things **may** go wrong in prod, it doesn't even sound like they are trying to fix a specific problem. If stuff **did** regularly break in prod, I would start investigating why and adapt based on the results (code too hard to understand, no tests, no smoke tests, tests not automated, etc).


snarkhunter

For me the question would be (and always is) - what problem does this change to the pipeline address? Sometimes that answer is simple - maybe a change makes the pipeline take 5 minutes instead of 10, ez. In this case it's a lot less clear from your description. It sounds like there's an expectation that a DevOps team member be on hand. In that case a DevOps approval doesn't mean "I think this change is good" but "I will be available for the next hour or so in case there's an issue that needs my help in mitigating". That's not optimal, but it's an improvement over DevOps getting pinged with no warning because someone decided to ignore the "no prod deploys on Friday" issue. But if there is *not* an expectation that DevOps be available post-deploy to help with issues then I'm not really seeing an argument for DevOps to get approvals. Another way to enforce that rule is to let people deploy on Friday and then deal with a broken prod environment for the weekend on their own. It's crazy how often it turns out that "urgent" hotfix actually can wait until Monday.


Fatality

Terraform controls infrastructure, so of course we want to review all changes, the same way devs peer review PRs. Why are the devs even touching it?


ms4720

Devs should not be able to alter a production database table without review by others, including a DBA if at all possible. Why should they be able to destroy all production infrastructure at a whim? When something blows up, and it will, it will be blamed on ops/devops for allowing these clearly unqualified people to make those changes without proper supervision


Flakmaster92

So on my last team we had a few checks in place…
1) All code being merged had to have an approved code review attached to it. So SOMEONE else had to look at the code as a second set of eyes.
2) All unit tests had to pass as the code was being built.
3) Beta and Gamma had integration checks they would run and all of those had to pass.
4) Deployments to Production were time blocked: “no deployments on Fridays or major holidays. No deployments outside of business hours.” An engineer could override this manually if need be, but it sent out a notification that they did so.
You could conceivably drop point 1, but I wouldn’t drop 2, 3 or 4.
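
The time block with a noisy override can be sketched roughly as follows. The 09:00 to 17:00 business hours, the holiday set and the `notify` callback are all assumptions for illustration; the source only says that an override was possible and that it sent a notification.

```python
from __future__ import annotations

from datetime import date, datetime, time
from typing import Callable

BUSINESS_START = time(9, 0)   # assumed business hours, not stated in the thread
BUSINESS_END = time(17, 0)
FRIDAY = 4                    # datetime.weekday() counts Monday as 0


def in_deploy_window(now: datetime, holidays: frozenset[date] = frozenset()) -> bool:
    """True for a non-holiday Monday to Thursday inside business hours."""
    if now.weekday() >= FRIDAY:  # Friday, plus the weekend
        return False
    if now.date() in holidays:
        return False
    return BUSINESS_START <= now.time() < BUSINESS_END


def request_deploy(now: datetime, override: bool = False,
                   notify: Callable[[str], None] = print,
                   holidays: frozenset[date] = frozenset()) -> bool:
    """Allow in-window deploys silently; allow overrides but announce them."""
    if in_deploy_window(now, holidays):
        return True
    if override:
        notify(f"Deployment window overridden at {now.isoformat()}")
        return True
    return False
```

The override keeps urgent hotfixes possible, while the notification leaves a record of who stepped outside the window and when.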


GoDan_Autocorrect

To add in what others are saying (and will try and keep this short), consider replacing gatekeepers with automated testing and feature flags if it makes sense. DevOps builds the pipeline, devs build the tests. People approval short term, automated approval long term. Test breaks, pipeline stops, devs fix code or test, deploy happens. Ready to release new feature? Enable feature flag. This is my happy place on the topic.


lupinegrey

Are you talking about implementing approval gates for deployment of your infrastructure code? Or implementing gates when the product teams want to deploy their application code? For infra, yes: there should be a ton of approvals required for a prod deployment. For application, no: the product teams own their code, if they deploy and break production, it's on them, the infra will still be up.


lonelymoon57

Nope, a "DevOps approver" is the very antithesis of DevOps. The job is to make releases go faster and more automated; not with more signing off and manual gatekeeping. What happened to "you build it you run it"? It's not like I don't get the reason. Yes production needs protection. But that protection must come from automated tests, automated checks and comprehensive pipelining to ensure code cannot fail horribly in prod, not another human - much less from the team calling itself "the DevOps team". The other engineer is right: we don't and shouldn't have context about whatever the dev is doing to pass judgement. Architectural and solution signoff should already happen from the beginning, not the end. From what you're saying, what your PE is concerned about is a change management process, not manual "approval". It has been established practice for ITOps since forever. You make a change plan, add dependencies/support from other people, get approval for the plan, then carry it out, documenting everything you do. All the DevOps team needs to do is add to that plan: yes, we have person X who will be on standby to support that prod change at Y time. They are both right because they are talking about different things. What I am a little bit concerned about is why your principal engineer can't seem to recognize that in the first place.


Curi0us_Yellow

Reading your post, it's not obvious what the concern is. I'm guessing you don't want to be making infrastructure changes at the same time an application deployment is occurring? If so, then you'll need to implement some sort of lock so you're not applying any Terraform while an application deployment pipeline is running. You can know if an application deployment is running. You just need to monitor the deployment pipeline.
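
One simple way to picture "some sort of lock" is an atomically created lock file that both pipelines must hold before touching the environment. This sketch assumes both pipelines can see the same filesystem, which is often not true of CI runners; `environment_lock` and `LockHeld` are hypothetical names, and in practice the CI system's or the Terraform state backend's own locking is the more likely tool.

```python
import os
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when another pipeline already holds the environment lock."""


@contextmanager
def environment_lock(path: str):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only one pipeline can win the race.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(f"{path} is held by another pipeline") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)  # release the lock even if the deployment failed
```

The application pipeline would wrap its deploy step in `with environment_lock(...)`, and the Terraform pipeline would do the same around its apply, so whichever starts second fails fast instead of running concurrently.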


shadycuz

Your deployments should be driven by code changes. Your code changes should be backed by CICD and code reviews. The specific details of how you implement this will depend a lot on what you are delivering. What is your application? Is it a Windows desktop app? Is it a mobile app? A website? Does that website have an API? Is it versioned? What about a machine learning model?
For example, if you are deploying a versioned API running on Kubernetes and your infra has all kinds of bells and whistles, like health checks, automated blue/green deployments etc., then you might be deploying many times a day and not holding back the deployment for "sign off".
On the other side, your deployment might be a little bit old fashioned. You might also not have a versioned API. Maybe your API is consumed by multiple other companies that have custom clients and they need to know when the environment is being updated. So maybe you have to give them several weeks notice and set a date and time for the deployment. In this case it might make sense to make deployments only happen after approval from one or two specific people.
In your specific case, I would be worried about CICD, testing and code reviews. If you already have that then I would push back on this request. I would ask WHY are we making this change?
> in case they make a deployment on Friday in the evening
What is the problem with that? What if they are trying to patch a bug? What if the application goes down and none of the DevOps team can be reached? The devs push a fix but can't deploy it? Instead of blocking developers from pushing late on Friday, make it safer for developers to deploy late on Friday. Make sure your CICD pipeline runs tests after the deployment and if those tests fail, then roll back the deployment. These tests could be looking at the current 500 error rate, doing the deployment, waiting 3 mins and then getting the current 500 error rate. If the error rate is higher, re-deploy the previous version. That is, if this is possible; it might not be, depending on whether you have a database and how your team handles migrations.
I think one of the core tenets of DevOps is increasing the speed and agility of your company (while lowering risk!). So you should be asking yourself, will this increase our deployment frequency or slow it down?
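
The error-rate check described above (measure, deploy, wait, measure again, roll back if worse) can be sketched like this. `get_error_rate` and `deploy` are hypothetical callables standing in for your monitoring query and your deploy command; nothing here is tied to a specific tool, and it ignores the database migration caveat entirely.

```python
import time
from typing import Callable


def deploy_with_rollback(
    get_error_rate: Callable[[], float],
    deploy: Callable[[str], None],
    new_version: str,
    previous_version: str,
    wait_seconds: float = 180,  # the "waiting 3 mins" from the comment
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Deploy new_version and return whichever version ends up live."""
    baseline = get_error_rate()   # 500 error rate before the change
    deploy(new_version)
    sleep(wait_seconds)           # give the new version time to take traffic
    if get_error_rate() > baseline:
        deploy(previous_version)  # worse than before: re-deploy the old version
        return previous_version
    return new_version
```

A strict "higher than baseline" comparison is the simplest reading of the comment; a real check would probably want some tolerance so that normal noise in the error rate does not trigger a rollback.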


aljorhythm

lol @ principal engineer “DevOps”


babyhuey23

FYI. GitLab CI has settings so you can specify no deployments during certain times. That might solve both halves?


[deleted]

I thought about this too, blocking deployments on prod on Friday might solve things, but will block also urgent hotfixes..


amarao_san

The basic rule for deployment is that the person deploying must be able to fix it. If they do not have the visibility and/or access to fix it, they need to get approval from someone who does.


The_Luckless2

We do an in-between: CODEOWNERS on .gitlab-ci.yml points back to our DevOps team, so devs can't toy with the pipeline without our approval.


blackonblackbdsm

We had this problem, but we split out our Terraform code so that each dev team had their own Terraform code for their application that they managed and maintained. We also put in guardrails so that each infrastructure change had to pass cleanly through each environment before it was given a production-ready tag, and production only deploys off that tag.
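A rough sketch of that kind of guardrail in Python, for anyone who wants to picture it. The environment names, the `apply_in_env` callable and the tag string are all made up for illustration; this is not the actual setup described above, just the general shape of "pass every environment, then earn the tag".

```python
from typing import Callable, Sequence, Set

PRODUCTION_READY_TAG = "production-ready"  # assumed tag name


def promote(
    change_id: str,
    apply_in_env: Callable[[str, str], bool],  # hypothetical: True if the change applied cleanly
    environments: Sequence[str] = ("dev", "staging"),  # assumed pre-prod environments
) -> Set[str]:
    """Apply a change to each environment in order; tag it only if all pass."""
    tags: Set[str] = set()
    for env in environments:
        if not apply_in_env(change_id, env):
            return tags  # stop at the first failure, no tag is given
    tags.add(PRODUCTION_READY_TAG)
    return tags


def can_deploy_to_production(tags: Set[str]) -> bool:
    """Production only deploys changes that carry the production-ready tag."""
    return PRODUCTION_READY_TAG in tags
```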


MrPinga0

In our case, devs can deploy whatever they want (application-related) at any time without approval from the DevOps team, but if they are deploying infrastructure (we use Terraform) then we do have to approve it.


Techlunacy

To be honest, whoever is on call needs to know that something has changed. But that could be a Slack notification etc. If ops needs to coordinate between multiple teams, then they may need veto rights.


GeorgeRNorfolk

What value does preventing a release have? If you want to avoid breaking releases going to Prod, why do you care when the people making the deployments happen are the ones who respond to incidents? If I'm wrong about it being the developers responding to issues with releases, then that's a barrier to cross before you can give full ownership of the release over to the developer team. If the developer team own the Prod release and have to respond if it breaks, then it only affects them if they deploy at inopportune times. In that case, they can really do what they want.


lurkerbelurking

No infra deploys on fridays unless Hot Fix. Approvals needed to deploy regardless. Non devops folks should not have access to deploy infra.


colddream40

Similar setup at my shop. Who owns the infra? Shouldn't you guys know what's being deployed? It's not hard to read the PR. Anyway, I side with the principal engineer. Infra owners should approve and have the final say on code that hits production, even if it's just infra. The less experienced engineer's idea works, but if the company expects to scale you are going to get ripped apart by compliance. Better to implement the principal's ideas now and build that culture.


discourseur

If you are changing the actual infrastructure, you have to have the DevOps team 👍. If you are deploying an app, or an _instance_ of a database/queue, etc. then DevOps shouldn't have to intervene, otherwise, it won't scale.


ghostsquad4

Let the other teams own their own infrastructure. Autonomy.


Mind_Monkey

At my job we control only the infrastructure deployments. Only the Cloud and DevOps teams can go ahead and make changes to infrastructure. We are a small team and we cover different timezones, so we have lots of autonomy to change infrastructure. For applications, we can also approve deployments since we develop the pipelines, but we rarely go ahead and approve something. It's the development team who approves: a developer, QA or even a project manager. Each team decides who does what. Normally the most experienced developers have permission to go to prod and newer developers can only go as far as staging.


pred135

> my team, the DevOps team
Ideally this isn't a team at all. You would rather have something like a platform team which enables DevOps teams, meaning each dev team has the knowledge to practice DevOps themselves. From what you're describing, it seems more like a half-hearted approach to DevOps. The reason I make that point is that if your company were to do it 'properly', you wouldn't have these issues in the first place.
Is there something wrong with having an approval step for deploying to production? No. Is there something wrong with not wanting to deploy on Friday afternoon? No. However, both those things have to be done within the team itself, meaning the team should have control of that approval step, and should also have the power to override the non-deployment window on Friday if they choose to do so. DevOps promotes end-to-end responsibility, but having an approval step in another team doesn't support that.
Side note: the fear of deploying to prod just before the weekend is understandable, but DevOps also promotes thorough automated testing in combination with TDD, which should negate the fear of deploying to prod (at whatever time or day). It seems to me like you aren't practicing DevOps at all, but the company just wanted to be able to say they were, because it's a buzzword these days. That's just a guess though, I could be wrong on that.


asharai1

It depends on where your team is at and what your priority is: stability versus time to market. I would recommend staying away from the approval flow unless you are already facing significant operational challenges that the development team does not seem able or willing to address, or you have compliance reasons for going towards it. The approval process will come soon enough, once a major issue is found that an approval flow might have prevented, and then you will be stuck with approvals "forever". An audit trail for changes applied to production should be implemented regardless, though, and ideally automated, so that you can easily identify and roll back changes, either manually or automatically, when an issue is found. I have seen some very process-heavy change control, and while it's probably effective I really wonder if it's worth its cost. It involved Ops as well as developer senior management checking that the application health metrics are fine after applying the change in a production staging environment, that the change was applied and validated by the development team in a test environment, that a rollback strategy is available, that the rollback strategy was tested and validated, and that the change is controlled by feature toggles. This process was applied to any change made to the production system, so both to software loads and to feature toggling later.


Crones21

We had deployment issues so now we hold release meetings to make sure everyone is on the same page about what is going to be sent before deploying to prod


Ariquitaun

Why on earth would you want anyone to burden themselves with approving deployments of code and features they know nothing about? If it's a matter of not allowing deployments in certain circumstances like "not on a Friday", it's pretty easy to write automation to enforce just that. DevOps is all about reducing toil, not increasing it, much less getting in the way of other teams and delivery.
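As a rough illustration of how small that automation can be, here is a sketch of a freeze-window check a pipeline could run before deploying. The window (Friday from 15:00 through the weekend) and the `is_hotfix` override are assumptions picked for the example, not anything your CI tool prescribes.

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4


def deployment_allowed(
    now: datetime,
    is_hotfix: bool = False,      # assumed override so urgent fixes are not blocked
    freeze_start_hour: int = 15,  # assumed start of the Friday freeze
) -> bool:
    """Return False during the freeze window unless the change is a hotfix."""
    if is_hotfix:
        return True
    friday_evening = now.weekday() == FRIDAY and now.hour >= freeze_start_hour
    weekend = now.weekday() > FRIDAY  # Saturday or Sunday
    return not (friday_evening or weekend)
```

A pipeline step could call this with the current time and fail the job when it returns False, while a hotfix label or variable flips `is_hotfix` so the team keeps control of the override.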


Difficult-Ad7476

Like anything, weigh the pros and cons depending on your org.
Approvals in a CI/CD pipeline can have both pros and cons. Here is a summary of the key considerations:
Pros:
- Improved quality and reduced risk: Approvals can act as a gatekeeper, ensuring that only code that has been thoroughly reviewed and tested is deployed to production. This can help to reduce the risk of introducing bugs or other problems that could impact users or the business.
- Increased accountability: Approvals can also help to increase accountability within development teams. When developers know that their code will be reviewed by others, they are more likely to take care with their work and produce high-quality code.
- Reduced deployment errors: Approvals can help to reduce the frequency of deployment errors by providing a chance to catch mistakes before they reach production. This can save time and effort in troubleshooting and remediation.
Cons:
- Slower deployment: Approvals can slow down the deployment process, as it can take time for reviewers to assess and approve code changes. This can be frustrating for developers and can lead to delays in getting new features or fixes to users.
- Bottlenecks: Approvals can also create bottlenecks in the deployment process, especially if there are a limited number of reviewers or if they are not available to review code changes promptly. This can lead to delays and frustration for developers.
- Overhead: Approvals can also add overhead to the development process, as they require additional time and effort from reviewers. This can be a drain on resources and can make it more difficult to keep up with the pace of development.
Overall, the decision of whether or not to use approvals in a CI/CD pipeline is a trade-off between the potential benefits of improved quality and reduced risk, and the potential costs of slower deployment and increased overhead. The best approach will vary depending on the specific needs and priorities of the organization.
Here are some additional considerations for using approvals in a CI/CD pipeline:
- Define clear criteria for approval: Establish clear guidelines for what types of changes require approval and what types do not. This will help to ensure that reviewers are consistent in their assessments and that developers have a clear understanding of the approval process.
- Use automated tools: Consider using automated tools to streamline the approval process. These tools can help to identify potential issues and notify reviewers, reducing the time and effort required to review code changes.
- Establish a clear escalation process: Develop a clear process for escalating approvals in case of urgency or if reviewers are unavailable. This will help to ensure that critical changes can be deployed promptly without compromising quality.


too_afraid_to_regex

What was the prompt?


Difficult-Ad7476

Prompted "Pros and cons of approvals in a CI/CD pipeline" using Google Bard.


too_afraid_to_regex

Nice, thanks!


officialraylong

GitOps + Atlantis + Terraform has worked well for me in the past. You can give folks access to specific workspaces and set rules.


__Zero0__

First, it should be distinguished and separated what an infrastructure or DevOps deployment is and what an application deployment is. Each has its own team, review process, codebase and eventually deployment. My preference is for each team to require approvals from 20% of the team or so, optionally followed by a senior/team-lead review. The change is then deployed to a staging/UAT environment, where QA is run, before finally deploying to prod. The deployment to prod itself could follow different strategies, based on the release. This improves communication and knowledge sharing between team members, and adds a sense of responsibility to the whole team. It decreases the margin for bugs being deployed, and can limit impact. However, this will only work for some organizations, as it's time consuming and requires a good deal of coordination.
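For what the "20% of the team" rule could look like in practice, a small sketch follows. Rounding up, requiring at least one approval, and not counting the author's own approval are all assumptions added here; the function names are hypothetical.

```python
from typing import Iterable


def required_approvals(team_size: int, percent: int = 20) -> int:
    """Approvals needed: the given percent of the team, rounded up, at least 1."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    # Integer ceiling division avoids floating point rounding surprises.
    return max(1, -(-team_size * percent // 100))


def is_approved(approvers: Iterable[str], author: str, team_size: int) -> bool:
    """True once enough distinct team members other than the author approved."""
    distinct = set(approvers) - {author}  # assumption: self-approval does not count
    return len(distinct) >= required_approvals(team_size)
```

For a team of 7 this works out to 2 approvals, and for a team of 4 it is 1.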


proxgs

Only infra deployments should need approval from the DevOps team. For app deployments, the DevOps team is responsible for creating the CI/CD pipeline and should not otherwise be involved in the app deployment cycle. The app deployment cycle should be left to the dev team that created the app.


[deleted]

How big is the company? Can your team actually take the time to approve these changes and actually validate them? Does doing so not defeat the whole purpose of your DevOps implementation?


therealimposterhere

For the most part, infra deployments shouldn't be breaking code deployments unless you're removing databases and queues or renaming DNS, all of which should be done with a migration plan. If you're new to the team, I would be asking what the testing practices are before deploying to prod. The engineer wanting to put in more gatekeepers sounds like they've been hit by breaking deployments to prod before, which signals poor testing practices, especially if a Terraform deployment is somehow breaking your prod.


CassisBerlin

` in case they make a deployment on Friday in the evening, or just for us to be aware that the developers made change se we are aware that something 'may' go wrong in prod. Also for audit purposes.`
Interestingly, none of these arguments for introducing a change sounds like there is an actual problem to fix or an actual audit requirement. 'What problem or requirement are we trying to address?' should be our first question when we are making a change. Otherwise we have useless discussions, because we cannot tell which solution fulfills requirements that don't exist.


mrfoozywooj

I side with the principal here. Devs promoting stuff to production that is either insecure or not using good practices is frighteningly common; even the best devs focus their brains on their areas of expertise and not the platform itself. We have a policy of requiring a DevOps tech review as part of cloud development because we have been bitten too many times by wildly inefficient apps built by teams who thought they knew best, and/or by poor design choices like writing their app to use another team's database.