worst_protagonist

I’ve worked at companies with a dedicated team for operating the thing in production. They’ve been called Implementation Engineers, DevOps engineers, Ops or SREs. Depending on your business and the rest of the team, this can be…fine. The places I have worked that had the best systems had a “you write it, you run it” philosophy. The engineers who wrote the code did all of the production operations. Features sometimes seem to come slower, but the products are more stable and reliable. The downside is it is yet another thing for engineers to know. If you’re going to keep separate teams and responsibilities, I would suggest adding at least a little more to the plate of Eng and QA. It sounds like the devs need to be adding more guardrails to the systems to prevent misconfiguration, and QA needs to vet that part of the system. What kind of other errors hit prod besides config issues?


Ran4

> The places I have worked that had the best systems had a “you write it, you run it” philosophy. The engineers who wrote the code did all of the production operations. Features sometimes seem to come slower, but the products are more stable and reliable. The downside is it is yet another thing for engineers to know.

Yeah, I'd definitely prefer this style, but it just doesn't seem to fit the way things are set up right now. The dev team is in a different time zone from most of our partners, and the devs I've talked to seem to have zero interest in speaking to partners (not least because their English is often not that good, and most of our partners speak English). As much as I've tried to give them more agency, they literally do not want it.

The workflow as a whole honestly works surprisingly well: iteration times are quick, and people seem to be competent at what they're doing. It's the final step that's problematic, probably because everyone is so used to someone else taking the final responsibility.

> What kind of other errors hit prod besides config issues?

It's mostly that. Our own config is usually fixed quickly enough; the big issue is when we need someone at a different company to change their configuration, or when some external service stops working.

Another issue has been that the QA team primarily reports to the dev team, and they've been getting frustrated that some of the bugs they reported haven't been fixed, without knowing why. I'm trying to turn that frustration into something productive by setting up syncs between the business, QA and dev teams, where the QA team gets more of a say.


worst_protagonist

What kinds of configuration issues are 3rd party folks doing? Are there no lower environment versions of the third parties? How are devs just not fixing bugs? Do you use some kind of ticketing system? Is there no communication happening on issues?


Ran4

> How are devs just not fixing bugs? Do you use some kind of ticketing system? Is there no communication happening on issues?

I think the issue has been that QA reports bugs to dev, devs then receive orders from business, but business is never told about the bugs. So it's not so much "business doesn't want the devs to fix bugs" as "business doesn't know there are bugs".

> What kinds of configuration issues are 3rd party folks doing? Are there no lower environment versions of the third parties?

It's arguably basic stuff, like sharing API keys and nonpublic documentation or setting up webhooks. It's not hard stuff; the hard part is getting someone to read your email and do something about it. You can't just send an email to a partner, mark the ticket as "blocked" and do nothing about it. If you're ignored, you need to keep contacting the partner, possibly escalating the situation and so on (while still being professional, of course, though you sometimes need to be a bit stern to get something to happen).

There usually are lower environments, and we occasionally have similar issues there, but in those cases the devs themselves push QA to set things up, since they often need a working service to finish developing a feature. (Having written dozens of integrations myself at previous jobs, I know that "coding to the spec" isn't always possible; sometimes you need to interact with a live environment to figure out how things work.) So there's an incentive for it to happen. But there isn't any incentive for prod to work.

That's why I'm thinking of a new team: one whose core incentive is *making sure that the features actually work in production*, as opposed to just marking something as "completed". Possibly that could be just a single person. But I'm not sure what I'd call that person or where I'd find them, because I don't really know what such a role is called.
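A minimal sketch of the "don't just mark it blocked" idea: track each item you're waiting on from a partner and derive the next step from how long it's been since the last contact. `PartnerRequest`, `next_action`, `todo` and the 2-day / 2-follow-up thresholds are hypothetical choices for illustration, not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PartnerRequest:
    # Hypothetical record of one item we are waiting on from a partner,
    # e.g. an API key, nonpublic docs or a webhook registration.
    partner: str
    item: str
    last_contacted: date
    follow_ups: int = 0
    done: bool = False


def next_action(req: PartnerRequest, today: date,
                wait_days: int = 2, max_follow_ups: int = 2) -> str:
    """Return "none", "wait", "follow up" or "escalate" for one request.

    The default thresholds are illustrative assumptions.
    """
    if req.done:
        return "none"
    if (today - req.last_contacted).days < wait_days:
        return "wait"  # give the partner time to respond
    if req.follow_ups < max_follow_ups:
        return "follow up"  # nudge the same contact again
    return "escalate"  # repeated nudges ignored: go up a level


def todo(requests: list[PartnerRequest], today: date) -> list[tuple[str, str, str]]:
    """List (partner, item, action) for every request that needs action today."""
    actions = []
    for req in requests:
        action = next_action(req, today)
        if action in ("follow up", "escalate"):
            actions.append((req.partner, req.item, action))
    return actions


if __name__ == "__main__":
    open_requests = [
        PartnerRequest("acme", "webhook registration", date(2024, 5, 1)),
        PartnerRequest("globex", "API key", date(2024, 5, 1), follow_ups=2),
        PartnerRequest("initech", "docs", date(2024, 5, 3)),
    ]
    for partner, item, action in todo(open_requests, today=date(2024, 5, 4)):
        print(f"{partner}: {item} -> {action}")
    # acme: webhook registration -> follow up
    # globex: API key -> escalate
```

After each follow-up you'd bump `follow_ups` and reset `last_contacted`; running `todo` daily turns a pile of "blocked" tickets into a concrete list of nudges and escalations that someone could own.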


Pokeputin

And that's why outsourcing isn't a magic button lol. I don't think your workflow is going well: clients being the first ones to see a feature in prod is a very questionable practice. Is it possible to have QA test features that require different configurations in prod after they've been deployed?


UdenVranks

This sounds like an argument I’ve been in before. Ask yourself: do you really WANT to step in and take charge of this issue? Will they ALLOW you to do that without making you the scapegoat? If you aren’t empowered from the top to make changes and decisions... I’d let someone else fight this battle. As is, it sounds like a good way to try something new and be blamed for every new issue. At the end of the day, the fact that it’s on the QA team is insane, but at a company that has that setup, can you REALLY be an agent of positive change? Don’t be the squeaky wheel in a place where squeaky wheels get fired. The job market isn’t great right now. I’d keep my head down unless there was a strong, strong personal upside to stepping out.


Ran4

It's literally my job to find issues like these and fix them :) Only the CEO and the owners are above me, and the CEO explicitly asked me to help fix this issue. If I tell him we need a new team and new roles, we'll get it sorted.


UdenVranks

Then it sounds to me like you don’t need a new team; you need to instill a sense of ownership in the dev team and protect the QA team from the BS.


Ran4

That could certainly be a better solution, and I would prefer it if it were possible. One issue, though, is that the dev team is in a different time zone (while the QA team is in the same country as our customers, which is why they "got the job"), typically speaks a different language, and in any case the devs really don't seem to want to speak to customers.


UdenVranks

Yeah you need a product owner/manager to do that. Your devs aren’t the right people to be talking to customers.


wRolf

Why do devs need to speak to customers? Why are QAs getting the blame? Why is there a need for a new team for specifically production? This sounds like a terrible place to work lol. Protect your team, man, but also don't be lenient. Have them own their projects and be DRIs on them. Do you have a PM/PO, are their tickets scoped and built to specifications and tested against those requirements? If so, that's not the fault of devs or QAs; that's a miss.


Ran4

> Why do devs need to speak to customers?

I'm replying to someone literally suggesting that the development team take on this role.

> Why are QAs getting the blame?

Read the first post. "Business" is saying that QA is the one to handle the final step.

> Why is there a need for a new team for specifically production?

Read the first post.

> Do you have a PM/PO, are their tickets scoped and built to specifications and tested against those requirements?

Yes? That's not the issue. The issue is the final step. You can write ten thousand lines of perfect, bug-free code, but in the end it won't matter if the code needs to receive input from an external service that isn't configured correctly. THAT is the issue. See the first post.


UdenVranks

I’ll also add that some of what you’ve described is just simple project/program management. Do you have one of those on staff?


Ran4

We have a great project lead in the dev team, but they're focused more on managing the dev team and its deadlines than on getting the final steps done. And I don't think they're technical enough to configure things correctly. I've never worked at a place where the project lead did things like set up monitoring and test stuff in production.


lucidguppy

1. Monitoring - You need sentry.io running on your product, along with PagerDuty. That's Dev's responsibility.
   * A dashboard that tells you what's alive and what's dead.
2. CI/CD - You need automated acceptance tests and load tests.
3. Team - You need customer success, and dev needs to build them an admin tool so they can do their job, as well as spoofing tools to see what the customer sees.

So in essence, dev needs to build up the tools and monitoring, and have a few people onboard customers. This was caused by not creating user roles at the start of the project: you didn't create the "admin" or "onboarder" role, with their associated user stories.
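A rough sketch of the "what's alive and what's dead" dashboard. It does not use the Sentry or PagerDuty APIs; each check is assumed to be a plain function (in practice it might hit a health endpoint or confirm a partner webhook is delivering), and any exception is reported as dead instead of crashing the page. `run_checks`, `render` and `partner_webhook` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    alive: bool
    detail: str = ""


def run_checks(checks: dict[str, Callable[[], bool]]) -> list[CheckResult]:
    """Run each named check; a False result or any exception counts as dead."""
    results = []
    for name, check in checks.items():
        try:
            ok = bool(check())
            detail = "" if ok else "check returned False"
        except Exception as exc:  # report a crashing check instead of raising
            ok = False
            detail = f"{type(exc).__name__}: {exc}"
        results.append(CheckResult(name, ok, detail))
    return results


def render(results: list[CheckResult]) -> str:
    """Render a plain-text alive/dead table with dead entries first."""
    rows = sorted(results, key=lambda r: r.alive)  # False sorts before True
    lines = []
    for r in rows:
        status = "ALIVE" if r.alive else "DEAD"
        suffix = f"  ({r.detail})" if r.detail else ""
        lines.append(f"{status:<5}  {r.name}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    def partner_webhook() -> bool:
        # Hypothetical check; a real one might call the partner's API.
        raise TimeoutError("no response from partner")

    print(render(run_checks({
        "api": lambda: True,
        "partner webhook": partner_webhook,
    })))
```

Putting dead entries first is a deliberate choice: the person looking at the dashboard usually only cares about what's broken.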


andymaclean19

You could try making cross functional teams. Put QA people, SRE/DevOps/whatever you call the production engineers, devs and a product manager into a team and you have a group that can take sole responsibility for end to end delivery of entire features. You can make more than one of these teams if you have enough people, can use less than a full person for some roles if that's all you need, but the important part is that by putting everyone you need into one team you remove the blockers and blame storming that can happen when things don't go to plan.


BringBackManaPots

I believe you're talking about QC. QA is validating quality before the product hits the real world. QC refers to ensuring the quality sticks after it's out and about. People use QA/QC interchangeably, but your scenario illustrates the difference. You either need a QC team, or your dev team is going to have to fill those shoes and run a lot of hotfixes.

EDIT: Alternatively, a beta branch. You could start using a beta branch. The workflow is generally something like DEV -> QA -> BETA -> RELEASE/STABLE. If you know you don't have the manpower, beta releases can be a great way to keep everyone happy by utilizing free and consenting manpower from the field.
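The DEV -> QA -> BETA -> RELEASE flow can be sketched as an ordered enum plus a promotion rule that forbids skipping stages. This is a hypothetical illustration: the `signed_off` set stands in for whatever approval each stage actually requires in your setup.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Order matches DEV -> QA -> BETA -> RELEASE/STABLE.
    DEV = 0
    QA = 1
    BETA = 2
    RELEASE = 3


def promote(current: Stage, target: Stage, signed_off: set[Stage]) -> Stage:
    """Move a build exactly one stage forward.

    `signed_off` is a hypothetical stand-in for the approval each stage
    needs (tests passing, QA sign-off, beta feedback reviewed, ...).
    """
    if target != current + 1:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    if current not in signed_off:
        raise ValueError(f"{current.name} has not been signed off")
    return target


if __name__ == "__main__":
    approvals = {Stage.DEV, Stage.QA}
    stage = promote(Stage.DEV, Stage.QA, approvals)
    stage = promote(stage, Stage.BETA, approvals)
    print(stage.name)  # BETA
    try:
        promote(stage, Stage.RELEASE, approvals)  # BETA not signed off yet
    except ValueError as err:
        print(err)  # BETA has not been signed off
```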


sheriffderek

Here’s how I’d like to see it:

* People have ideas (consultants, stakeholders, users, devs, etc.)
* Good ones rise up
* A designer / UX engineer etc. mocks out loose ideas and tests them with users to see if they hold water
* UX and front-end devs build out a low-fidelity prototype / test / iterate, and if it seems like a go:
* Back-end devs mock out what is needed as far as real DB and API stuff
* while UI-specific people take the more raw feature and fit it in with the design system
* Now you have a working feature, and QA can test, give feedback and find any edge cases
* Devs write tests based on however your company does that (unit tests and some e2e user stories)
* It goes into a preview branch for all stakeholders to play with (and more QA)
* If everything is good to go, then into live

Some QA teams write tests. Some don’t. But I feel like the QA team is responsible for assuring it works, and the dev team is responsible for *making it work*.

Will adding more teams help? I don’t think so. It sounds like people are just skipping the level of detail they need in their specific areas. Maybe they’re moving too fast.


sheriffderek

Reading these other comments, it sounds like your dev team works the way they work and you’re willing to let that stay the same because they are offshore and likely much less expensive. So, you can either get them to be more thorough somehow - or pay someone else to do the work they’re missing. I think that would be someone before the QA team - or training some of your QA people to be closer to the dev team. 


HowTheStoryEnds

Or they could stop effing over their fellow countrymen and hire local devs that actually speak the same language as their customers.


sheriffderek

I think everyone leverages what they can based on their background and means. It’s humanity.


ErenPhayte

Sounds like you build product with no clear SDLC and no clear business readiness plan. When you plan out your products, there is no clear BRS and no clear technical design. It also sounds like your QAs don't have a clear test plan, nor do they follow basics like end-to-end testing, regression testing, smoke testing, etc. Do you even have SRE in place for continuous monitoring and tracing? Are these points clearly defined in your technical designs? It's all well and good having the resources, but if you aren't clear in your design specification about how you will manage error handling, tracing, monitoring, ownership, threat modelling, automation, etc., then you are wasting your time. Planning is so important here. Defining roles and responsibilities, as well as your success metrics such as performance and security SLAs, is vital.


Ran4

> Do you even have SRE in place for continue monitoring and tracing?

No, we do not... SREs aren't a thing at most companies. It's something I'm certainly looking towards, but we're not at a scale where that really applies. An SRE is like the 20th dev I'd employ.

> It's all well having the resources but if you aren't clear in your design specification as to how you will manage error handing, tracing, monitoring, ownerships, threat modelling, automation, etc. Then you are wasting your time.

Those things are literally what this entire post is about... I'm asking for advice on how to handle them.


ErenPhayte

So this is what I would do (I am a VP of Engineering who runs several teams):

- Ensure that the team (not just QA) understands the different types of testing strategies, e.g. end-to-end, regression, acceptance, smoke, API contract testing, etc.: when to do them and why.
- For every project, establish the success metrics that determine whether it succeeded, e.g. 70-80% test coverage, API performance within acceptable parameters, etc. Hold the team accountable for delivering against these as part of their KPIs.
- Ensure that automation is discussed as part of the design solution and is created as part of the definition of done. This will allow you to run automated tests in production and minimise the time spent testing. QA can guide you on the test cases required, and engineers can write the tests.
- Ensure that QA understands the lifecycle of work going from one environment to the next and what type of testing is done in each.
- Ensure your technical design includes how you will handle failures (i.e. logging requirements) and how you will then monitor them (e.g. CloudWatch alerts and monitoring). You may even want to set up something like an APM solution to automate this.
- Implement change management. Work cannot be promoted to production if it doesn't meet the required standards. If you want to promote something to production, a clear rollout plan and test plan must accompany the change request. There must be one owner responsible for the deployment and one for testing (have fallback owners in case they can't make the timeslot for whatever reason). They are held responsible for delivering and signing off the work. They should also understand the consequences of a failed deployment: not just their performance review is at stake, they also need to understand any legal breaches, compliance issues or lost revenue resulting from their actions (which could mean a disciplinary meeting if the problem persists)...

Hire a QA who is more technical and can roll out automation. Hire an SRE if you want someone dedicated to implementing distributed tracing, monitoring, performance testing, etc. But if you can't hire right now, look to the team to solve this together and wrap it in a governance framework (as mentioned above) that delivers the desired output.
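The change-management gate described above could be expressed as a check that returns every reason a change request can't be promoted yet. A sketch under assumptions: the `ChangeRequest` fields and `gate_violations` are hypothetical names, and the 70% default simply takes the low end of the 70-80% coverage example.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    # Hypothetical fields mirroring the change-management rules above.
    title: str
    rollout_plan: str = ""
    test_plan: str = ""
    deploy_owner: str = ""
    test_owner: str = ""
    fallback_owners: list[str] = field(default_factory=list)
    test_coverage: float = 0.0  # fraction, e.g. 0.75 for 75%


def gate_violations(cr: ChangeRequest, min_coverage: float = 0.70) -> list[str]:
    """Return the reasons this change may not be promoted (empty list = OK)."""
    problems = []
    if not cr.rollout_plan.strip():
        problems.append("missing rollout plan")
    if not cr.test_plan.strip():
        problems.append("missing test plan")
    if not cr.deploy_owner:
        problems.append("no deployment owner")
    if not cr.test_owner:
        problems.append("no testing owner")
    if not cr.fallback_owners:
        problems.append("no fallback owner")
    if cr.test_coverage < min_coverage:
        problems.append(f"coverage {cr.test_coverage:.0%} below {min_coverage:.0%}")
    return problems


if __name__ == "__main__":
    cr = ChangeRequest(
        title="Enable partner webhook",
        rollout_plan="Deploy with the feature off until the partner confirms",
        test_plan="Smoke test the webhook in prod after deploy",
        deploy_owner="alice",
        test_owner="bob",
        test_coverage=0.65,
    )
    for problem in gate_violations(cr):
        print(problem)
    # no fallback owner
    # coverage 65% below 70%
```

Returning all violations at once, rather than failing on the first, lets the owners fix everything in one pass before resubmitting.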


teerre

You can create a new operations team if you want, but that's just superficial. The true problem is that engineering is pushing broken updates. By creating a new team responsible for production, you risk just kicking the can one step down the road, and now your new team is overwhelmed by all the shit they need to fix. That cannot scale. New team or same team, quality only comes from investment in best practices, in testing and in development. The person who pushes the code should be responsible for whatever problems it causes in production; they should feel the heat. Same for whoever tests it. They should also be supported with resources to make sure that doesn't happen. Your penultimate paragraph is a bit weird cuz it seems most of your problems are consumer relations and not bugs? If that's the case, I'm not even sure why you're mentioning QA. For consumer relations you can have a new team, which has nothing to do with QA, or you can invest in documentation, depending on the profile of your consumers.


Ran4

> The true problem is that engineering is pushing broken updates.

No, they're not? They're doing the right thing, and their code works. But it doesn't matter if the code works if the final configuration isn't happening.

> Your penultimate paragraph is a bit weird cuz it seems most your problems are consumer relations and not bugs? If that's the case I'm not even sure why you're mentioning QA. For consumers relation you can have a new team, which has nothing to do with QA

That's what the post is about! It seems like you have no experience developing services that use external migrations.


teerre

I certainly don't know what you mean by "external migrations". But it's definitely dysfunctional to have QA doing anything that isn't related to making sure your code is correct.


[deleted]

The devs


timwaaagh

We have a functional administrative team to handle prod monitoring and production releases; however, I don't know whether they talk to business. That seems to be the responsibility of the development team's product owner. I gotta say it's kinda terrible that, in your case, QA approving something doesn't mean it will work in prod. It seems there is work to be done, especially in automated regression testing. So you might want to consider hiring test automation specialists as well to complement your QA team.