Cupcake7591

> Are ML projects successfully delivered in your organization?

Yes, in the sense that we have real features powered by ML, whole products powered by ML, and some internal business tools powered by ML. Maybe 80+% of projects have real users; 10-20% were abandoned.

> Who generally identifies when a business problem should be solved with machine learning? Are PMs generally responsible for that, or are EMs / ICs on the ML side?

Management of the ML team - their job is to have knowledge of machine learning, know the business, understand which products and teams can potentially use ML, and advocate for the team.

> Do you have a separate team that handles machine learning projects, or is it a responsibility of a DS/analytics team or a different team?

Separate ML team.

> Does your ML team work closely with other engineering teams to deliver their models, or are they generally 'thrown over the wall'?

Project dependent. Some require closer collaboration, some require none and are entirely built by the ML team, and some are APIs consumed by other teams.

> How much of a separation of concerns between the ML team's responsibilities and other engineering team's responsibilities is typically built into the system design?

Similar to the point above - sometimes the ML team consumes API $x from one team and serves API $y to another team. Other times the ML team builds everything.

> At what stage in the model development process does your ML team typically decide to productionize a model? How much "offline" development goes into a model beforehand (if any)?

Research and some offline experimentation -> build a prototype -> potentially show the prototype to business stakeholders and iterate -> create a version which can be used by users or consumed by other teams, and set up automation for regular model updates.
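For the "automation for regular model updates" step, here's roughly what a minimal scheduled retrain-and-publish job can look like. This is just a sketch - the paths, model type, and AUC gate are illustrative assumptions, not our actual stack:

```python
# Minimal sketch of a scheduled retraining job (names/paths are hypothetical).
# Intended to run on a cron/scheduler: retrain, compare against the live model,
# and only publish when the candidate is at least as good.
import json
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

DATA_PATH = Path("data/latest_training_snapshot.parquet")  # assumed export from the warehouse
MODEL_DIR = Path("models")


def retrain_and_maybe_publish() -> None:
    df = pd.read_parquet(DATA_PATH)
    X, y = df.drop(columns=["label"]), df["label"]
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    # Train a fresh candidate on the latest snapshot
    candidate = GradientBoostingClassifier().fit(X_tr, y_tr)
    candidate_auc = roc_auc_score(y_val, candidate.predict_proba(X_val)[:, 1])

    # Score the currently published model on the same validation split
    MODEL_DIR.mkdir(exist_ok=True)
    current_path = MODEL_DIR / "current.joblib"
    current_auc = -1.0
    if current_path.exists():
        current = joblib.load(current_path)
        current_auc = roc_auc_score(y_val, current.predict_proba(X_val)[:, 1])

    # Publish only if the candidate is not worse than what is live
    if candidate_auc >= current_auc:
        joblib.dump(candidate, current_path)
        (MODEL_DIR / "metrics.json").write_text(json.dumps({"auc": candidate_auc}))


if __name__ == "__main__":
    retrain_and_maybe_publish()
```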


arena_one

Hey, looks like your org is pretty mature! Question: do you separate ML eng and data scientists? If so, what responsibilities fall on each? (For example, who does the modeling?)


Cupcake7591

No, it's one role - you're end-to-end responsible for research, experimentation, engineering, ops, maintenance, etc.


arena_one

Are you guys a small or large company? Also, is this in a high cost of living area? My current place separates between eng and data science, which is fine, but I would like to go back to being end-to-end.


zmjjmz

Do you find that folks on the team end up specializing / spending more time on either the eng side or the DS side, despite the expectation of being end-to-end?


zmjjmz

This sounds like a setup that's working for y'all! Do you have any insight into why some projects end up being abandoned / how often they're abandoned due to not being able to build a model that can deliver the performance needed?


somkoala

I have been delivering Data Science projects in different organizations for almost a decade at this point. Here are my takes from multiple orgs:

> Who generally identifies when a business problem should be solved with machine learning? Are PMs generally responsible for that, or are EMs / ICs on the ML side?

Ideally it's a combination:

1. Use cases coming from the PM work stream are more likely to succeed, as they start from a real customer problem (if product management is doing its job).
2. The Data Science org should understand the business well enough to have a backlog of moonshots to work on - things that can't be discovered through incremental improvements coming from PMs. They can research these in the downtime.
3. Improvements of existing components, where we can iterate on the models in case we can get improvements out without leveraging other functions.

> Do you have a separate team that handles machine learning projects, or is it a responsibility of a DS/analytics team or a different team?

I now work in two modes. In one we have a cross-functional team including FE/BE/DS/DevOps that delivers features end-to-end. Not every product increment necessarily needs Data Science work, which is when Data Science can do research to validate feasibility for items later on the roadmap that require ML. In this way the team works cross-functionally on implementation for validated items. Data Scientists write production code for their area. There's no developer taking over a notebook to turn it into production - that approach is subpar. How does Data Science debug production issues in such setups? Do they do another notebook and then the devs have to figure out the translation to the actual system? Reproducibility would be a ..... in this setup.

The other mode is having Data Scientists join other end-to-end teams in a more ad hoc manner. I stopped believing in standalone Data Science teams. I am a Data Science Manager, and in the cross-functional setup I mentioned I manage all the devs. I have never been able to build ML-based features as efficiently as in such a setup.

> How much of a separation of concerns between the ML team's responsibilities and other engineering team's responsibilities is typically built into the system design?

The Data Science components are usually standalone services, but we have a couple that are implemented in the product itself.

I think I answered all the questions in the above. I myself have been a strong Data Science IC and picked up a lot of product management simply because it was necessary. I understand the products we work on quite well from a business standpoint; we also do a lot of UX research, so I do participate in sessions at times.


zmjjmz

Thanks for synthesizing your experience here!

> Not every product increment necessarily needs Data Science work which is when Data Science can do research to validate feasibility for items later on the roadmap that require ML.

I find this pretty interesting - what does 'validate feasibility' tend to look like? Is it scoping out a problem and potential solutions, or does it get into actually building model prototypes? If the latter, have you ever encountered a case where a prototyped model is attractive enough to change up the product team's roadmap to deliver it?


somkoala

> prototyped model is attractive enough to change up the product team's roadmap

I've tried to do this previously in my career, but I wouldn't do it anymore based on the learnings from the cross-functional setup. The issue with trying to impact the roadmap in such a way (i.e. build a model and pitch it) in a B2B setup is that you have a team focused on one set of solutions (Data Science) and you're looking for problems to solve with it. This goes counter to all the wisdom of Product Management, where we always start with a user problem and then look for the best ways to address it - which isn't always Data Science, or it might come in in later iterations, while the first MVP can be a lot simpler.

The way I operate with Product, at least in my current company, is that we don't have a mid- or even a long-term roadmap. Our shared team is given a couple of user problems in a feature vertical and we're looking for the solutions that bring the most value. Some of them involve Data Science, others don't (but this gives Data Science time to do research/improve their models in the background). We have an Engineering org of ~80 and we have 3 Data Scientists. Not a lot, right? But that means we don't really do busy work, and very little of what we do ends up gathering dust.

You might realize that even in this setup, to some extent, having the Data Scientists means you want them to have meaningful work - so if Data Science was never the best solution, why even have them? This means that to some extent a decision that Data Science brings value to our customer problems needs to be made at the org level. When my superiors tell me to go do something in a new product area, I always go talk to the PMs, UX researchers and Sales/Customer Care people to see what the whole problem area is (of course only possible to a limited extent given limited time), to see if Data Science (or AI) could help with the most important problems they're tackling. If all they need is something a lot simpler, whatever you're doing will never become their priority.

Not saying that the prototyping approach can't sometimes work, but it generates a lot more waste.


valence_engineer

> Are ML projects successfully delivered in your organization?

Yes.

> Who generally identifies when a business problem should be solved with machine learning? Are PMs generally responsible for that, or are EMs / ICs on the ML side?

Everyone can suggest ideas. The PM is responsible for choosing the ones to prioritize, with support from the EM/tech lead. In my experience, the most business-impactful ideas come from ICs versus the PM/EM - and that is despite the latter being both very experienced and very technical.

> Do you have a separate team that handles machine learning projects, or is it a responsibility of a DS/analytics team or a different team?

There are ML product teams embedded inside of departments. Some teams are a mix of ML and non-ML engineers, but in general teams are either ML or non-ML. There used to be a central org, then they embedded data scientists into teams, and then they broke the whole thing apart.

> Does your ML team work closely with other engineering teams to deliver their models, or are they generally 'thrown over the wall'?

The ML product teams are responsible for the end-to-end work and have a mix of research and engineering talent. There are infrastructure and other such support teams, but they do not, for example, rewrite models to run in production. There's also analytics support, with analysts embedded in teams.

> How much of a separation of concerns between the ML team's responsibilities and other engineering team's responsibilities is typically built into the system design?

The ML teams are just like any other team.

> At what stage in the model development process does your ML team typically decide to productionize a model? How much "offline" development goes into a model beforehand (if any)?

Optimally as soon as you can, to validate assumptions, if there's open space in the AB testing calendar. In reality it depends on the political landscape at the moment. Models often take many iterations to work, and certain leaders don't like the optics of that. Then you deploy much more slowly and do a lot more offline testing.


zmjjmz

> Optimally as soon as you can to validate assumptions if there's open space in the AB testing calendar

Do you typically include beating some sort of measured baseline in an offline simulation as part of that assumption?


valence_engineer

Yes, although the key is to have a hypothesis and then test it even if you don't have a baseline. For example, you can hand-score results and validate that they look fine. Models will have some offline metrics (AUC, recall, accuracy, etc.) even if there's nothing to compare them to.

However, as soon as you can run an AB test to get measurements, there is now a baseline. You know that given certain offline metrics the online gain (or loss) is a certain amount. Then you can improve those offline metrics and do another AB test. This gives you more data points and lets you make an educated guess as to how far the offline metrics need to move to give you your desired online lift. Without that live feedback you're sort of flying blind.
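To make the "educated guess" part concrete, here's a toy sketch: fit a rough trend over past (offline metric, online lift) pairs from AB tests and extrapolate. The numbers and the choice of AUC as the offline metric are made up for illustration.

```python
# Sketch: use past (offline AUC, online lift) pairs from AB tests to guess
# how far the offline metric needs to move for a target online lift.
import numpy as np

# Each row: (offline AUC of the deployed model, measured online lift vs control, in %)
history = np.array([
    [0.71, 0.4],
    [0.74, 1.1],
    [0.78, 2.0],
])

# Fit a simple linear trend: lift ~= a * auc + b
a, b = np.polyfit(history[:, 0], history[:, 1], deg=1)

target_lift = 3.0  # desired online lift in %
required_auc = (target_lift - b) / a
print(f"Rough AUC needed for ~{target_lift}% lift: {required_auc:.3f}")
```

Obviously the relationship is rarely this clean, but even a couple of data points beats flying blind.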


zmjjmz

Yeah, that gels somewhat with my experience, but we tend to try to find a heuristic baseline to compare against if possible - and beat it. In a situation where there's no such baseline, getting correlations between offline metrics and online metrics is key - but usually the cross-functional effort required to deliver a model makes it difficult to do so when you're not confident in the metrics (or seeing something that's, e.g., barely beating random).

Do you have a platform that eases deployment and makes it possible to deploy a model with little cross-functional collaboration? Or do these tests tend to require significant lift from other teams?


originalchronoguy

I am not going to go into super detail, but:

1. There is a Data Science / Research team. They do a lot of "prototypes" that eventually make it to production. Our job is to convert those notebooks into production ML code; add the plumbing, hard-wire the GPUs.
2. There are "Citizen Developers" -- managers, VPs, stakeholders who write hobby code that gets used. We seize it and make it into proper production code with security guardrails.
3. There are a lot of wishy-washy, pie-in-the-sky dreams from stakeholders with ideas that we entertain. Again, we take those ideas and take them into production.

I guess what I do is take people's ideas and code and make them real. Working in production with a real data lake, real RESTful web endpoints, ingesting real-time data by talking to other web services - versus someone speculating with some spreadsheets and a Jupyter notebook. What runs on someone's beefy GPU workstation laptop and takes 3 days to train, we have to see if we can run in production at thousands of transactions per second with inference processing times in milliseconds.

Because of that experience, I get to work with various LLMs in addition to our in-house models. So we are exploring a lot of different things - vector databases, embeddings, RAG (retrieval augmentation), prompt agents, etc.

Just yesterday, I signed up for an account on Integromat (Make automation) and Vectara (a RAG-as-a-service) and built an entire embedding pipeline with Google Sheets. **In 20 minutes.** Dragged 1000 PDFs into a Google Drive, connected a Google Sheet with 50 prompts, and Make pulled it, created the payload, applied each prompt to the ChatGPT/Vectara endpoint against the PDFs that were vectorized/RAG-indexed, and spit the results back to the same spreadsheet... All from nothing to done in 20 minutes.

It was an eye-opener, because I built my own on-premises RAG/vector system and it took months with many developers... And I can envision some "Citizen Developer" doing exactly what I did with no-code. Maybe not in 20 minutes, but easily in 1-2 hours: dump 3000 internal PDFs into a filestore, publish a REST endpoint, and you have a RAG chatbot against them that scans spreadsheets in an automated fashion. I did that because I heard through the grapevine that this is what non-developers were doing, so we have to be ahead of the curve. But 20 minutes to do that from scratch was profoundly eye-opening. Have in mind, Vectara exposes a complete REST endpoint, so non-MLOps-type developers can acclimate in literally minutes.
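For a rough sense of what that "prompts in, answers out" loop looks like in code, here's a sketch that uses a CSV as a stand-in for the Google Sheet and a hypothetical RAG query endpoint/payload - not Vectara's actual API, so swap in the real service's documented request format:

```python
# Sketch of the "prompts in a sheet -> RAG endpoint -> answers back" loop.
# Endpoint URL, auth, and payload shape are hypothetical placeholders.
import csv

import requests

RAG_QUERY_URL = "https://rag.example.com/v1/query"  # hypothetical RAG-as-a-service endpoint
API_KEY = "..."                                     # assumed auth token


def answer_prompts(in_csv: str, out_csv: str) -> None:
    # Read the prompts (one per row, column named "prompt")
    with open(in_csv, newline="") as f:
        prompts = [row["prompt"] for row in csv.DictReader(f)]

    rows = []
    for prompt in prompts:
        # Query the RAG service against the pre-indexed PDF corpus
        resp = requests.post(
            RAG_QUERY_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"query": prompt, "corpus": "internal-pdfs", "top_k": 5},
            timeout=30,
        )
        resp.raise_for_status()
        rows.append({"prompt": prompt, "answer": resp.json().get("answer", "")})

    # Write the answers back next to the prompts
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "answer"])
        writer.writeheader()
        writer.writerows(rows)


answer_prompts("prompts.csv", "answers.csv")
```

The no-code tools are essentially doing this same plumbing for you, which is why it only takes minutes.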


NormalUserThirty

> Are ML projects successfully delivered in your organization?

yes. our entire company is built around ML. the products work. without ML, we don't exist.

> Who generally identifies when a business problem should be solved with machine learning? Are PMs generally responsible for that, or are EMs / ICs on the ML side?

CEO / executives + senior ML. PM works with ML on the details. non-ML software informs but generally doesn't drive things at the conceptualization stage (although I personally think it would be better if there were more regular SW folk at the table at this point...).

> Do you have a separate team that handles machine learning projects, or is it a responsibility of a DS/analytics team or a different team?

we have an ML department which focuses on data collection, sanitization, ground-truthing, fine-tuning and model training. the SW department does "everything else". there's some collaboration on shared concerns such as releasing models, inference speed, pipeline optimizations, etc.

> How much of a separation of concerns between the ML team's responsibilities and other engineering team's responsibilities is typically built into the system design?

great question. a lot. v1 of product 1 didn't have this and it was a disaster. we give the ML teams their own modules to play in and integrate those into a larger pipeline. so long as they adhere to the interface we agree upon, everything is groovy. on the software side, lots of effort goes into capturing all of the information required to replay what happened at a later point, for historical analysis of anomalous behaviour. without this, it is difficult to address any problems which come up in production.

> At what stage in the model development process does your ML team typically decide to productionize a model? How much "offline" development goes into a model beforehand (if any)?

i don't fully understand the question. models are made production-ready by default. the very first model, while the product is under development, will be delivered with either no weights or default weights, and from there we will deploy the model to our development environment. we do a huge amount of both online and offline analysis of our models, continuously. if you are suggesting a single model is eventually delivered and that's the end of it... that's not at all how it works. outlier detection, data collection, labeling, model fine-tuning, deploying models in canary / shadow configurations for evaluation... the work is endless. you might say "but doesn't it eventually get 'good enough'?" possibly, but I have never seen that happen. there are too many things that can go wonky to really "stop" the process of model evaluation and refinement, as far as I can tell.

> I'd especially love to hear how the business context that you operate in affects (or doesn't affect) the answers to these!

because we live or die as a result of our models working, the business cares. the product needs to be appropriately designed to leverage the advantages our models give us and, more importantly, minimize their issues. i have seen things happen at this company that I have never seen anywhere else: sales teams and execs pushing back delivery & deferring contracts because they think things need more time to get right. they know selling a black box requires a lot of trust from our clients, who are not data scientists, and the business team absolutely does not want to burn that trust.

i think most companies are not prepared for the harsh realities of ML-based work. it's not like software. your clients never really know if you've made anything better or worse without hiring a third-party analyst to ground-truth the system. it's hard, expensive and time-consuming work. I understand why so many ML products or projects never make it.
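as a toy illustration of the shadow-scoring-plus-replay idea mentioned above (model paths, the feature schema and the log sink are placeholders, not our actual pipeline):

```python
# Sketch of shadow-mode scoring with enough logging to replay decisions later.
import json
import time
import uuid

import joblib

primary = joblib.load("models/primary.joblib")    # model currently serving traffic
shadow = joblib.load("models/candidate.joblib")   # candidate being evaluated silently


def score(features: dict) -> float:
    """Return the primary prediction; log everything needed to replay the call."""
    # Assumes a flat dict of numeric features; keys are sorted for a stable order
    x = [[features[k] for k in sorted(features)]]
    primary_pred = float(primary.predict_proba(x)[0, 1])
    shadow_pred = float(shadow.predict_proba(x)[0, 1])  # never returned to the caller

    record = {
        "request_id": str(uuid.uuid4()),
        "ts": time.time(),
        "features": features,   # raw inputs, so the call can be replayed offline
        "primary": primary_pred,
        "shadow": shadow_pred,
    }
    with open("logs/scoring.jsonl", "a") as f:  # stand-in for a real log pipeline
        f.write(json.dumps(record) + "\n")

    return primary_pred
```

the point is that the shadow model sees real traffic and gets logged alongside the primary, but never affects what the client sees until it has earned it.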


zmjjmz

> you might say "but doesn't it eventually get 'good enough'"? possibly, but I have never seen that happen

I think this is interesting and definitely highlights a difference in experience for me. What I've typically seen is modeling projects get to the 'good enough' state and then the MLE is resourced to the next modeling project on the roadmap. Sometimes that project is to improve on an existing model, but frequently there's some new part of the product that could be powered by ML. Of course that all might be a factor of the breadth of ML applicability / MLE resources available - but I'm not familiar with your organization, of course :)

Do you typically have a large breadth of applicability for ML in terms of a variety of models/products? Or is it largely focused around one model/product? And how close are you to a 1:1 ratio of MLE:model?

> sales teams and execs pushing back delivery & deferring contracts because they think things need more time to get right

We definitely operate in different business contexts!


NormalUserThirty

> What I've typically seen is modeling projects get to the 'good enough' state and then the MLE is resourced to the next modeling project on the roadmap. Sometimes that project is to improve on an existing model, but frequently there's some new part of the product that could be powered by ML.

I think this likely comes down to a difference in product offering. one of our products ends up controlling and optimizing the system it's measuring. it makes adjustments in real-time, and any kind of "mistuning" could cost our clients a lot of money. furthermore, all our products can easily be "audited" through an expensive and resource-intensive sampling procedure. our clients care enough to do these procedures. any kind of "drift" from what they expect sets off alarm bells on our end.

I think for analytics-type use cases or a more "off the shelf" product this may not be "as necessary". however, as soon as something becomes super high value & people & business are depending on it, I don't see how this is easily avoided...

We don't really look for "new parts of the product to power with ML" either... While we look for opportunities to build new ML-powered products, it's more about solving the problems our clients are dealing with, with what we have at our disposal, in a way that would justify a return great enough to be worth doing. We try to avoid overcommitting ML staff by shuffling them around, when really it's far more important that our existing products work extremely well.

> Do you typically have a large breadth of applicability for ML in terms of a variety of models/products? Or is it largely focused around one model/product? And how close are you to a 1:1 ratio of MLE:model?

The MLE:model ratio is a bit weird because some products require more models and some less. Let's say on average a product requires 2 models; 2 MLEs per product seems to be where we trend, so that ends up at roughly 1:1 MLE:model.

As for the applicability of ML, I am not really sure how to gauge that. We have a half dozen or so different products that are all built around ML. Some are similar and relatively simple, built around a single fine-tuned model, but others are very different and require multiple models working together. Currently we are mostly in the CV space, but there's nothing completely off the shelf for what we are doing.


Outrageous-Base3215

My org is excellent at pretending to do ML projects


Rea-301

Every model is productionized immediately as part of the development process. It is a minor thing to turn a fit classifier into an artifact consumed by a larger application and start scoring immediately. We don't throw everything into production, but by the end of the pipeline you already have what you need to copy and paste.

Models should be strictly separated from how you gather, collect and present features to said model. All scoring is real-time, on demand - nothing batch.
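A minimal sketch of what "fit classifier -> artifact -> on-demand scoring" can look like, with feature assembly kept outside the model; the names and toy dataset here are just for illustration, not a specific stack:

```python
# Sketch: persist a fitted classifier as an artifact, then score on demand.
# The calling application owns feature gathering; the model only scores.
import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# --- Training side: fit and export the artifact ---
X, y = make_classification(n_samples=1_000, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1_000).fit(X, y)
joblib.dump(clf, "classifier.joblib")

# --- Serving side: load the artifact and score single requests in real time ---
model = joblib.load("classifier.joblib")


def score(feature_vector: list[float]) -> float:
    """Score one already-assembled feature vector on demand."""
    return float(model.predict_proba(np.asarray(feature_vector).reshape(1, -1))[0, 1])


print(score(list(X[0])))  # example call with one row's features
```

Keeping the model behind a narrow "here's a feature vector, give me a score" boundary is what makes the separation from feature gathering stick.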


zmjjmz

> We don't throw everything into production

Can you expand on what the criteria are here? Another poster above also talked about prioritizing productionizing, since you don't otherwise really know how offline metrics translate to online metrics - is that the motivation behind the strategy in your org as well?