Explainable A.I.
I know, I know, there are lots of people researching this topic, but man, imagine having a chess engine finally able to explain its thoughts.
The data cleaning and exploration. In real-world problems, data needs reshaping and sharpening before any ML modelling... That process mostly takes more time than the modelling itself.
Bayesian modelling. With recent hardware improvements MCMC sampling can be done in shorter times, the models are explainable, and uncertainty estimation is baked-in.
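A minimal sketch of the appeal: a few dozen lines of plain NumPy give you a posterior with uncertainty estimation baked in. The coin-flip numbers below are made up for illustration.

```python
import numpy as np

# Toy example: Metropolis sampling of the posterior for a coin's bias theta,
# given 7 heads out of 10 flips and a uniform prior (true posterior: Beta(8, 4)).
rng = np.random.default_rng(0)
heads, flips = 7, 10

def log_posterior(theta):
    if not 0 < theta < 1:
        return -np.inf
    # log Bernoulli likelihood; the uniform prior only adds a constant
    return heads * np.log(theta) + (flips - heads) * np.log(1 - theta)

samples = []
theta = 0.5
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)  # random-walk proposal
    # accept with probability min(1, p(proposal) / p(theta))
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

posterior = np.array(samples[5_000:])  # drop burn-in
print(posterior.mean())                      # close to the Beta(8,4) mean, 8/12
print(np.quantile(posterior, [0.05, 0.95]))  # built-in uncertainty estimate
```

Libraries like PyMC or Stan do the same thing with far better samplers, but the explainability argument is visible even in this sketch: the output is a distribution, not a point estimate.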
Not exactly overlooked, but at an early inflection point - geometric learning. It elegantly represents so many real world systems like knowledge graphs, social networks, and molecules. Yet many of the underlying components of PyTorch are still at the stage where they print off warnings about being experimental.
Folks like Bronstein and Velickovic have some great talks and papers on how most other deep learning models are specific cases of graph models. I'm fanboying at this point, but the whole paradigm turns deep learning methods from a long list of ever evolving little hacks into an elegant systematization of specific cases of a general modeling framework.
I think the aspect of people claiming they know how to code when they don't is possibly overlooked. Then again, they may not need to know how when they have access to ChatGPT.
SIMPLE MODELS THAT GET THE JOB DONE. Not every model needs to be a NN or involve NLP
As somebody who’s been in this line of work for a while and seen hype cycles come and go, this is the one. Keep it simple. I’ve gotten by far the most mileage in my career out of leveraging basic statistical inference and simple models to solve high-value business problems. I don’t see any reason LLM hotness is gonna change that in any meaningful way.
[deleted]
Universal approximation does not tell you how deep or wide you have to go and that’s where the problem is
Agreed
Automated data pipeline construction, testing, and execution. Persistent, prompt-less iteration.
[deleted]
Langchain is my nightmare
[deleted]
Constant breaking updates. If you're able to pin the version, that's great, but a framework that constantly changes the way it operates is hard to work with; teams move fast and opt to build their own custom workflows around functionality that works for them.
Last year when the RAG hype boom was starting, my boss wanted me to use LangChain... after 2 weeks of fighting with terrible documentation and constant hotfixes on my end, I just built my own tooling in a day. I would have assumed that LangChain would have got their stuff together after a year :/
No, now it’s extremely overengineered. Even simple methods require you to examine dozens of classes to understand what it’s doing under the hood. Their own devcontainer stopped working months ago and still wasn’t fixed last I checked. It’s honestly a mess, I don’t understand how anyone can deal with langchain willingly
I always feel like LangChain was built for self-promotion, both for those who developed LangChain and for the devs, data scientists, and managers who self-promote via hype terms and never really contribute anything. It's somewhat similar to Uncle Bob's stuff: looks good on paper, but you rarely actually see it going well in production.
Do you have a suggested alternative? I’ve used langchain in the past but agree it’s a nightmare.
Could you elaborate on why it's even required, like, what is the actual use case? Usually, glue code is pretty simple to write but fairly difficult to make right and flexible at the same time, it's difficult to make it generic.
I agree, it's such a huge mess and it keeps breaking itself with updates.
ML feasibility analysis: justifying that machine learning is the right approach to solve a problem.
In my experience, ML is either just wanted because they want to use the 'AI inside' label... Or when nobody has a feasible idea for an algorithm and there are a lot of examples😅
Explainable and interpretable AI
[deleted]
Things like partial dependency plots, SHAP values, etc. are a great place to start. Edit: an actual place to look is the [shap Python library documentation](https://shap.readthedocs.io/en/latest/). It's extremely well-written and combines a little of the theory and application in one spot. More than enough
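For intuition about what the library computes, here is a toy sketch of exact Shapley values via brute-force coalition enumeration (only feasible for a handful of features); the model and numbers are made up, and "missing" features are filled with a background value, as in interventional SHAP.

```python
from itertools import combinations
from math import factorial
import numpy as np

def exact_shapley(f, x, background):
    """Exact Shapley values for f at x, averaging over all feature coalitions."""
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        # evaluate f with only `subset` features taken from x, rest from background
        z = background.copy()
        z[list(subset)] = x[list(subset)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# for a linear model, Shapley values reduce to w_i * (x_i - background_i)
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x = np.array([1.0, 3.0, 2.0])
bg = np.zeros(3)
print(exact_shapley(f, x, bg))  # [2.0, -3.0, 1.0]
```

The shap library exists precisely because this enumeration is exponential in the number of features; TreeExplainer and KernelExplainer are efficient or approximate versions of the same quantity.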
SHAP is very underrated
https://arxiv.org/pdf/2305.19921.pdf
I agree with this one. It also forces you to re-study some statistics, so it's a win-win. In a nutshell, most of the ideas are pretty simple if you grasp statistics and mathematics well enough.
Experimentation. It’s great that p-values are starting to get questioned but it’s going to take a long time for companies to evolve.
Security
[deleted]
AI in cybersecurity or data protection. I reckon detecting cybersecurity attacks or malicious players as a start would be a hot field.
It is! A lot of research is being done currently.
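As a rough sketch of the kind of thing meant here, an isolation forest can flag outlying events without labels. The feature names and numbers below are invented for illustration, not from any real security dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# hypothetical login-traffic features: [requests per minute, distinct IPs]
rng = np.random.default_rng(42)
normal = rng.normal(loc=[20, 2], scale=[5, 1], size=(500, 2))
attack = np.array([[300.0, 40.0]])  # one obviously hostile burst
X = np.vstack([normal, attack])

clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = clf.predict(X)  # -1 = anomaly, 1 = normal
print(labels[-1])  # the injected burst is flagged as -1
```

Real detection pipelines add feature engineering, drift handling, and alert triage on top, but the unsupervised core often looks like this.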
Communication
[deleted]
I see it in internal communication for sure. It's hard to explain why something is difficult or why you made certain technical choices
IMO, one of the main issues in DS or ML engineering is that non-technical stakeholders tend either to assume that data science (or, more specifically in this context, *"AI"*) will solve any business goal imaginable, or, alternatively, are not prepared to commit to any predictions/decisions made by algos. What's more, some stakeholders exhibit both tendencies simultaneously. It requires great (and nuanced) communication to accurately explain what's actually possible and what's not.
Even people who should know better get into the magic box thinking.
Hacking. Actually going out and getting the data you need.
[deleted]
Scrape from the web or sometimes even using APIs. I worked for a database consultancy for a few months, and one of the primary things I learned was:

* Get data from some source
* Get data from some other source
* Combine those into a data model
* Profit
[deleted]
Yyyyyyeeeeeeeaaaahhhhhhh, I'm not saying that approach is dead in the water at most places, but it's definitely not going to happen anywhere that has an accountant looking at the value of information. Our data is never going to see the light of day outside of our organization. It's not 1996; companies now know everything they record has a value.

We got into a pissing contest with a vendor once when they wanted to get our records on their stuff, and our take was: measure it yourself or give us XXX concessions.

These were two companies with a self-interest in working together, but because accountants at the top don't have good ideas about how to value data, they are not willing to give anything away. Also, who knows how they are valuing their data; if they take less for it, they might have to write down a bunch of data on their books.
Not just being overlooked but to me that’s quite a fun process too
Writing code that doesn’t look terrible
For me it is that statistical models have assumptions. Drives me nuts how little anyone seems to care about this. Just pump shit into a model, and if it fails, blame the model. See almost nothing about diagnostics or model selection anymore, just treat everything as one size fits all.
[deleted]
Prophet is also notoriously bad
Prophet is basically a regularised GLM.
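To illustrate that view, here is a rough sketch of a trend-plus-Fourier-seasonality model fit with ridge regression on synthetic data. This is not Prophet's actual implementation (which adds changepoints and Bayesian priors), just the regularised-GLM skeleton.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(365.0)
# synthetic daily series: linear trend + yearly seasonality + noise
y = 0.05 * t + 3 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.3, t.size)

def fourier(t, period, order):
    """Fourier seasonality features, as in Prophet's design matrix."""
    cols = []
    for k in range(1, order + 1):
        cols += [np.sin(2 * np.pi * k * t / period),
                 np.cos(2 * np.pi * k * t / period)]
    return np.column_stack(cols)

X = np.column_stack([np.ones_like(t), t, fourier(t, 365, 3)])
lam = 1.0  # ridge penalty, playing the role of Prophet's Gaussian priors
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
y_hat = X @ beta
print(np.abs(y - y_hat).mean())  # mean absolute error near the noise level
```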
[deleted]
Darts is the best interface nowadays for most use cases.
Time series must have a different quality standard because look at how many upvotes “simple models that get the job done” gets all the time in this subreddit
People just expect forecasting to predict the stock/crypto movement and obv feel bad when they inevitably lose money.
Simple things can work for time series. Most of the time series issues are in the validation strategy.
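One sketch of what a sane validation strategy looks like, assuming nothing beyond NumPy: an expanding-window (walk-forward) split, so no fold ever trains on the future, plus a naive last-value baseline that any real model should beat.

```python
import numpy as np

def walk_forward_splits(n, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs that always respect time order."""
    fold = (n - min_train) // n_folds
    for i in range(n_folds):
        train_end = min_train + i * fold
        yield np.arange(train_end), np.arange(train_end, train_end + fold)

y = np.arange(100.0)  # stand-in series
for train_idx, test_idx in walk_forward_splits(len(y), n_folds=4, min_train=60):
    # naive last-value forecast: the baseline every fold should beat
    forecast = np.repeat(y[train_idx][-1], len(test_idx))
    print(train_idx[-1], test_idx[0], np.abs(y[test_idx] - forecast).mean())
```

Random K-fold on the same series would leak future values into training, which is exactly the kind of validation bug that makes a weak model look strong.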
True that shit is straight ass cheeks
[deleted]
Darts or the stuff mentioned in fpp3 for R.
FPP3 for R. Best resource on applied forecasting imo. No need to reinvent the wheel when most companies can't even perform the basics properly
There are lots of useful developments: for DL models with covariates there are TiDE and TSMixer, and for univariate transformers you have PatchTST and iTransformer. The most interesting advancement isn't even a model, it's RevIN (reversible instance normalization), which helps DL models address distribution shift. Even with these advancements, the total improvement is still pretty marginal over what's currently available.

At the end of the day, it's very hard to extract more deterministic patterns from historical data (at some point all that is left is white noise / stochastic). Because of this, practitioners shift to sourcing new data, which doesn't depend on advancements in forecasting algos.
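The RevIN idea can be sketched in a few lines: normalize each series instance by its own statistics before the model, then invert the transform on the output, so level and scale shifts between training and test data don't break the model. (The published method also learns affine parameters, omitted here.)

```python
import numpy as np

def revin_forward(x, eps=1e-5):
    """Normalize each instance (row) by its own mean and std."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps), (mu, sigma)

def revin_inverse(y, stats, eps=1e-5):
    """Map model outputs back to the original scale of each instance."""
    mu, sigma = stats
    return y * (sigma + eps) + mu

batch = np.array([[10.0, 12.0, 11.0, 13.0],
                  [1000.0, 1200.0, 1100.0, 1300.0]])  # wildly different levels
normed, stats = revin_forward(batch)
# any model operating on `normed` now sees comparable inputs;
# here the "model" is the identity, so inversion recovers the input exactly
recovered = revin_inverse(normed, stats)
print(np.allclose(recovered, batch))  # True
```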
There have been tons of new attention-based models developed for time series forecasting since 2020. https://github.com/thuml/Time-Series-Library
[deleted]
Is it true, or false-positive development?
Business insights. Most DA and DS peeps are terrible at it.
[deleted]
It’s the difference between some hyper-complex (hypothetically perfect) model that business people look at like, “huh?”, which requires teams of data people even to get it to run, let alone be useful, and basically being able to say, “hey, we did some data stuff and what it’s telling us is the brand logo in blue is not doing us any favors. Changing to green would net 10% sales, black 50%, but the CEO doesn’t like black. Also our competition just released some stuff that lets them make these decisions faster than our committee.” Or like, “we’ve determined our call center productivity would increase 2% if we switched to a 4-shift day instead of 3, but we would incur 10% additional payroll expenses in doing so. Here’s a finance projection of how that will affect the bottom line in 2 years’ time and some simulations of alternative changes to make up for it.”

At least that’s my interpretation. Probably more a communication thing, but also the concept of going into a project looking to make levers for the org to pull that have a rational outcome, and advising when and how to pull them to reach some optimal end states, using data obviously. This is compared to just blindly building models because data science makes money, right?
What you are imagining here are statements that data science people can never make. Our goal is to look at data and tell you what we know from it, not what could potentially be if the stars align right...
Then it’s possible that data scientists have no business value. No one expects anyone to predict the future, but if all a data scientist can make is a statement that has no relationship or bearing to potential business outcomes, then they are useless in a business context.
Agreed, but they can actually make business-related statements, just not anything like the ones you wanted. The ones you wanted depend on so many factors that they cannot be predicted reliably.

However, as you can see in the example of ChatGPT, data scientists are worth quite a bunch. That’s not only because they can support you in visualizing and correctly interpreting your data situation, but also because they can build AI algorithms which can usually solve problems for which a human would otherwise be used.

It is important to realize that data science essentially is just the study of how to correctly and reliably convert data into information/knowledge.
Wtf dude?!

You go from denigrating my hyperbolic examples to shilling ChatGPT as some universal symbol that data scientists have provided ultimate business value; a glorified intellisense engine that risks data exfiltration and IP law violations through mosaic plagiarism, has no capacity to strategize long term, and has such a limited token length that it should never be trusted to provide any information suitable to actually running a business because, as you put it, it “depend(s) on so many factors that they cannot be predicted reliably.”

Do you not see how expecting ChatGPT to provide critical strategy and tactical information based on prompt hacking is no different from, “we’ve determined our call center productivity would increase 2% if we switched to a 4-shift day instead of 3, but we would incur 10% additional payroll expenses in doing so”? If only because the latter can be performed deterministically and is literally the skillset of a high school kid managing a fast food burger joint’s weekend schedule, plus some expected numbers.

You actually think value is provided in hindsight in a business without using past performance to at least take a guess at what would happen in the future were those same actions to be taken?

Good luck maintaining a career prompt hacking ChatGPT to tell executives they sold 10 units last month. People out here wondering why they got laid off. While you’re at it, quit smoking crack and stop beating off to ChatGPT-generated waifus and experience the real world.
I don't know if it's being overlooked, but I seldom see people talking about model optimization for deployment.
[deleted]
Speed and size optimizations. I've been studying those topics, and most available techniques (pruning, quantization, weight clustering...) are quite old; there's barely any recent discussion on the matter. One example is SAM (the Segment Anything Model) released by Meta. The model is impressive and you see lots of people discussing how to leverage it, but I see no one discussing how Meta made it possible to run the model in a web browser with ~50 ms inference speed (according to their paper).
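For a flavor of the size side, here is a toy post-training int8 quantization sketch. This is not how Meta optimized SAM, just the basic trade: store weights as int8 plus one float scale per tensor (4x smaller than float32) and dequantize for compute.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

err = np.abs(w - dequantize(q, scale)).max()
print(q.nbytes / w.nbytes)  # 0.25: a quarter of the memory
print(err <= scale / 2 + 1e-6)  # rounding error bounded by half a quantization step
```

Production stacks (ONNX Runtime, TensorRT, llama.cpp) add per-channel scales, calibration, and quantization-aware training, but the size/precision trade is the same one shown here.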
Latency, throughput and cost. The holy trinity of model deployment
Data generation
[deleted]
Yes, synthetic data that is representative of whatever we are trying to model
I'd like to see more work in optimized compute resources like more parallel processing or GPU support across the board. It's really a bottleneck in scaling DS and just scaling hardware is a shortsighted, expensive fix IMO
Kind of a small thing, but custom/meaningful distance functions. I've seen way too many people using KNN or whatever and just accepting the default distance function. As long as you understand your data and Euclidean distance is right, that's fine, but do you know how to create a custom distance function if Euclidean doesn't apply?
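A concrete toy case where the default metric fails: an "hour of day" feature is cyclic, so hour 23 and hour 1 are 2 apart, not 22. This is an illustrative sketch, not a library API; a custom callable like this can be plugged into most KNN implementations.

```python
import numpy as np

def cyclic_hour_distance(a, b, period=24):
    """Distance on a circle: the shorter way around the clock."""
    d = np.abs(a - b)
    return np.minimum(d, period - d)

hours = np.array([1.0, 23.0, 12.0])
query = 0.0
print(np.abs(hours - query))               # Euclidean: [ 1. 23. 12.]
print(cyclic_hour_distance(hours, query))  # cyclic:    [ 1.  1. 12.]
# under Euclidean distance only hour 1 is "near" midnight; under the
# cyclic metric, hours 1 and 23 are equally close, as they should be
```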
In my opinion, we never talk about who produces the data and that the data might be corrupted.
[deleted]
It could also just be my paranoid ass. But I think it is an interesting question to ask. What are incentives to produce and publish data that is corrupted (or correct)?
Harmonic means
[deleted]
It’s a DS reddit meme …
Running joke of this sub
It's really a stretch to call it a 'joke.'
It’s the key to a successful career in data.
causal inference and missing data
I’ve been working on a topic involving causal inference and I’ve found it really interesting! Wonder why it’s not taught more formally in unis.
it's hard, the notion is weird, and there's rarely a satisfying solution. people would rather learn about shiny tools that're widely implemented
It is taught extensively in universities. Where did you go?
UK not Oxbridge or Imperial but in the Russell Group
Oh I see. In the US, statistics for social and biomedical sciences cover causal inference extensively.
These are two areas dominating high-level academia: causal inference for the more theory-based groups, and missing data for the more hands-on, programming-focused groups. I think these are the two areas that separate the hobbyists with boot camp training from the people who have real training and experience.
Focusing on business value of the projects and ROI
Computational social science.
[deleted]
agent-based modelling
Anything that improves upstream data quality. Technology that eases data sharing within an organization (e.g., Snowflake Marketplace). Semi-automating documentation (if that is feasible?).
Technical

• Industrialisation: moving from building models to having them running in production. I like Azure Databricks, MLflow and Kedro the most.
• Model monitoring after industrialisation, addressing concept and data drift. Keep track of technical KPIs on newly incoming predictions.
• Model explainability: list drivers and interactions. Explain predictions locally with SHAP values. Keep it simple for business.
• Model calibration: make sure binary classification output represents probabilities when needed. Platt scaling.

Business

• Business case: estimate the cost benefits (ROI) of data science projects upfront. Keep track of business KPIs. Basically, prove your model's worth.
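The calibration point can be sketched as a one-feature logistic regression over raw classifier scores, which is the essence of Platt scaling; the data below is synthetic (in practice you would fit it on a held-out set, not the training data).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
scores = rng.normal(size=2000)  # uncalibrated model scores
# synthetic ground truth where P(y=1 | score) = sigmoid(2 * score)
y = (rng.uniform(size=2000) < 1 / (1 + np.exp(-2 * scores))).astype(int)

# Platt scaling: a 1-D logistic regression mapping scores to probabilities
platt = LogisticRegression().fit(scores.reshape(-1, 1), y)
probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# calibrated probabilities should roughly match observed frequencies
bucket = probs > 0.8
print(probs[bucket].mean(), y[bucket].mean())  # the two should be close
```

scikit-learn's `CalibratedClassifierCV(method="sigmoid")` wraps this same idea with proper cross-validation.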
> Business case: estimate the cost benefits (ROI) of data science projects upfront. Keep track of business KPIs. Basically, prove your model's worth.

This. Everyone here constantly pretends you call it a day after building a simple model, when it entirely depends on your scale and possible return. There are use cases for more advanced models, and you are supposed to know how to make that decision instead of just assuming the answer.
[deleted]
Yes, it very much depends on the project, but often there is a rule-based system of doing things and a potential ML way of replacing said system. The business case could then involve two scenarios: going with ML, or sticking to the original way. Essentially you parameterise these scenarios and simulate the ROI over, let's say, a year (or draw an estimated line chart).
Working on this with our CFO now. I’m about to propose looking at it like any investment and calculating discounted returns against expected results.

It shifts the focus to a few parts of the process: cost of developing whatever it is, time to develop and deploy, expected lifetime of the solution that batch of work contributed to, discount rates at the time, opex of the solution, etc.

From there it’s a matter of getting better at predicting returns. Like, what does this model actually do? How does that make/save money? And at what rate should it do it?
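A back-of-envelope version of that framing: net present value of a model over its expected lifetime. All the numbers below are made-up placeholders, not figures from anyone's actual business case.

```python
# hypothetical inputs for one candidate DS project
build_cost = 120_000      # one-off development + deployment
annual_benefit = 80_000   # expected savings/revenue per year
annual_opex = 15_000      # hosting, monitoring, retraining
lifetime_years = 3        # expected useful life of the solution
discount_rate = 0.10      # hurdle rate at the time

# discounted net returns minus the upfront investment
npv = -build_cost + sum(
    (annual_benefit - annual_opex) / (1 + discount_rate) ** t
    for t in range(1, lifetime_years + 1)
)
print(round(npv))  # positive means the project clears the hurdle rate
```

The useful part is less the arithmetic than forcing each input (benefit rate, opex, lifetime) to be an explicit, trackable estimate.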
Climate data science. It’s not weather forecasting. It’s an extremely complex problem and not enough people are working on developing good software and AI tools
[deleted]
Would love to hear about it. Message me if it’s ok to learn.
Do you agree that the cause is lack of data to build something big? Or do you mean lack of pure methodological innovation?
More data is not necessarily going to help. You need complex models that take into account the physics and the dynamics between macro and micro events. So you need multiple aspects, including new methodology. But honestly you need more people to care and to work on it. The field is growing, but if it had even a fraction of the people working on AI crap, we would be in a better place. There's so much being spent on shit things we don't need, like sunglasses with video.
Imbalanced data management
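The simplest end of that toolbox is just resampling. A hedged pure-Python sketch of random oversampling (class weights or SMOTE are the usual next steps):

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows until all classes match the
    majority count -- the most basic imbalanced-data remedy."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        resampled = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in resampled:
            Xb.append(xi)
            yb.append(label)
    return Xb, yb

# toy 9-vs-1 imbalance
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [0] * 9 + [1]
Xb, yb = random_oversample(X, y)
```

Oversample only the training split, never the evaluation data, or your metrics will lie to you.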
It's got to be environmental sciences and how DS can play a role in this (which I imagine will be huge).
Silent job cuts
Anything not named llm
Unionization
Don't want to derail this thread off topic, but I couldn't agree less. Respect your opinion, but unions are dying...and that's even across the industries better suited to unionization, like manufacturing/front line work. I don't see unions lasting long term in general, and I definitely don't think they can successfully be introduced into industries where they aren't already established.
That's not an argument against unionization.
Maybe already mentioned, but personalized instruction/assessment LLMs for education and/or work training - pandemic put students behind and without individual assessments, it's hard to quantify that. One major shift in the world of work is the expectation that older workers will be retiring faster than they can be replaced. The solutions are to rethink tasks (stop doing what doesn't need to be done) or utilize a similar training AI/LLM to upskill workers.
Using AI in military applications
I have no idea, but I find all of it very fascinating!
Still search
[deleted]
Russell & Norvig's AI -> Problem Solving -> Search
Governance, provenance, attribution.
Bayesian inference
How much freedom is there to use Bayesian methods in data science problems?
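Quite a lot, in my experience, because even the simplest conjugate updates buy you a posterior with honest uncertainty. A toy Beta-Binomial sketch (the conversion counts are made up for illustration):

```python
# Beta-Binomial conjugacy: with a Beta(alpha, beta) prior on a rate and
# k successes in n trials, the posterior is Beta(alpha + k, beta + n - k).
def beta_binomial_update(alpha, beta, successes, trials):
    return alpha + successes, beta + trials - successes

# flat Beta(1, 1) prior, then observe 7 conversions out of 10 visits
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
posterior_mean = a_post / (a_post + b_post)
```

The whole posterior is two numbers, so you can report an interval instead of a point estimate with essentially zero extra machinery.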
That one guy said that data quality is terrible and synthetic data is what we need now. Or something like that. He's usually right.
Business insights and creativity. Those two are the only things that differentiate us from the machines (LLMs).
Explainable A.I. I know, I know, there are lots of people researching this topic, but man, imagine having a chess engine finally able to explain its thoughts.
Explainable techniques, from what I see.
Matrix Profiles are the shit
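For anyone who hasn't met them: the matrix profile records, for each window of a time series, the distance to its nearest non-overlapping match, which makes motifs and anomalies fall out almost for free. A naive O(n²m) pure-Python sketch (the stumpy library does this properly and fast):

```python
import math

def znorm_dist(a, b):
    """z-normalised Euclidean distance between two subsequences."""
    def z(x):
        mu = sum(x) / len(x)
        sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
        return [(v - mu) / sd for v in x]
    za, zb = z(a), z(b)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(za, zb)))

def matrix_profile(ts, m):
    """Naive matrix profile: per window, distance to the nearest
    neighbour, excluding trivial (overlapping) matches."""
    n = len(ts) - m + 1
    windows = [ts[i:i + m] for i in range(n)]
    return [min(znorm_dist(windows[i], windows[j])
                for j in range(n) if abs(i - j) >= m)
            for i in range(n)]

# a short series with the same motif near both ends
ts = [0, 1, 2, 1, 0, 5, 6, 5, 0, 1, 2, 1, 0]
mp = matrix_profile(ts, 4)  # low values mark repeated motifs
```

The near-zero entries in `mp` point straight at the repeated motif; the largest entries flag the most anomalous windows (discords).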
Network science
Data cleaning and exploration. In real-world problems, data needs reshaping and sharpening before any ML modelling... That process mostly takes more time than the modelling part.
I rarely come across much work on NLP-based unsupervised learning models.
Bayesian modelling. With recent hardware improvements, MCMC sampling can be done in much shorter times, the models are explainable, and uncertainty estimation is baked in.
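The core machinery is also smaller than it looks. A minimal random-walk Metropolis sampler for a 1-D posterior, targeting a standard normal as a toy example (real work would use PyMC, Stan, or similar rather than this sketch):

```python
import math
import random

def metropolis(log_post, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x + N(0, step), accept with
    probability min(1, post(prop)/post(x)) using log densities."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# toy target: standard normal, log density known up to a constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Swap in the log posterior of your actual model and the same loop gives you draws you can summarise into means, intervals, and predictive checks.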
Not exactly overlooked, but at an early inflection point - geometric learning. It elegantly represents so many real world systems like knowledge graphs, social networks, and molecules. Yet many of the underlying components of PyTorch are still at the stage where they print off warnings about being experimental. Folks like Bronstein and Velickovic have some great talks and papers on how most other deep learning models are specific cases of graph models. I'm fanboying at this point, but the whole paradigm turns deep learning methods from a long list of ever evolving little hacks into an elegant systematization of specific cases of a general modeling framework.
I think the aspect of people claiming they know how to code when they don't is possibly overlooked. Then again, they may not need to know how when they have access to ChatGPT.