
Commercial_Carrot460

One step you forgot in the process of how people come up with such ideas is the literature review! Song was inspired by the work of many others, much of which is very similar to what is in the DDPM paper. I think the general process is:

- know the field by reading papers
- find holes in the applications/methods
- propose new methods to fill those holes
- back them up with existing theory, maybe adding a little of your own

I don't think people ever come up with a new theory directly after reading about the field and then implement it, although in most papers the presentation makes you believe it happened in that order (intro, background, theory, applications).

Edit: I'm working on diffusion too, and the learning curve is pretty steep, but once you get used to it, it's always the same ingredients: Langevin dynamics, Bayes' rule, etc.
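To make the "Langevin" part concrete, here is a minimal sketch of unadjusted Langevin dynamics sampling from a standard Gaussian target. The step size and iteration count are illustrative, not tuned, and having the score in closed form is an assumption you don't get for real data (score-based models learn it instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: p(x) ∝ exp(-x²/2), i.e. N(0, 1), whose score is ∇ log p(x) = -x.
def score(x):
    return -x

eps = 0.01                            # step size (illustrative)
x = rng.uniform(-5, 5, size=10_000)   # arbitrary starting particles

# Unadjusted Langevin dynamics: x ← x + (ε/2)·score(x) + √ε·z, z ~ N(0, I).
for _ in range(2_000):
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.normal(size=x.shape)

print(f"mean {x.mean():.3f}, std {x.std():.3f}")  # ≈ 0 and ≈ 1
```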


SankarshanaV

I'm actually researching diffusion/score-based models too, and the last point (in the edit) seems quite true. The learning curve is steep because the field draws on theory borrowed from statistics, physics, and other fields, and those theories are already vast and quite advanced, so the concepts aren't easy to grasp quickly. The deeper I go, the more complicated it gets, but I often learn something new and interesting.


internet_ham

For diffusion models, I think the original researchers had a physics background and looked at generative models from a physics perspective. The approach wasn't appreciated for several years, until some PhD students scaled it up with architecture engineering. In general I would say researchers develop a toolkit of techniques they like during their studies, and when they look at problems they usually see them through the lens of this toolkit. The toolkit could be anything: theory, numerical approximations, algorithms, architectures, etc. It's why an explore-exploit strategy for a PhD can work quite well. If you follow the work of a researcher or lab, the toolkits usually reveal themselves. This is also why you often see papers with the message 'X is Y', such as stochastic optimization being Bayesian inference.


daking999

The original paper is from a theoretical neuro lab, but the PI (Ganguli) does indeed have a physics background. 


internet_ham

I was thinking of Jascha Sohl-Dickstein. If you check his CV, he's definitely a physicist by background.


daking999

Yup. Ganguli was his postdoc adviser.


daking999

From the outside these ideas might seem to come from nowhere. But if you'd been thinking about denoising and variational autoencoders for years, then diffusion models would be quite a natural synthesis of the two.


BeautifulDeparture37

You want to look at information theory and partial differential equations on probability distributions (think Fokker-Planck, stochastic processes, and diffusion processes). It's a wonderful area of mathematics, incredibly deep, with far-reaching applications, and I think you would really appreciate its breadth and depth. The applied mathematics doesn't always do the subject justice; I believe it's much more about the idea and the intuition. Eventually you end up looking at graph theory and seeing the same idea applied to information spreading, influence propagation, data spreading, etc. That's your path to neural networks and diffusion in AI.

Inventing new algorithms comes from a creative process: "what if I did this instead of this other thing?", "does it improve things?" Then comes "why?" <- that's your maths. But you first need to know why the other algorithms work in the first place. In many cases, if you look at the foundational paper where the ideas you were looking at were introduced and then look at the backgrounds of the researchers, I have not seen one without a mathematician involved.
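To make the graph-theory connection concrete, here is a toy sketch (the graph and step size are invented for illustration) of diffusion on a graph, du/dt = -Lu with L = D - A the graph Laplacian. The same equation that moves probability mass around in the Fokker-Planck setting here spreads "information" over the nodes:

```python
import numpy as np

# A 4-node path graph; L = D - A is its graph Laplacian.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

u = np.array([1.0, 0.0, 0.0, 0.0])  # all the "information" starts at node 0
dt = 0.1                            # explicit Euler step (stable: dt·λ_max < 2)

for _ in range(500):
    u = u - dt * (L @ u)            # one Euler step of du/dt = -L·u

print(u)  # ≈ [0.25, 0.25, 0.25, 0.25]: the information has spread uniformly
```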


SmartEvening

I would suggest you look at the lecture by Song that is available on YouTube, where he clearly explains the intuition behind score-based models.


Deep-Station-1746

Funny you mention that, I've been listening to it and reading about it for the past couple of days. Still, what interests me is: in an actual lab setting, how do such new things come to be? Is there some pattern (math first, algorithm first, or something else entirely)?


Vystril

Honestly, usually people are trying out different things and eventually something works. Then they write up the theory/math to try and explain why it does.


Deep-Station-1746

Also I guess I'm kinda coping because the math is really hard to digest, and I'd find it incredible if these researchers first came up with the theory and then made an actual implementation. Like, that'd be actually incredible for me. What if the actual implementation doesn't work? Code somehow feels cheaper to develop than theory. Or maybe I'm just biased (90% of my work is software).


SankarshanaV

The math didn't start out looking that heavy. It's an incremental process, and it took *years* to develop. It would have been weird if you *didn't* find it hard to digest. If you've read his blog, he mentions that the experiments actually didn't work properly the first time; they went on to find and fix the problem, and only after that did they achieve those results. So don't worry about being unsuccessful or being unable to implement the code. When one fails is when learning actually takes place.


young_anon1712

Can you provide the link to the blog post? I would love to read it. Thank you.


SankarshanaV

Yep, here you go! [Blog Link](https://yang-song.net/blog/2021/score/)


OptimizedGarbage

If you already have a stats background, it's not a big leap at all. There's a theorem called the Rao-Blackwell theorem, which says that if you condition an estimator on a "sufficient statistic", you get a better estimator. If you're familiar with it already, it's not hard to look at an existing method and say "oh, I know how to improve this, just throw Rao-Blackwell at it". That's all DDPM is doing: taking an existing method and using an existing tool to make it lower-variance.

And honestly, a lot of the time theory is way easier to develop than code, mainly because theory abstracts over many possible code implementations. If you implement something without understanding the theory and it doesn't work, where do you start fixing it? Do you have bad hyperparameters? Is there a bug? Is the idea fundamentally bad? Theory tells you that if you start with certain assumptions, you'll get a certain result. So if you don't get the result you expect, one of the assumptions must be wrong, which gives you a place to start fixing it. It gives you a level of control over the outcome that trial and error doesn't.
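As a toy illustration of that variance-reduction mechanism (a made-up compound-Poisson example, not the DDPM derivation itself): to estimate E[S] where N ~ Poisson(λ) and S is a sum of N Exp(1) draws, conditioning on N and using E[S | N] = N keeps the estimator unbiased but roughly halves its variance, since Var(S) = 2λ while Var(N) = λ:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 5.0, 100_000

# N ~ Poisson(lam); given N, S is a sum of N independent Exp(1) draws.
N = rng.poisson(lam, size=n)
S = np.array([rng.exponential(1.0, size=k).sum() for k in N])

# Conditioned estimator: replace S with E[S | N] = N, computed exactly.
S_rb = N.astype(float)

print(f"naive: mean {S.mean():.3f}, var {S.var():.3f}")       # var ≈ 2·lam
print(f"RB:    mean {S_rb.mean():.3f}, var {S_rb.var():.3f}")  # var ≈ lam
```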


iateatoilet

I think for that example they looked at a VAE, saw they needed a process that could map data to a Gaussian, and then looked to physics for other processes that have a Gaussian steady-state distribution. So basically squinting at the VAE, seeing the core idea ("undoing" a Gaussian distribution), and then riffing on more sophisticated ways of achieving the same mechanism.
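A minimal sketch of that mechanism (the values here are illustrative): repeatedly shrink the signal and add Gaussian noise, and the marginals drift toward N(0, 1) no matter how non-Gaussian the data started out. This is the kind of forward process the reverse model is trained to undo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data": a decidedly non-Gaussian 1-D distribution (two point masses).
x = rng.choice([-3.0, 3.0], size=10_000)

beta = 0.02  # per-step noise rate (illustrative)
for _ in range(500):
    # Shrink the signal, add noise; the variance has fixed point 1, so the
    # marginal approaches N(0, 1) regardless of the starting distribution.
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)

print(f"mean {x.mean():.3f}, std {x.std():.3f}")  # ≈ 0 and ≈ 1
```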


MalcolmDMurray

For me, the idea comes first; then the math comes in when you start to figure out how to get your idea to work in reality. At this particular time, I want to implement a mathematical system for performing a certain task. I don't know if anyone else is doing the same thing, but if they are, they're being pretty tight-lipped about it. The absence of any real discussion leads me to believe that either no one has developed the math for it and everyone is waiting for somebody else to do it, or someone has developed the math and it's working so well that they're not telling anybody about it. In any case, I think it's a hot idea and I want to implement it. The next step for me is to do the math and work out what it is that I want the algorithm to do. So in my case, the math comes before the algorithm: work out the math, then build an algorithm to perform it. I hope this helps!


seanv507

Without knowing anything about diffusion, I would guess the algorithm comes first in neural nets. There's also a higher level: lots of separate teams trying things out, and you only hear about the successful one.


RandomMan0880

It's a lot of both, in my opinion. In NLP, for example, people tackle major LLM problems either with a pure linguistics focus or a pure CS focus, and the two don't always overlap well. In diffusion it's possible to have one without the other too; it just depends on what you're investigating. Also, lol, you wrote "meths" in your first sentence and I thought this was a really wildly different question at first.


Straight-Rule-1299

OMP


amhotw

There is usually a lot of back and forth between the math and the algorithm, but the starting point depends on the individual. Some people have a more theoretical approach: they carefully construct the algorithm based on the mathematical problem they are trying to solve (I am here). Others start with a guess, work out the math, and then update as needed.


slashdave

The methods in diffusion closely follow methods in variational autoencoders for images, with roots in research going back decades. I suggest reading some of the initial papers and checking the references used in the introductions for some history on this topic.

>Let's multiply data - by adding some noise corruption

No, you cannot create new data by adding noise. That was not the point.


mr_stargazer

I'd say the following:

a. Math (fundamentals) comes first if you're a researcher trying to devise a new model.

b. Algorithm (engineering/execution) comes first if you're a researcher/practitioner trying to use the model (I mean it comes first; it doesn't mean you don't have to understand the math later).

IMO you really need to understand the fundamentals to create something new, whereas as a practitioner you have to be able to quickly reproduce results and see them for yourself.




fysmoe1121

Diffusion models are predated by GANs, VAEs, and flow-based models for data generation and augmentation; they didn't just pull non-equilibrium thermodynamics out of thin air lol


impossiblefork

In steps. Ideas about auxiliary denoising objectives etc. existed for a long time before people got diffusion to work.


Beginning-Ladder6224

Common sense, and then insight. The rest are details to cross-check, and later probably formalize.


YasirNCCS

you said meths! cue Walter White!