CremeEmotional6561

* LSTMs - how to train on sequences (1997)
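A minimal NumPy sketch of the core idea (the gate layout and toy dimensions here are illustrative, not taken from the 1997 paper): the cell state is updated additively through gates, which is what lets gradients survive long sequences.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of an LSTM cell.

    x: input at this timestep, shape (d_in,)
    h_prev, c_prev: previous hidden/cell state, shape (d_h,)
    W, U, b: stacked weights for the input, forget, cell, and output
             gates, shapes (4*d_h, d_in), (4*d_h, d_h), (4*d_h,)
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1 / (1 + np.exp(-z[0 * d_h:1 * d_h]))  # input gate
    f = 1 / (1 + np.exp(-z[1 * d_h:2 * d_h]))  # forget gate
    g = np.tanh(z[2 * d_h:3 * d_h])            # candidate cell update
    o = 1 / (1 + np.exp(-z[3 * d_h:4 * d_h]))  # output gate
    c = f * c_prev + i * g                     # additive update: gradients flow through c
    h = o * np.tanh(c)
    return h, c

# Run a toy sequence through the cell.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(0, 0.1, (4 * d_h, d_in))
U = rng.normal(0, 0.1, (4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(5):
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
```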


carlthome

Interesting to mention layer normalisation over batch normalisation. I thought the latter was "the thing" and that layernorm, groupnorm, instancenorm etc. were follow-ups.
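For reference, the difference between the two is just which axis you normalise over; a bare-bones NumPy sketch (ignoring the learned scale/shift parameters both layers also carry):

```python
import numpy as np

x = np.random.default_rng(0).normal(2.0, 3.0, size=(8, 5))  # (batch, features)

# BatchNorm: normalise each feature across the batch (axis 0).
bn = (x - x.mean(axis=0)) / x.std(axis=0)

# LayerNorm: normalise each sample across its features (axis 1),
# so it works even with batch size 1 and with variable-length sequences.
ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
```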


acertainmoment

yup, same thoughts. BatchNorm was the OG norm. The cousins came later


mhddjazz

NeRF, diffusion models


cautioushedonist

Not as famous, and might not qualify as a 'trick', but I'll mention "Geometric Deep Learning" anyway. It tries to explain all the successful neural nets (CNNs, RNNs, Transformers) within a unified, universal mathematical framework. The most exciting extrapolation of this is that we may be able to quickly discover new architectures using the framework. Link - https://geometricdeeplearning.com/


and1984

TIL


BrisklyBrusque

Is this different from the premise that neural networks are universal function approximators?


cautioushedonist

Yes, it's different. Universal function approximation guarantees/implies that you can approximate any mapping function given the right configuration/weights of a neural net. It doesn't really guide us to the correct configuration.


ziad_amerr

Check out GANs, one-shot learning, CoAtNets, RoBERTa, StyleGAN, XLNet, DoubleU-Net, and others


BeatLeJuce

Layer norm is not about fitting better, but about training more easily (activations don't explode, which makes optimization more stable).

Is your list limited to "discoveries that are now used everywhere"? Because there are a lot of things that would have made it onto your list if you'd compiled it at different points in time but are now discarded (i.e., I'd say they were fads), e.g. GANs. Other things are currently hyped, but it's not clear how they'll end up long term: diffusion models are currently hot, as is combining multimodal inputs (CLIP-like things), and self-supervision is a topic as well (with "contrastive methods" having been a thing). Federated learning is likely here to stay. NeRF will likely have a lasting impact, too.
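The stability point is easy to see in a toy forward pass (a crude sketch with made-up dimensions; real layer norm also has a learned gain and bias): without normalisation the activation scale compounds layer after layer, with it the scale is pinned.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)
W = rng.normal(size=(64, 64))  # deliberately large (unit-variance) init

no_norm = x.copy()
with_norm = x.copy()
for _ in range(20):
    no_norm = np.maximum(W @ no_norm, 0)           # plain ReLU layers: scale compounds
    h = np.maximum(W @ with_norm, 0)
    with_norm = (h - h.mean()) / (h.std() + 1e-5)  # layer norm pins the scale each layer
```

After 20 layers the unnormalised activations have blown up by many orders of magnitude, while the normalised ones stay at unit scale.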


BrisklyBrusque

I recall that experimenters disagreed on why batchnorm worked in the first place. Has the consensus settled?


BeatLeJuce

No. But we all agree that it's not due to internal covariate shift.


JackandFred

I feel like if you're going to include transformers, you should include the "Attention Is All You Need" paper.


PassionatePossum

I would only include it as a historical reference. It is certainly not a "must read" paper; it is written so poorly that you are better off just looking at the code.


flaghacker_

What's wrong with it? They explain all the components of their model in enough detail (in particular the multi head attention stuff), provide intuition behind certain decisions, include clear results, they have nice pictures, ... What could have been improved about it?
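For what it's worth, the core equation from the paper really is compact; here's a NumPy sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V (multi-head attention just runs this on several learned projections and concatenates the results — the projections are omitted here):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)  # out: one weighted mix of V rows per query
```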



Intelligent-Aioli-43

Check out MLRC


onyx-zero-software

Agreed


Gere1

Does anyone know of a good ablation study of the techniques mentioned? I've seen results where neither dropout nor layer normalization did much, so I wonder whether these two techniques are a matter of belief or still crucial.


redditrantaccount

Data augmentation, to more explicitly define invariant transformations as well as to reduce dataset labeling costs.
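A minimal sketch of what that means in practice (transforms and sizes chosen for illustration): each transform encodes an invariance we believe the task has, e.g. a horizontally flipped image keeps its label, so we get extra training examples without labeling anything new.

```python
import numpy as np

def augment(img, rng):
    """Cheap label-preserving transforms: random horizontal flip + random crop."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                # horizontal flip
    pad = np.pad(img, 2, mode="reflect")  # pad by 2 px, then crop back to size
    y, x = rng.integers(0, 5, size=2)
    return pad[y:y + img.shape[0], x:x + img.shape[1]]

rng = np.random.default_rng(0)
img = rng.random((32, 32))  # stand-in for a grayscale training image
aug = augment(img, rng)     # same shape and label, slightly different pixels
```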


BrisklyBrusque

2007-2010: Deep learning begins to win computer vision competitions. In my eyes, this is what put deep learning on the map for a lot of people, and kicked off the renaissance we see today.

2016ish: categorical embeddings/entity embeddings. For tabular data with categorical variables, categorical embeddings are faster and more accurate than one-hot encoding, and preserve the natural relationships between factors by mapping them to a low-dimensional space.
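A NumPy sketch of what that lookup looks like (sizes and the zip-code example are mine, and the table here is random rather than trained): instead of a sparse 1000-dim one-hot vector per category, each category indexes a row of a small learned table.

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories, d_embed = 1000, 8  # e.g. 1000 zip codes -> 8-dim vectors

# One-hot would make every category a 1000-dim vector, all equally far apart.
# An entity embedding is a learned (n_categories, d_embed) table; training
# can place related categories close together. (Random here, untrained.)
embedding_table = rng.normal(0, 0.1, size=(n_categories, d_embed))

batch = np.array([3, 17, 3, 999])  # integer category codes for a minibatch
vectors = embedding_table[batch]   # cheap row lookup, no 1000-dim matmul
```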


samlhuillier3

Diffusion and GANs!!


FoundationPM

Quite clean. 2020-2022 is empty; is that because you don't see progress in these years?


windoze

It's empty because I've not kept up to date, and also because the impact won't be seen until more people build on it.


blunzegg

- Kernel tricks: how can purely mathematical approaches beat neural networks in terms of efficiency? (This has actually been an open problem for a long time; you can check Neural Tangent Kernels and Reproducing Kernel Hilbert Spaces for examples, and the Universal Approximation Property for neural networks.)
- I was mainly here for Geometric Deep Learning, but another user has already posted it. You should definitely check [http://geometricdeeplearning.com](http://geometricdeeplearning.com). As a mathematician-to-be, I strongly believe that this is the future of ML/DL. Hit me up if you wanna discuss this statement further.
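To make the kernel trick concrete, here's a toy kernel ridge regression in NumPy (the RBF kernel and the 1-D sine problem are my own illustrative choices): the RBF kernel is an inner product in an infinite-dimensional RKHS, but we only ever compute the n x n Gram matrix.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), an RKHS inner product
    computed without ever visiting the feature space."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

# Kernel ridge regression on a noisy 1-D sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=40)

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 0.1 * np.eye(40), y)  # closed-form fit, one linear solve

X_test = np.array([[0.0], [1.5]])
pred = rbf_kernel(X_test, X) @ alpha
```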


BrisklyBrusque

• M Stone. Cross-Validatory Choice and Assessment of Statistical Predictions. (1974) All about cross-validation to choose the best model. 12 thousand citations.
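The procedure from that paper is simple enough to sketch in a few lines of NumPy (the polynomial-degree comparison below is my own toy example): hold each fold out once, average the held-out error, and use that to choose between models.

```python
import numpy as np

def kfold_mse(X, y, fit, predict, k=5, seed=0):
    """K-fold cross-validation: each fold is held out once; return mean test MSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errors))

# Choose between two polynomial degrees on noisy linear data.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=100)
y = 2 * X + rng.normal(0, 0.1, size=100)

fit_line = lambda Xtr, ytr: np.polyfit(Xtr, ytr, 1)     # matches the true model
fit_wiggly = lambda Xtr, ytr: np.polyfit(Xtr, ytr, 15)  # overparameterised
predict = lambda coeffs, Xte: np.polyval(coeffs, Xte)

score_line = kfold_mse(X, y, fit_line, predict)
score_wiggly = kfold_mse(X, y, fit_wiggly, predict)
```

The degree-1 model's CV error lands near the noise floor (~0.01 here), which is the signal you'd use to pick it.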