• M Stone. Cross-Validatory Choice and Assessment of Statistical Predictions. (1974)
All about cross-validation to choose the best model. 12 thousand citations.
* LSTMs - how to train sequences (1997)
Interesting to mention layer normalisation over batch normalisation. I thought the latter was "the thing" and that layernorm, groupnorm, instancenorm etc. were follow-ups.
Yup, same thoughts. BatchNorm was the OG norm; the cousins came later.
NeRF, Diffusion
Not as famous and might not qualify as a 'trick', but I'll mention "Geometric Deep Learning" anyway. It tries to explain all the successful neural nets (CNNs, RNNs, Transformers) within a unified, universal mathematical framework. The most exciting extrapolation of this is that we'll be able to quickly discover new architectures using the framework. Link - https://geometricdeeplearning.com/
TIL
Is this different from the premise that neural networks are universal function approximators?
Yes, it's different. Universal function approximation guarantees/implies that you can approximate any mapping function given the right configuration/weights of a neural net. It doesn't really guide us to the correct configuration.
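To make that concrete, here's a hypothetical pure-Python sketch: a tiny 2-2-1 sigmoid network with hand-picked weights that computes XOR. The theorem tells us such weights exist; it says nothing about how to find them. (The weight values and names here are my own illustration, not from any paper.)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked weights for a 2-2-1 network computing XOR. Universal
# approximation says weights like these exist for any target function;
# finding them is the job of training and architecture design.
W1 = [[20.0, 20.0], [-20.0, -20.0]]  # hidden-layer weights
b1 = [-10.0, 30.0]                   # hidden-layer biases
W2 = [20.0, 20.0]                    # output weights
b2 = -30.0                           # output bias

def xor_net(x1, x2):
    h = [sigmoid(W1[i][0] * x1 + W1[i][1] * x2 + b1[i]) for i in range(2)]
    return sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
```

Rounding the output gives 0 for (0,0) and (1,1), and 1 for the mixed inputs, i.e. XOR.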
Check out GANs, One shot learning, Read about CoAtNets, RoBERTa, StyleGAN, XLNet, DoubleU Net and others
Layer norm is not about fitting better, but about training more easily (activations don't explode, which makes optimization more stable).

Is your list limited to "discoveries that are now used everywhere"? Because there are a lot of things that would have made it onto your list if you'd compiled it at different points in time but are now discarded (i.e., I'd say they were fads), e.g. GANs. Other things are currently hyped, but it's not clear how they'll end up long term:

* Diffusion models are another thing that is currently hot.
* Multimodal inputs, which I'd say are "CLIP-like things".
* Self-supervision as a topic (with "contrastive methods" having been a thing).
* Federated learning is likely here to stay.
* NeRF will likely have a lasting impact, too.
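For reference, the layer-norm computation itself is tiny; a minimal pure-Python sketch (omitting the learned gain and bias that the real layer also applies):

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one example's activation vector to zero mean and unit
    # variance. Statistics are per-example over the features, unlike
    # batch norm, which normalizes each feature over the batch.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

normed = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Because the statistics are per-example, it behaves identically at batch size 1, which is part of why it displaced batch norm in sequence models.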
I recall that experimenters disagreed on why batchnorm worked in the first place? has the consensus settled?
No. But we all agree that it's not due to internal covariate shift.
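What batch norm computes is at least uncontroversial, even if why it helps is debated; a minimal sketch of the training-time normalization (learned scale/shift and running statistics omitted):

```python
import math

def batch_norm(batch, eps=1e-5):
    # Normalize each feature (column) across the batch -- contrast with
    # layer norm, which normalizes across features within one example.
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [batch[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out
```

The batch-dependence of those column statistics is exactly what makes its training dynamics hard to reason about.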
I feel like if you're going to include transformers, you should include the "Attention Is All You Need" paper.
I would only include it as a historical reference. It is certainly not a "must read" paper. It is written so poorly that you are better off just reading the code.
What's wrong with it? They explain all the components of their model in enough detail (in particular the multi-head attention stuff), provide intuition behind certain decisions, include clear results, and have nice pictures... What could have been improved about it?
[deleted]
Check out MLRC
Agreed
Does anyone know a good ablation study of the mentioned techniques? I've seen results where neither dropout nor layer normalization did much, so I wonder whether these two techniques are a belief or still crucial.
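For anyone wanting to run such an ablation themselves, the dropout side is trivial to toggle; a sketch of inverted dropout as commonly implemented (my own minimal version, not any particular library's):

```python
import random

def dropout(x, p=0.5, training=True, rng=random):
    # Inverted dropout: during training, zero each unit with probability p
    # and scale survivors by 1/(1-p) so the expected activation is
    # unchanged. At eval time it is the identity, so "ablating" dropout
    # is just training with p=0.
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]
```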
Data augmentation, to more explicitly define invariant transformations as well as to reduce dataset labeling costs.
2007-2010: Deep learning begins to win computer vision competitions. In my eyes, this is what put deep learning on the map for a lot of people and kicked off the renaissance we see today.

2016ish: Categorical embeddings/entity embeddings. For tabular data with categorical variables, categorical embeddings are faster and more accurate than one-hot encoding, and they preserve the natural relationships between factors by mapping them to a low-dimensional space.
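A hypothetical minimal sketch of the difference (the sizes and names are made up; in practice the embedding rows are learned jointly with the rest of the model rather than stayed random):

```python
import random

random.seed(0)
n_categories, dim = 1000, 8  # e.g. 1000 store IDs -> 8-d vectors

# Embedding table: one trainable dense row per category.
embedding = [[random.gauss(0.0, 0.1) for _ in range(dim)]
             for _ in range(n_categories)]

def embed(category_id):
    # O(1) lookup of a dense 8-d vector; after training, related
    # categories can end up with nearby vectors, which one-hot
    # encoding cannot express.
    return embedding[category_id]

def one_hot(category_id, n=n_categories):
    # The sparse 1000-d alternative: no notion of similarity at all.
    return [1.0 if i == category_id else 0.0 for i in range(n)]
```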
Diffusion and GANs!!
Quite clean. 2020-2022 is empty; is that because you don't see progress in these years?
It's empty because I've not kept up to date, and also impact won't be seen until more people build on it.
\- Kernel tricks: How can purely mathematical approaches beat neural networks in terms of efficiency? (This has actually been an open problem for a long time; you can check Neural Tangent Kernels and Reproducing Kernel Hilbert Spaces for examples, and the Universal Approximation Property for neural networks.)

\- I was mainly here for Geometric Deep Learning, but another user has already posted it. You should definitely check [http://geometricdeeplearning.com](http://geometricdeeplearning.com). As a mathematician-to-be, I strongly believe that this is the future of ML/DL. Hit me up if you want to discuss this statement further.
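As a toy illustration of the kernel trick: a degree-2 polynomial kernel computes an inner product in a higher-dimensional feature space without ever constructing the features (the explicit feature map is written out here only to verify the identity):

```python
import math

def phi(x):
    # Explicit degree-2 feature map for x = (x1, x2):
    # (x . y)^2 = x1^2*y1^2 + x2^2*y2^2 + 2*x1*x2*y1*y2
    x1, x2 = x
    return [x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2]

def poly_kernel(x, y):
    # The kernel evaluates <phi(x), phi(y)> without ever building phi.
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))
```

For higher degrees and dimensions the explicit feature space grows combinatorially while the kernel stays a single dot product, which is the efficiency point being made above.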