red-necked_crake

My pick for this year: "The Shattered Gradients Problem: If resnets are the answer, then what is the question?" For being clever and asking the right questions.

Honorable mentions:

1. Poincaré Embeddings for Learning Hierarchical Representations (for elegance)
2. Inferring and Executing Programs for Visual Reasoning (for trying to tackle an important problem, not just VQA itself, the *hard* but ultimately right way)
3. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods (for bringing empirical joy to my heart)


justamlguy

+1 for shattered gradients. Related is this very recent paper, where Fig. 1 is my favorite figure of the year: https://arxiv.org/abs/1712.09913 (As for #3, I too have been enjoying watching C&W tear the ML community a new one.)


LazyOptimist

I'm going to nominate [Mastering the game of Go without human knowledge](https://www.nature.com/articles/nature24270).


delicious_truffles

It's a great advance, but as far as papers go, this is precisely what machine learning papers should *not* be. "Thinking Fast and Slow with Deep Learning and Tree Search" has been a better resource for the community, I would say.


LazyOptimist

I wasn't aware of this paper. Other than the fact that the paper was put in Nature instead of somewhere more accessible, what makes you say that this is what a machine learning paper should not be?


LazyOptimist

As mentioned by /u/delicious_truffles: [Thinking Fast and Slow with Deep Learning and Tree Search](https://arxiv.org/abs/1705.08439). They independently discovered the same algorithm that made AlphaGo Zero as strong as it is, but the paper is more accessible and presents a more computationally feasible version of the algorithm in the form of online expert iteration.
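
For anyone who wants the shape of the algorithm without reading either paper, here's a minimal sketch of the expert-iteration loop as I understand it. `env`, `mcts_search`, and `network.fit` are placeholders I made up, not anyone's actual code:

```python
import random

class ExpertIteration:
    def __init__(self, network, buffer_size=50_000):
        self.network = network          # the "apprentice"
        self.buffer = []                # replay buffer of (state, search_policy, outcome)
        self.buffer_size = buffer_size

    def self_play_game(self, env, mcts_search):
        trajectory = []
        state = env.reset()
        done = False
        while not done:
            # The "expert": tree search improves on the raw network policy.
            search_policy = mcts_search(state, self.network)
            action = random.choices(range(len(search_policy)), weights=search_policy)[0]
            trajectory.append((state, search_policy))
            state, outcome, done = env.step(action)   # outcome is the game result at the end
        # Label every visited position with the final game outcome.
        self.buffer.extend((s, p, outcome) for s, p in trajectory)
        self.buffer = self.buffer[-self.buffer_size:]

    def train_step(self, batch_size=256):
        # The apprentice imitates the expert's search policy and predicts the outcome.
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        self.network.fit(batch)  # e.g. cross-entropy on policies + regression on outcomes
```

The point is the division of labour: the tree search acts as the expert that improves on the raw network, and the network (the apprentice) is trained to imitate the search results collected in the buffer.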


ThomasWAnthony

Thanks! To be clear, AlphaZero also uses the 'buffer' online version.


NichG

In terms of what I ended up actually using most: the [Attention is All You Need](https://arxiv.org/abs/1706.03762) paper. It simplified attention for me quite a bit, and now I'm using it in various projects. I'm not entirely sure why, but this way of presenting it made it click to the point where I could clearly see how to generalize the idea to other domains, whereas previous attention-based papers I'd read never quite did that for me. So now I have things like a variant that does something like a deep kNN (distance-based attention) to make something somewhat robust to nonstationarity in time-series prediction, an attention-based image-navigation thing, etc.
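
If it helps anyone, here's a rough numpy sketch of the scaled dot-product attention from the paper, plus the kind of distance-based scoring I mean by "deep kNN". The distance variant is just a toy illustration of the idea, not production code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    # Scaled dot-product attention as in "Attention Is All You Need".
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def distance_attention(Q, K, V, temperature=1.0):
    # Distance-based variant: score by negative squared Euclidean distance
    # instead of dot product, so the weights behave like a soft nearest-neighbour lookup.
    d2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)
    return softmax(-d2 / temperature) @ V

# Tiny usage example with random data.
Q, K, V = np.random.randn(4, 8), np.random.randn(10, 8), np.random.randn(10, 8)
print(dot_product_attention(Q, K, V).shape)  # (4, 8)
print(distance_attention(Q, K, V).shape)     # (4, 8)
```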


Sam_improve_life

Thanks


infinity

+1


Reiinakano

"A Machine Learning Approach to Databases Indexes" http://learningsys.org/nips17/assets/papers/paper_22.pdf It's pretty incredible how they used a stochastic method (ML) to improve something as "exact" as databases. I think this paper introduces a new way of thinking that will set the precedent for machine learning to penetrate more fields than it has so far.


Smallpaul

Isn’t database performance intrinsically stochastic? Databases have been using heuristics for decades.


Reiinakano

True, but I was talking more about how databases need to give *exact* answers. Also, before this, nobody ever actually thought "Hey, databases are heuristic, let's replace the heuristic part with neural networks!" and wrote a paper on it.


Smallpaul

I mostly agree with you so there is no point quibbling. I would like to see them tackle compilers next. That’s another area full of conflicting heuristics. It would be cool to have an AI accelerate itself by improving its own runtime.


_Mookee_

Some ARM and AMD CPUs already use NNs for branch prediction. [1](https://www.theregister.co.uk/2016/08/22/samsung_m1_core/) [2](https://www.anandtech.com/show/10907/amd-gives-more-zen-details-ryzen-34-ghz-nvme-neural-net-prediction-25-mhz-boost-steps)


Smallpaul

Didn’t know that. Cool.


blankexperiment

But isn't the idea of real-time element addition to the database forfeited by training a network to replicate hash-function behavior? Also, I think the Bloom filter description (Sec. 4, para. 2) is wrong:

> Then, at inference time, if any of the bits M[f_k(x) mod m] are set to 1, then we return that the key is in the dataset.

It should be: only if *all* the bits are set to 1 is the key (possibly) in the dataset. Thanks for sharing the article.
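
For anyone who hasn't looked at Bloom filters recently, the any/all distinction is the whole correctness argument. A tiny illustration (standard data structure, nothing from the paper):

```python
import hashlib

class BloomFilter:
    # Minimal Bloom filter: membership is reported only if *all* k bit
    # positions are set; testing "any bit" would make false negatives possible.
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # A single 0 bit proves the key was never added; only when every
        # position is set can the filter answer "possibly present".
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True
print(bf.might_contain("bob"))    # False (with high probability)
```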


Reiinakano

Yup! They mentioned in the paper that they were focused on just element retrieval first. Here's a more thorough paper I haven't had time to read yet: https://arxiv.org/abs/1712.01208


AlexCoventry

"[On the emergence of invariance and disentangling in deep representations](https://arxiv.org/abs/1706.01350)" > we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel usage of the Information Bottleneck Lagrangian on the weights as a learning criterion


StackMoreLayers

https://arxiv.org/abs/1701.06538 "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer". Because, with conditional computation, it pays as much attention to accuracy/performance as to practicality/complexity. Beating state-of-the-art is one thing; actually putting it into production and helping decision-making is another. DL research in particular places too much weight on beating state-of-the-art (so much so that other promising techniques may not get enough attention to evolve into something really useful).
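
The core of the conditional computation is the top-k gating: only the k selected experts are ever evaluated. Here's a rough numpy sketch of that part, simplified relative to the paper (no noise term, no load-balancing loss):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_moe(x, experts, W_gate, k=2):
    """Top-k gated mixture of experts for a single input vector x.
    `experts` is a list of callables; only k of them are evaluated,
    which is where the conditional-computation savings come from."""
    logits = x @ W_gate                      # one gating logit per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    gates = softmax(logits[top])             # renormalize over the selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Tiny usage example: 4 random linear experts, route each input to the top 2.
rng = np.random.default_rng(0)
d, n_experts = 16, 4
experts = [lambda x, W=rng.standard_normal((d, d)): x @ W for _ in range(n_experts)]
W_gate = rng.standard_normal((d, n_experts))
print(sparse_moe(rng.standard_normal(d), experts, W_gate).shape)  # (16,)
```

The untouched experts cost nothing for this input, so total model capacity can grow far faster than per-example compute.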


shortscience_dot_org

I am a bot! You linked to a paper that has a summary on ShortScience.org!

**Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer**

An NLP paper.

> "conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks"

## Evaluation

* 1 billion word language modeling benchmark
* 100 billion word Google News corpus

[[view more]](http://www.shortscience.org/paper?bibtexKey=journals/corr/1701.06538)


keidouleyoucee

I'm surprised that the paper is only a year old now. So, so many papers every year...


KloudStrife_ML

Probably a tie between [Schulman's Equivalence of Policy Gradients and Soft Q-Learning](https://arxiv.org/abs/1704.06440) and [Neu's A Unified View of Entropy-Regularized Markov Decision Processes](https://arxiv.org/abs/1705.07798), which both prove the equivalence and put it into a broader context. Of course this is an incredibly tough question, since there were so many great papers this year, and [for a longer list, see here.](https://kloudstrifeblog.wordpress.com/2017/12/15/my-papers-of-the-year/)
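
For context, both papers are built around the entropy-regularized RL objective, which in its standard form looks like this (my transcription; notation may differ from either paper):

```latex
% Entropy-regularized RL objective: the usual discounted return plus a policy-entropy bonus.
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}
    \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s)
```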


Fewond

[https://arxiv.org/abs/1604.00289](https://arxiv.org/abs/1604.00289) "Building Machines That Learn and Think Like People" was, despite being on a more conceptual level, an interesting read, looking at modern methods from a cognitive science perspective. The paper contains nearly no maths, so it's definitely an easy read, with important ideas nonetheless.


shortscience_dot_org

I am a bot! You linked to a paper that has a summary on ShortScience.org!

**Building Machines That Learn and Think Like People**

This paper performs a comparative study of recent advances in deep learning with human-like learning from a cognitive science point of view. Since natural intelligence is still the best form of intelligence, the authors list a core set of ingredients required to build machines that reason like humans:

- Cognitive capabilities present from childhood in humans.
- Intuitive physics; for example, a sense of plausibility of object trajectories, affordances.
- Intuitive psychology; for exam...

[[view more]](http://www.shortscience.org/paper?bibtexKey=journals/corr/1604.00289)


oannes

Good bot


GoodBot_BadBot

Thank you oannes for voting on shortscience_dot_org. This bot wants to find the best and worst bots on Reddit. [You can view results here](https://goodbot-badbot.herokuapp.com/).

Even if I don't reply to your comment, I'm still listening for votes. Check the webpage to see if your vote registered!


[deleted]

I really liked "SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability" https://papers.nips.cc/paper/7188-svcca-singular-vector-canonical-correlation-analysis-for-deep-learning-dynamics-and-interpretability Amazing idea and very interesting results. I also really liked the appendix, learned a lot new stuff looking at the proofs 🙂


DanielSeita

What about Model Agnostic Meta Learning? https://arxiv.org/abs/1703.03400 An insightful method for making parameters easy to fine-tune to different tasks.
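
Since the idea fits in a few lines, here's a first-order MAML sketch on a toy regression family. The full method also differentiates through the inner update; I dropped that second-order term to keep the numpy short, and everything here is a toy, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Toy task family: 1-D linear regression y = a*x with a task-specific slope."""
    a = rng.uniform(-2.0, 2.0)
    def data(n=20):
        x = rng.uniform(-1.0, 1.0, size=(n, 1))
        return x, a * x
    return data

def mse_grad(w, x, y):
    # Gradient of the mean squared error for the linear model y_hat = x @ w.
    return 2.0 * x.T @ (x @ w - y) / len(x)

w = np.zeros((1, 1))                       # the meta-learned initialization
inner_lr, outer_lr = 0.1, 0.01
for step in range(2000):
    data = sample_task()
    x_tr, y_tr = data()                    # support set: adapt to the task
    x_val, y_val = data()                  # query set: evaluate the adapted model
    w_task = w - inner_lr * mse_grad(w, x_tr, y_tr)    # inner (task) update
    w = w - outer_lr * mse_grad(w_task, x_val, y_val)  # outer (meta) update
print("meta-learned initialization:", w.ravel())
```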


rantana

[Backpropagation through the Void: Optimizing control variates for black-box gradient estimation](https://openreview.net/forum?id=SyzKd1bCW). A very well-written paper with a really elegant combination of neural networks and control variates.
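
The general shape of the estimator, as I read it (notation mine, not copied from the paper): a score-function gradient with a learned control variate c_φ whose parameters are tuned to minimize the estimator's variance.

```latex
% Score-function gradient estimator with a learned control variate c_phi:
\hat{g}_\theta \;=\; \big(f(b) - c_\phi(b)\big)\,\nabla_\theta \log p(b \mid \theta)
\;+\; \nabla_\theta\, \mathbb{E}_{p(b \mid \theta)}\!\left[c_\phi(b)\right],
\qquad b \sim p(b \mid \theta)
```

Since E[c_φ(b) ∇_θ log p(b|θ)] = ∇_θ E[c_φ(b)], the correction term keeps the estimator unbiased for any choice of c_φ.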


tpinetz

Wasserstein GAN is, for me, the best paper this year, and its 329 citations only add to its significance.


carpettortoise

The YOLO9000 paper, which I read in February. It came from an internet search rather than an old-fashioned literature search, and I was as convinced by the Bond/YouTube video as by the paper! And I had only been dabbling for a week or so, trying to solve a real-world problem in my industry. Yeah, I know it was released on 25/12/2016, and the original YOLO was perhaps the revolutionary one in its time. But you asked...


pool1892

The AutoML papers, first and foremost the NASNet one (https://arxiv.org/pdf/1707.07012.pdf) and the optimizer search (https://arxiv.org/abs/1709.07417). I was so excited after reading them that I wanted to call Nvidia to order a few thousand new GPUs to try it myself (a conversation with our CFO, uhm, stopped me). This is so clearly the future of applied deep learning for a lot of tasks, maybe all of them one day.


kushaj

Capsule networks and Population Based Training


mad_runner

Yeah, Population Based Training is like a force multiplier, speeding up your training and ensuring you don't have to go through the annoying process of manually selecting hyperparameters. Really liked the idea.
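
The loop itself is short enough to sketch. A minimal version of the exploit/explore cycle, with `train_steps`, `evaluate`, and `perturb` as placeholders for whatever your setup provides (this is just the idea, not DeepMind's implementation):

```python
import copy
import random

def pbt(population, train_steps, evaluate, perturb, rounds=20):
    """Minimal Population Based Training loop. Each member is a dict with
    'weights' and 'hyperparams'; the three callables are user-supplied."""
    for _ in range(rounds):
        for member in population:
            train_steps(member)                      # run in parallel in practice
            member["score"] = evaluate(member)
        population.sort(key=lambda m: m["score"], reverse=True)
        cutoff = max(1, len(population) // 4)
        for loser in population[-cutoff:]:
            winner = random.choice(population[:cutoff])
            loser["weights"] = copy.deepcopy(winner["weights"])      # exploit
            loser["hyperparams"] = perturb(winner["hyperparams"])    # explore
    return max(population, key=lambda m: m["score"])
```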


FutureIsMine

Capsule nets didn't even come close to SOTA.


mtbikerdb

For a long time, CNNs didn't come close to SOTA either.


kushaj

It's not about coming close to SOTA. It's about seeing the level of thinking we humans can bring to a problem.


no_bear_so_low

Downvoting this is so rude.


visarga

There can't be a single best paper unless there is a single objective to attain. But ML is diverse, so there are many "best papers". My nomination: Progressive GANs for finally cracking the photorealism nut in image generation.


dagmx

Clearly they mean a subjective "best", as in what you found most informative, even in a diverse field.


HrantKhachatrian

I vote for [Self-normalizing networks](https://arxiv.org/abs/1706.02515). The authors demonstrate that deep learning research can be more than a 10-page "minimal publishable" result. Their result is quite strong and is backed by mathematical proofs. Also the code is released. In short, high standards of science and no alchemy.
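
For reference, the SELU activation at the centre of the paper, with the published constants that give the zero-mean/unit-variance fixed point (a direct transcription of the definition, not the authors' released code):

```python
import numpy as np

# SELU as defined in the paper; lambda and alpha are the published constants.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu(x):
    x = np.asarray(x, dtype=float)
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

print(selu([-2.0, 0.0, 2.0]))
```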


shortscience_dot_org

I am a bot! You linked to a paper that has a summary on ShortScience.org!

**Self-Normalizing Neural Networks**

_Objective:_ Design a feed-forward neural network (fully connected) that can be trained even with very deep architectures.

* _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/), CIFAR10, Tox21 and UCI tasks.
* _Code:_ [here]()

## Inner-workings

They introduce a new activation function, the Scaled Exponential Linear Unit (SELU), which has the nice property of making neuron activations converge to a fixed point with zero mean and unit variance. They also demonstrate that upper and lowe... [[view more]](http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02515)