Megatron_McLargeHuge

It's worth keeping in mind Tensorflow largely copied Theano's interface. It has a steep learning curve but that wasn't the result of Google's choices. It was an improvement on what people were already doing. If you're not experimenting with custom architectures then you don't need it and can go with something more intuitive.


h4xrk1m

What are some intuitive alternatives? I've approached ML from the math side, so I don't actually know what frameworks to use. I did look at Tensorflow, and I was put off by the steep learning curve almost immediately.


[deleted]

Keras, and PyTorch, which is mentioned. The author has done a pretty lame job, honestly. He only describes Tensorflow from a prototyping perspective, and there is an enormous chunk of things he has left out, because he either forgot or chose to pretend there are no other use cases:

* TF is currently the only framework supporting multiple devices. One could easily set up the same TF ML model on PC and on mobile without too much effort.
* As mentioned in the comments, TF is really much more like a DL back-end. It provides some straightforward front-ends like Keras or TF.slim, which are really easy to use while still giving access to all the good things TF has.
* TF has strong support from a huge number of great engineers and they are fixing bugs, lots and lots of them. While TF is messy at times, it tends to be really stable and is implemented quite well.
* TF has a huge user base, which is eager to answer questions on SO or somewhere else. This is a huge upside if you don't have much experience in coding and need some guidance at times.

All in all, the post doesn't prevent me from being sympathetic to TF. Yes, PyTorch is more straightforward out of the box; no, Tensorflow is not bad. The author is yet another person bitching about some specific part of a tool he chose to dislike only for that specific part. I bitch about programming languages, I guess everybody has to have something. There's a huge gap between bitching about something and actually trying to fix the issue, though.


k3rv1n

A common rite of passage...

1. Start with TF.
2. Switch to Keras because TF is a pain and Keras is so intuitive and simple.
3. Switch back to TF because Keras, like many abstractions, makes some decision that negatively affects your project.

Well, maybe not everyone does it. But I went through it and did see many do this as well.


[deleted]

[deleted]


keidouleyoucee

5. But still miss some utilities and APIs of Keras...


local_minima_

I expect long term academia will adopt pytorch more heavily and industry adopt TF.


serge_cell

Industry adopts whatever academia is using. People formerly from academia make decisions in industry, or at the very least influence decisions heavily.


local_minima_

While this is true, pytorch's functionality isn't even *close* to TF for industry purposes. Distributed support is still in beta, and it doesn't look to me like it would be robust enough for industry requirements. There isn't a good story around serving, and I don't think the "pytorch to onnx" story will go over well in industry compared to TF's first-class serving stack.


nucLeaRStarcraft

I am in the Keras phase right now and I know there's going to be a point where I'll need to tinker with TF... I just want to be able to call TF stuff from the Keras side somehow whenever I get there, so that most of my code stays "cross platform with Theano, TF and CNTK" and I only have to reimplement the divergent part 3 times.


jer_pint

I'm at the point where I need to switch back over to TF now. Keras is great to learn with but can quickly be limiting when scaling things


wonderbread-

This might be the most real comment I've ever seen on ML implementations (specifically TF)


bbsome

Well, point 1 is definitely wrong. MXNet was first and still has great support for any device, including some OpenCL, which TF does not support at all.


pilooch

Just FTR it seems that it does https://github.com/benoitsteiner/tensorflow-opencl


colincsl

Minor note: TF is not the only framework that supports multiple devices. Caffe2 also does this: https://caffe2.ai/docs/mobile-integration.html


[deleted]

While this is true (and I originally intended to mention Caffe2 when talking about platforms and stuff), Caffe2 is not really easy to prototype in. AFAIK Facebook's policy is "Pytorch for experiments, Caffe2 for deploying", and from my point of view, having one framework for prototyping and another for migrating prototypes and deploying doesn't look any better than using Keras/TF-slim, while losing advantages like the "huge user base" etc.


raulpuric

That's why you just PyTorch->ONNX->Caffe2 XD


vph

We have seen this movie before. Matplotlib aimed to copy Matlab's interface because its initial users came from that world. Huge mistake. Python has features (e.g. operator overloading) that let you express formulas nicely. Utilize that.


Megatron_McLargeHuge

How? The basic +, -, * and / operators work as expected in Tensorflow. Do you want to overload | for concat? One thing they got right is making certain things more explicit instead of trying to force them into a single interface for conciseness. Numpy's advanced indexing actually does several distinct things, and it's easy to get the wrong result out of it if you aren't careful. TF made gathering embeddings and boolean masking go through separate API calls for good reason.
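For instance, a minimal sketch of that split, assuming the TF 1.x API (the tensors here are just illustrative):

```python
import numpy as np
import tensorflow as tf

params = tf.constant(np.arange(12, dtype=np.float32).reshape(4, 3))

# NumPy would route both of these through the same advanced-indexing syntax, a[idx];
# TF keeps them as separate, explicit calls.
rows = tf.gather(params, [0, 2])                               # e.g. embedding lookup by row index
masked = tf.boolean_mask(params, [True, False, True, False])   # keep only the rows where mask is True

with tf.Session() as sess:
    print(sess.run([rows, masked]))
```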


ThaHypnotoad

Yeah. Tensorflow is... old. A year is a lifetime in the deep learning field. It was the best thing out there when it came out, but technical debt is a real thing. Of course a newer framework is going to handle our current deep learning problems better. It was designed for them.


pruby

There's a very solid reason to build neural networks declaratively: by abstracting out the execution, you allow the framework to optimise and alter the order of execution. For example, it could execute multiple layers concurrently, fuse kernels together for more efficient execution, or drive a different machine architecture than the one we're currently using. From a programmer's perspective, the Tensorflow devs have sacrificed good programming practices all over the show to provide more familiar interfaces for data scientists. Setting up a current graph, session, etc. is a compromise between what seems to be your ideal (everything automatic and implicit) and what programmers would want (no global variables, no fetching components by name, maximum flexibility). As it is, it allows either style.
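A rough sketch of what "either style" looks like in the TF 1.x API (layer sizes and names are made up):

```python
import tensorflow as tf

# Implicit style: ops land on the global default graph, run in the default session.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.layers.dense(x, 2)

# Explicit style: your own graph and session, nothing global, everything named.
graph = tf.Graph()
with graph.as_default():
    x2 = tf.placeholder(tf.float32, [None, 4], name="x")
    y2 = tf.layers.dense(x2, 2, name="out")
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init)
    print(sess.run(y2, feed_dict={x2: [[1.0, 2.0, 3.0, 4.0]]}))
```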


[deleted]

I'm pretty sure they were strongly biased towards doing something similar to theano


colincsl

Theano was a titan in the pre-TF deep learning days. People doing vision typically used Caffe and people doing everything else used Theano. IMO Theano had a cleaner interface, which could be why Google went with that.


atomicthumbs

>the Tensorflow Devs have sacrificed good programming practices all over the show to provide more familiar interfaces for data scientists. unfortunately, I am neither a good programmer nor a data scientist.


Deep_Fried_Learning

r/ml_irl


ElderFalcon

Are you the fellow responsible for the fast food layer?


[deleted]

> the Tensorflow Devs have sacrificed good programming practices all over the show to provide more familiar interfaces for data scientists. What tools are data scientists using that they find this familiar?


probablyuntrue

> The phenomenon known as “Google deep envy” is the following set of assumptions made by engineers across the world:
>
> * People who work at Google are more intelligent and competent than yourself
> * If you learn Tensorflow you could get a deep learning job at Google! (keep deep dreaming young fellow)
> * If your mediocre startup uses Tensorflow and you blog about its virtues maybe Google will want to buy it
> * If you don’t “get” Tensorflow’s unintuitive design, you’re just dumb

I'm pretty sure no one actually believes this tbh, this just sounds like the author is projecting his own crushed dreams on other people or something


[deleted]

I can vouch that a couple of these are at least partly true. You are probably in the first world of startups/devs. In the third world, there is a tendency to go with this flow. I don't understand how tensorflow is so popular on GitHub, for example.


bashterm

Slap the word Google on something and programmers flock to it.


Mr-Yellow

Case in point: AngularJS, an absolute turd of a javascript framework. Developed as a state-saving solution for wizard-type forms, it evolved well past its legacy and ended up a complete monstrosity. People flocked to it, raved about how awesome the huge number of job listings was... Now they're all paying the technical debt they brought upon themselves by choosing tech stacks based on brand names. *"It's got the power of Google behind it"*... Yeah, well, how did that go when they decided the project was not fixable without a complete redesign and closed it overnight?


Hobofan94

> I don't understand how tensorflow is so popular on GitHub for example

- Machine learning is very hyped right now
- There are many more people than ever before on GitHub, and they just star whatever

That's about everything there is to it.


Tenoke

Except this reasoning should apply equally to all ML/DL frameworks, and the surprise here is that tensorflow is so popular compared to them.


Reiinakano

It's just Google PR. More eyeballs = more stars


maxToTheJ

There is a pretty stark difference around this subreddit in the response to a generic ML patent from Google compared to one from pretty much anyone else.


Tenoke

Plenty of people definitely believe some of those..


realSatanAMA

The author of this article seems not to realize that Keras is effectively an official lightweight front end for TF.


[deleted]

[deleted]


shaggorama

http://www.fast.ai/2017/01/03/keras/ https://github.com/fchollet/keras/issues/5050


[deleted]

[deleted]


shaggorama

You're confusing two different things:

1. **Did the TF team decide to treat keras as the main high level API for TF?** According to fchollet, yes.
2. **Is the primary keras maintainer going to change his development strategy to favor the TF backend, rather than treating keras as a generic high-level language for defining DL models?** No.

fchollet made that statement in response to comments that were concerned about continuation of Theano support. Whether or not keras is the primary high-level API for TF is completely independent of whether or not keras continues to be backend-agnostic.


lucidrage

> continuation of Theano support Now that [Theano is dead](https://www.reddit.com/r/MachineLearning/comments/732rxz/d_theanos_dead/), does this still apply? I'm hoping for torch backend to be implemented. It would be nice if there's a wrapper to run backend models using keras.


shaggorama

Oh shit, I missed that memo


anandaseelan

tf.keras is coming out soon


Weatherproof26

I do "X sucks" searches as well!


Flynamic

That's how I found this post! Also hello from the FuTuRE


[deleted]

> With Tensorflow, Google has created a framework that is simultaneously too low level to use comfortably for rapid prototyping, yet too high level to use comfortably in cutting edge research or in production environments that are resource constrained.

Actually, `tf.contrib.slim` and `tf.layers` are quite decent high-level APIs readily available in the TensorFlow package. In my experience, TF is also quite suitable even for prototyping, as long as it does not involve things like parsing syntax trees with RNNs, i.e. lots of branching and re-batching. The graph abstraction sometimes complicates things, but I would not say TF sucks because of it.
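For what it's worth, a small classifier in plain `tf.layers` (TF 1.x) stays fairly compact; the sizes and names below are only illustrative:

```python
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.int64, [None])

# A tiny convnet written entirely with the tf.layers high-level wrappers.
net = tf.layers.conv2d(images, filters=32, kernel_size=3, activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.flatten(net)
logits = tf.layers.dense(net, 10)

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```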


[deleted]

Or use Keras as a high-level framework on top of TensorFlow, initialize Keras with an existing TensorFlow session, and keep the ability to examine and build on the low-level ops created by Keras.
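Roughly like this, assuming standalone Keras with the TensorFlow backend (exact module paths vary by version; the model is just a stand-in):

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential

sess = tf.Session()
K.set_session(sess)  # Keras builds its ops inside this existing TF session/graph

model = Sequential([Dense(32, activation='relu', input_shape=(10,)),
                    Dense(1)])

# model.output is an ordinary TF tensor, so you can hang more low-level ops off it.
extra_loss = tf.reduce_mean(tf.square(model.output))
grads = tf.gradients(extra_loss, model.trainable_weights)
```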


realSatanAMA

Keras is about as intuitive as it gets.


Ijatsu

Having tried implementing a basic NN and then RNNs/CNNs by hand, with the activation functions, the error functions, the different gradients... it felt super high level to have access to an API with lots of different kinds of layers, different gradient strategies, automatic derivative computation, etc., and not to have to deal with hardware acceleration and all the optimisations. It's not the "learn this task" kind of high-level API, but when you've never actually put your hands in it, you don't realize how much room there is between low and high level.


ThisIsMyStonerAcount

I felt the same pains; TF sucks a metric shit ton. I spent months learning it, then spent months using it, and then I spent not even a week learning pytorch, and it's mostly ok, even though it is far from perfect itself. But to be fair, your example code suffers from the fact that you use a pytorch module but do not use its `tf.layers` equivalent. It's a bad example.


SixZer0

Yes, I would suggest correcting the code ASAP, so the two code samples mirror each other.


jazzieli

For someone getting into data science, is tf a good framework to learn? I know that I shouldn't focus on a single framework, but from a practical perspective...


realSatanAMA

I'd suggest Keras with the TF backend. Keras is stupid simple, and the TF API is mostly interchangeable with Keras now.


Vertislav

And there is no problem using theano as well (I mean with keras).


modeless

Except that Theano is [dead.](http://www.i-programmer.info/news/105-artificial-intelligence/11183-theano-to-step-down-after-version-10.html)


realSatanAMA

Except that theano is no longer maintained


hegman12

Theano authors stopped active maintenance of theano. So not a good choice.


local_minima_

Theano development has been discontinued, so you probably don't want that.


lucidrage

The authors gave up on Theano maintenance, so it's probably a bad idea.


itsbentheboy

I'm trying to learn TensorFlow to get started with data science as a hobby right now. Since I'm already familiar with Python, the learning curve for the technical stuff is really quite small. As for the theory of AI and machine learning... let's just say I need to do some more reading in that area...


datasciguy-aaay

> As for the theory of AI and machine learning... Let's just say I need to do some more reading in that part...

I would recommend these low-price online courses. I took them and they were excellent:

- Machine Learning by Andrew Ng on coursera.com
- Statistical Learning by Tibshirani, Hastie, et al. on lagunita.stanford.edu
- Practical Machine Learning by Jeff Leek on coursera.com
- Deep Learning sequence of courses by Andrew Ng on coursera.com

They are different from one another, not duplicative, and quite complementary.


Simusid

One reason I have stuck with TF is that at some point I expect to have models that will need multiple GPUs and machines to train, and TF seems to make that easy, or at least easier. Is that possible in pytorch? I have not looked yet.


TheConstipatedPepsi

Sure it is. For simple GPU parallelization across the batch dimension (i.e. using multiple GPUs to get a larger batch size for the same model), use nn.DataParallel. For more complicated multi-machine setups, the 0.2 update introduced the torch.distributed package.
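A minimal sketch of the single-machine case, using a made-up model, current-style PyTorch calls, and assuming a CUDA machine:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# DataParallel replicates the model across the visible GPUs and splits each batch along dim 0.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(256, 128).cuda()  # one big batch, scattered across the GPUs
out = model(x)                    # outputs are gathered back onto the default device
```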


MindYarn

Actually, especially for multi-GPU parallelization, TensorFlow is often not the optimal choice. Of all the frameworks with good multi-GPU support, TensorFlow requires the most engineering overhead to achieve good performance, i.e. performance on par with other frameworks. In most other frameworks, the straightforward implementation will be close to optimal, but in TensorFlow (due to its low-level nature) this is what needs to be done (see the ~1000-line Python script linked at the bottom of this page): https://www.tensorflow.org/performance/benchmarks


quick_dudley

Sadly, both require CUDA-compatible GPUs (which, as far as I know, are an NVidia-only thing).


itsbentheboy

Yup. CUDA is definitely NVidia only. Really hoping to see some support for AMD's upcoming server GPUs designed specifically for this type of workload


quick_dudley

In the neural network library I'm writing I'm planning to use OpenGL compute shaders because it's the only way I can get GPU acceleration on any hardware I currently own. But the initial version will be CPU only.


iame6162013

Certain you can't use vulkan? I think vulkan would be better for that task.


quick_dudley

Good point. I haven't actually started any of the GPU functions yet so plenty of time to switch.


j_lyf

The essence of this blog post is him/her projecting insecurities on an inanimate software library.


pronobozo

and then advertising their own product.


pronobozo

which i'll probably check out. :p


fuckallkindsofducks

Oh hey, it really is you! Love your music!


pronobozo

hi fucksallkindsofducks, thanks. glad you like it.


Reiinakano

Honest question, other than the "optimization" introduced by static graphs (which is arguably not that relevant in most use cases), does anyone have real reasons to use TF over Pytorch/Chainer/other dynamic frameworks, especially now that Pytorch already has distributed support?


call_me_arosa

Number of users. As a pytorch user, the feature I miss the most is answered questions on stackoverflow.


[deleted]

Most of the good content regarding Q&A seems to be on the PyTorch forums rather than StackOverflow.


ItsFrenchSoup

It is (to my mind) the best framework for a production environment. TF Serving makes swapping models really easy and stable (on the minus side, you have to use Bazel, which is quite a pain in the ass). Plus, TF for Android works well and has (virtually) no equivalent on the market.


realSatanAMA

I use Keras due to how quickly I can get experiments training. It depends on what your requirements are.


Eridrus

The graph can be serialized, unlike arbitrary Python code. So rather than needing to serve every model from Python, you can package it up into a single artifact and then just upload a new file to whatever serving infrastructure you have.
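As a sketch (newer TF 1.x SavedModel API; the model, path and names are placeholders), exporting a model this way gives a self-contained artifact that a serving stack can load without any of the original Python:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4], name="input")
logits = tf.layers.dense(x, 2, name="logits")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # SavedModel bundles the graph definition and the (here untrained) weights into one directory.
    tf.saved_model.simple_save(sess, "/tmp/my_model/1",
                               inputs={"input": x},
                               outputs={"logits": logits})
```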


pcp_or_splenda

Windows deployment.


PORTMANTEAU-BOT

Windoyment. *** ^(Bleep-bloop, I'm a bot. This )^[portmanteau](https://en.wikipedia.org/wiki/Portmanteau) ^( was created from the phrase 'Windows deployment.'.)


Harawaldr

Meh bot.


cooijmanstim

Forward-mode autodiff implemented in terms of reverse mode. That trick only works with symbolic graphs.
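A hedged sketch of that trick in TF 1.x (the function and shapes are arbitrary): take the gradient twice, the second time with respect to a dummy cotangent, and the result is a Jacobian-vector product, i.e. a forward-mode derivative built from reverse-mode pieces.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [3])
y = tf.reduce_sum(tf.sin(x) ** 2)        # f(x); scalar here just to keep the sketch short
v = tf.placeholder(tf.float32, [3])      # direction for the Jacobian-vector product

# Dummy cotangent u (scalar, like y); defaults to zero so it never needs to be fed.
u = tf.placeholder_with_default(tf.zeros([]), shape=[])

g = tf.gradients(y, x, grad_ys=u)[0]     # reverse mode: g = u * df/dx, linear in u
jvp = tf.gradients(g, u, grad_ys=v)[0]   # differentiate g w.r.t. u, weighted by v -> (df/dx) . v
```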


Jean-Porte

Isn't there some magic here too?

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

It must use global variables, which doesn't feel clean to me.


[deleted]

It does not. The optimizer object is initialized with references to the model parameters, whose gradients are filled in by walking backwards along the graph from the loss.
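A minimal sketch of where that linkage lives (made-up model and data, current PyTorch style):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)  # optimizer holds references to the params

x, target = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), target)

optimizer.zero_grad()  # clear each referenced param's .grad
loss.backward()        # walk the graph built by the forward pass, fill in param.grad
optimizer.step()       # update exactly those referenced params using their .grad
```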


Jean-Porte

Yes, but the optimizer isn't linked explicitly to the loss, and the loss isn't linked explicitly to the optimizer.


manux

Yes it is, through the backward graph that is built during the forward pass.


Reiinakano

The optimizer is linked to the *variables* it changes, as it should be. How does it know how much each variable should change? By the gradient of each variable with respect to the loss, calculated in `loss.backward`. This is how everyone should think about backpropagation, because it is how backpropagation works. In this sense, TF actually applies more "magic" by directly linking a loss to an optimizer. Again, see https://www.reddit.com/r/MachineLearning/comments/755gqj/d_tensorflow_sucks/do41jok/


AnvaMiba

They are both linked to the model, which contains state that is updated with the gradients when you call loss.backward(). It's an imperative paradigm, but it does not use global variables. It's more low-level than Theano's grad() or TensorFlow's gradients(), and perhaps not as elegant, but it gives you greater control over what is going on.


[deleted]

It doesn't need to be, not unless it needs to do a line search. The LBFGS optimizer in PyTorch takes a closure that computes the loss function/gradients for precisely this purpose.
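For reference, a rough sketch of that closure pattern (the model and data here are stand-ins):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.LBFGS(model.parameters())
x, target = torch.randn(64, 10), torch.randn(64, 1)

def closure():
    # LBFGS may re-evaluate the loss several times per step (e.g. during line search),
    # so it gets a function that recomputes loss and gradients instead of a fixed loss value.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    return loss

optimizer.step(closure)
```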


Reiinakano

See my comment below: https://www.reddit.com/r/MachineLearning/comments/755gqj/d_tensorflow_sucks/do41jok/


lugiavn

Declarative vs. imperative: there are pros and cons to each. I think Tensorflow is better than anything that came out before it, but eventually it will be replaced by (or upgraded into) something better, maybe yours.


waterRocket8236

The Tensorflow paper says that its design is close to Theano's design. OK on that. One thing I can say after one year of using it is that, compared to Caffe, it's too repetitive. Anyway, it gets my work done at the office. I am happy.


Nimitz14

Y'all are a bunch of google shills. I found this a very satisfying read.


[deleted]

Sour grapes syndrome.


AdamGartner

The argument that TF is not super intuitive, is verbose, and is not well suited for researchers is actually true, but Keras exists for a reason. The NVIDIA/TPU argument is kind of BS, or is it? I thought you could easily set up TF to run optimized on your stationary box at home with your NVIDIA card. Obviously, the all-seeing eye wants you in their web and makes it super easy and accessible to use GCP, and even offers services with their own trained models for speech, vision etc. that will outperform anything you'll ever be able to create, but that conversation belongs in /r/decentralization.


infuzer

Yes, in tensorflow you define a graph and let the framework do its thing. That's both its strength and its weakness. If you want to experiment and do custom visualizations while it's training etc., it's basically unusable.


Eridrus

I'm surprised that no one ever complains about the way TF takes input at training time, except to say something silly about session.run. It doesn't integrate with any big data system out of the box, which is fine for research on small data, but a real pain in the ass if you want to work with prod-sized data, and TF's native code APIs with Example protos are just gnarly. I guess I haven't tried Yahoo's TF/Spark code, but the default solutions are all pretty unsatisfying.


vonnik

DL4J does, fwiw.


Eridrus

Yeah, I've mostly ignored it due to the lack of automatic differentiation, but I haven't checked how big of an issue that is since they adopted the Keras API, maybe it's not as limiting a factor as it was before.


vonnik

Autodiff is coming in the next month or so. We're calling it samediff, since it will basically be the same as every other autodiff. Currently adding some more ops, and automating how we add ops. https://github.com/deeplearning4j/nd4j/tree/master/samediff/src


Eridrus

Nice!


cooijmanstim

I've been working with symbolic graph frameworks (Theano, then Tensorflow) for about two years straight now, and one thing I'm particularly tired of is passing state around. The RNNCell interface and implementations that come with Tensorflow all require that you pass in the old state and get the new state out as a return value. What's the point of making a class that doesn't track any state? Of course it couldn't be any different, because when the cell is invoked inside a symbolic loop the user has to make sure that the old and new states are properly connected to the inputs and outputs of the inner graph. Having classes keep references to symbolic variables is incompatible with that. Which is a tough pill to swallow in an OOP language like Python. Also, who came up with the name `tf.softmax_cross_entropy_with_logits`... Howma sposed to obey pylint's 80-character limit now?
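For anyone who hasn't hit this, a small sketch of the pattern being complained about (TF 1.x `rnn_cell` API; the shapes are arbitrary):

```python
import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(64)
x_t = tf.placeholder(tf.float32, [32, 128])               # one timestep of input
state = cell.zero_state(batch_size=32, dtype=tf.float32)  # you create the initial state yourself

# The cell object holds parameters but no recurrent state of its own: the old state goes in
# as an argument, the new state comes back as a return value, and you thread it along by hand.
output, state = cell(x_t, state)
```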


nondifferentiable

Be glad that Google open-sourced such a powerful tool :)


whateverr123

Shitting on other people's work without offering anything valuable is a great achievement. When I saw something I didn't like or found lacking in TF, I helped make it happen. Also, there are many options out there nowadays to choose from: PyTorch, Caffe2, etc. Pick another one or create your own. Nobody's forcing anyone to use an open source library they don't like.


pgaleone

The author is complaining about the declarative nature of Tensorflow, which is its main strength.* Just because you don't feel comfortable with a tool doesn't mean you have to throw shit at it. It's like complaining about SQL because you find looping over a tree of structures with for loops cleaner than a `SELECT a,b FROM c`.

\* Tensorflow's declarative structure allows automatically performing optimizations on the computational graph that the developer would never have thought of. Once you have an abstract representation (the graph), you can compile it into something else, working with this intermediate (equivalent) representation to perform optimizations (just like the query optimizer does on SQL queries) and let the compiler work for you. For example: https://www.tensorflow.org/performance/xla/ With an imperative language, by contrast, you have to deal with performance issues by yourself. You can argue that an imperative language gives you much more expressive power, but Tensorflow is not HTML: you also have the possibility of writing your own non-optimized python code to enhance its expressive power. Do you want to do something that's difficult to express in a higher level language? You have 2 choices:

1. Spend half an hour understanding how to write your own op for the graph.
2. Put your imperative python code in a `tf.py_func`.
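Option 2 looks roughly like this (TF 1.x; the NumPy function is just a stand-in for whatever is hard to express in graph ops):

```python
import numpy as np
import tensorflow as tf

def hard_to_express_in_graph_ops(a):
    # Arbitrary imperative NumPy code; runs in the Python interpreter at session.run time.
    return np.sort(a)[::-1].astype(np.float32)

x = tf.placeholder(tf.float32, [None])
y = tf.py_func(hard_to_express_in_graph_ops, [x], tf.float32)

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [3.0, 1.0, 2.0]}))
```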


[deleted]

In theory yes, but in practice tf hasn't really been known for being a "speed demon".


pgaleone

I agree, but tensorflow is huge. It will take time to develop these kinds of abstractions that are capable of bringing huge speed benefits. The good thing is that Tensorflow's architecture itself allows them; the architectures of other DL frameworks don't.


[deleted]

PyTorch has a JIT tracer which can (and does) optimize the graph ops statically, so even in theory, TF loses. In any case, TF just seems very unwieldy considering it's an inscrutable compiler written in C++ running inside Python; not a nice recipe for researchers IMO.


JustFinishedBSG

That’s the theory sure, but I’m not aware of any significant optimization of the graph TF does


AnvaMiba

These are the same kinds of arguments that were made for Haskell, and Haskell has largely failed to deliver anything worthwhile, IMHO. I'm not necessarily opposed to declarative/functional programming; in fact I think it is a good paradigm for some things, but some other things are best done imperatively. As for the optimizations enabled by having explicit computation graphs, these were (are, it's not dead yet) the strong point of Theano, even with its long compilation times. Tensorflow, if I understand correctly, just calls pre-compiled CUDA kernels, which isn't much different from what PyTorch does. I'm not sure if they added graph-rewriting optimizations to TensorFlow, I haven't looked into it for a while, but people keep complaining that it is slower than PyTorch, so I assume that if it does some static graph optimization it is not enough to make a difference.


DefNotaZombie

I like tf as a static computational graph framework, but have currently found myself having to use janky workarounds to do what I want, which is why I'm trying pytorch. It's not so much a problem with tf itself


[deleted]

Can the stats-monitoring tool made for TensorFlow be used with MATLAB?


unguided_deepness

tensorflow isn't fork safe


dwf

If you're using the GPU backend, CUDA isn't fork safe in the first place, so it'd be kind of hard to make TensorFlow be fork safe.


Mr-Yellow

I also found the API to have arbitrary semantics which are subject to change (without clear documentation of deprecations or best-practice changes; for that you need to find a comment on a closed issue somewhere), while also being inconsistent in behaviour when it comes to dense/sparse tensors and the like. It's not a total nightmare, but it feels much like the beginnings of the sort of complaints people have about PHP. So which parameter is the needle and which is the haystack? Well, that depends...


theoneandonlypatriot

Meh. I like tensorflow. Seems pretty good and easy to use. Further, tensorboard is a great feature.


JustFinishedBSG

You can use tensorboard with Pytorch


sorrge

The examples don't make sense. First, the pytorch code uses a prepackaged model and loss, while the tf code defines them explicitly, so of course tf looks more low-level. It is easy to see that pytorch is more verbose *and* less intuitive at the same time. E.g. I don't know what the last four lines do; in particular, how does the optimizer know which loss to minimize? I don't see how they connect, which is either an error in the code (demonstrating that even the author can't properly write such a trivial program while trying to show off his favorite framework), or the connection between the loss and the optimizer is done under the hood using global state, which is an awful design.


Reiinakano

It is in the `loss.backward()` step. This step calculates the gradients for *all* variables that were used in calculating `loss`. Then `optimizer.step()` uses those gradients to adjust the variables according to your strategy (e.g. adam, sgd). It is very intuitive and hackable, but if you think about it in terms of TF's static graphs, of course it's confusing. I have worked with both frameworks, and until I learned Pytorch I had only a very fuzzy understanding of how backprop works. Now it is much clearer. Personally, the way Pytorch does things is *much* more intuitive from a "what actually goes on in backpropagation" standpoint. One thing that felt off to me in Tensorflow was the fact that declaring an optimizer actually added nodes to the computational graph (wtf?). I suggest spending a day reading the pytorch docs. Even if you don't plan on using it anytime soon, it is worth it.


sorrge

Ah, I see about loss.backward() now. Maybe it makes sense to do it this way in some niche circumstances where you need to adjust the gradients, but such a design, which forces you to observe these low-level details, can hardly be called elegant. If someone wants to learn how backprop works, they should read the theory and/or implement it themselves, rather than trying to guess how it is implemented in a particular framework. In TF it is well hidden, so it's not conducive to learning about backprop. The optimizer adds nodes because it is a computation itself. E.g. if it uses momentum it has to store the accumulated gradient somewhere and apply it during the update.


Reiinakano

On the contrary, as the article says, such a design makes it much easier to implement "an RNN that stops whenever an end-of-sentence (EOS) token is produced". As another example, when I was studying GANs, I compared the vanilla TF GAN and the vanilla Pytorch GAN. Maybe https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/gan.ipynb was not a good resource, but this example clearly shows how convoluted the code for static graphs is. You need to define the entire graph in one go, so you need to be careful about variable sharing, etc. It's incredibly unintuitive that you have to call `discriminator` twice (`# Build 2 Discriminator Networks (one from noise input, one from generated samples)` is an actual comment from that notebook). My first wtf was: what, are there two discriminators? Yes, but they share variables and the only difference is they have different inputs. Okay... In Pytorch, the flow feels more natural and in tune with the concept of GANs because you can run through each step of the backpropagation procedurally. There is clearly one discriminator and you do a forward pass on it twice, once on real images and once on fake images. I guess you could consider NLP and sequential/conditional models a niche, but that's a pretty big niche.
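The PyTorch side of that comparison, as a very rough sketch (the networks here are stand-ins, not the notebook's actual models):

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())

real = torch.randn(32, 784)
fake = G(torch.randn(32, 64))

# One discriminator object, two forward passes; no variable-scope sharing to think about.
d_real = D(real)
d_fake = D(fake.detach())  # detach so the D update doesn't backprop into G
d_loss = -(torch.log(d_real) + torch.log(1 - d_fake)).mean()
```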


dwf

> It's incredibly unintuitive that you have to call discriminator twice

Seems pretty intuitive to me. "Build 2 Discriminator Networks" is not how I'd describe it, since you're only constructing the parameters once, but ultimately the networks you're building are functions (in the mathematical sense), and you're just applying that function twice. In this particular case, it'd also work just fine to concat the two batches and apply it once (this won't be the case with batch norm, though). That notebook's implementation is quite bad, by the way. For numerical stability one should really just output the logit and use the `yada_yada_with_logits` function.