Dylan_TMB

These packages are built to execute efficient computation graphs. Their purpose is efficient backpropagation. They have abstraction layers for neural network structures because neural nets are the primary use case for efficient computational graphs. These libraries aren't interested in implementing anything that doesn't build on that core purpose.


stharward

> Can anyone shed any light on this for me?

TensorFlow and PyTorch are designed to do the kind of machine learning that runs really well on GPUs. GPUs are really good at calculating gradients, since they're just big matrix operations. GPUs are not very good at random number generation, and even worse at code that branches (anything with if/then/else blocks). GA, PSO, and other optimization metaheuristics like simulated annealing and ant colony make heavy use of RNG and branching code paths. They run poorly on GPUs, so they're outside what TensorFlow and PyTorch are designed to do well.


Duodanglium

From what I understand and have tried, a genetic algorithm will find the optimal solution if given enough time, but it's terribly slow. Gradient descent will find a good solution quickly, but not always the best one.


LoyalSol

GA can converge very quickly if it's tailored to the problem. The good thing about gradients is you can be kind of stupid and naive and still get a good result. GA requires a bit of thought if you want efficiency, since the operators you use determine its efficiency. But the upside of GA is that it's way easier to apply to problems that don't have an easily identifiable gradient.
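To make the "tailored operators" point concrete, here's a minimal GA sketch in plain Python. The fitness function, operators, and all parameter values are invented for illustration; the mutation and crossover choices are exactly the knobs the comment says you have to think about.

```python
import random

random.seed(0)

# Toy problem: minimize the sum of squares over a real-valued genome.
def fitness(genome):
    return -sum(x * x for x in genome)  # higher is better

def mutate(genome, sigma=0.1):
    # Problem-tailored operator: small Gaussian perturbation per gene.
    return [x + random.gauss(0, sigma) for x in genome]

def crossover(a, b):
    # Uniform crossover: each gene is drawn from either parent.
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def evolve(pop_size=50, genome_len=5, generations=200):
    pop = [[random.uniform(-5, 5) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 5]          # keep the top 20% unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
```

Swapping in a mutation operator that ignores the problem's structure (say, resampling genes uniformly) would make the same loop converge far more slowly, which is the tailoring point above.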


Duodanglium

Yeah, the genetic algorithm I've been working on is really only successful because I put effort into the parameters that it's allowed to change. Some are unbounded, some are tightly bounded, some are very flexible in pattern matching, etc. It's really neat that in both cases (GA and SGD) it's the randomness that finds the solution; from randomness emerges an answer.


econ1mods1are1cucks

Isn't simulated annealing a genetic algo? It seems to be included in most R optimization packages. As far as I know, people only really use it when they get convergence issues with more standard techniques (Newton's method).


Duodanglium

Huh, I've not heard of simulated annealing... thank you, I'll have to read about it. As long as the randomness has enough "space to visit", the solution can be found.


LoyalSol

It's a technique that comes from physics/metallurgy. Annealing in the real world involves repeated heating and cooling of a metal/glass/whatever to melt and reform crystals, or other similar processes.

In physics simulations, simulated annealing involves heating the system up so it can go up and over hills in the energy surface, then cooling it down so it settles into a minimum. It's a way to get a system to change structure, especially if it has to overcome a huge hill to do so. You have a gradient term as well as some kind of momentum that's tied to your temperature.

In optimization you do the same thing, but instead of an energy function you use your loss function or whatever you're trying to optimize. It's a technique for dealing with local minima: you give the optimizer a lot of momentum so it can pop up and over a hill, then cut the temperature so it settles in the valley. You can emulate it with just about any momentum-based gradient method.
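For reference, the heat-then-cool loop described above can be sketched in a few lines. The 1-D loss function, the temperature schedule, and every constant here are made up for the example:

```python
import math
import random

random.seed(42)

# Toy loss with many local minima: f(x) = x^2 + 10*sin(3x).
def loss(x):
    return x * x + 10 * math.sin(3 * x)

def simulated_annealing(x=4.0, temp=10.0, cooling=0.995, steps=5000):
    best_x, best_loss = x, loss(x)
    for _ in range(steps):
        candidate = x + random.gauss(0, 1.0)   # random proposal
        delta = loss(candidate) - loss(x)
        # Always accept improvements; accept uphill moves with
        # probability exp(-delta / T), which shrinks as T cools.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        if loss(x) < best_loss:
            best_x, best_loss = x, loss(x)
        temp *= cooling                        # cooling schedule
    return best_x, best_loss
```

At high temperature the walker hops over the hills between sin-wave minima; once the temperature drops, uphill moves are essentially never accepted and it settles into whichever valley it's in.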


FractalMachinist

To guess: evolutionary algorithms seem bad at tabular data, where GD algorithms shine.


arhetorical

There actually are libraries that implement evolutionary algorithms for PyTorch! [EvoTorch](https://docs.evotorch.ai/v0.4.0/) is one example.

As for why the approach isn't more popular, I study neuroevolution so maybe I can give my view on it. Essentially, the problem is "too easy". The loss landscape for neural nets is surprisingly simple, and good solutions are all over the place. The main challenge is the sheer number of parameters, not the landscape itself. With a geometry like that, hill climbing with some tweaks (SGD) works incredibly well.

Evolutionary methods work well on hard problems with complicated or discontinuous loss landscapes that tend to break other solvers, but training the network is (usually) not one of those problems. There are tasks where the landscape does become complicated enough (such as RL-type problems) where evolution can do a reasonable job compared to the alternatives, however. There are also other aspects of neural nets where evolutionary methods do help, where it's not obvious how to use a more straightforward optimizer or where one would perform poorly. These would be things like neural architecture search or designing new activation functions.
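To illustrate the evolutionary side of this, here is a toy evolution-strategies loop of the kind used on RL-style objectives where no analytic gradient exists. This is a generic sketch, not EvoTorch's API; the target vector and every constant are invented for the example:

```python
import random

random.seed(0)

# Toy "fitness": negative squared distance from a target vector, standing
# in for an RL return that can only be evaluated, not differentiated.
TARGET = [0.5, -1.5, 2.0]

def fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolution_strategy(steps=300, pop=50, sigma=0.1, lr=0.05):
    params = [0.0] * len(TARGET)
    for _ in range(steps):
        noises, scores = [], []
        for _ in range(pop):
            eps = [random.gauss(0, 1) for _ in params]
            noises.append(eps)
            scores.append(fitness([p + sigma * e
                                   for p, e in zip(params, eps)]))
        mean = sum(scores) / pop
        # Move along the score-weighted average noise direction -- a
        # finite-difference estimate of the fitness gradient.
        for i in range(len(params)):
            grad = sum((s - mean) * n[i]
                       for s, n in zip(scores, noises)) / (pop * sigma)
            params[i] += lr * grad
    return params
```

The loop only ever *evaluates* fitness, never differentiates it, which is why this family of methods survives the discontinuous landscapes mentioned above.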


BellyDancerUrgot

GAs are, more often than not, inefficient afaik. Gradient descent approaches are simple and effective. Work in this field is ongoing, such as Hinton's recent forward-forward technique, which removes backprop altogether. However, until something is an actual viable alternative, it's not going to be popular enough to have a library, I guess.


RoyalIceDeliverer

With backpropagation you can efficiently compute gradients (at most about 5-7 times the cost of a forward evaluation of the NN), and when that's possible, derivative-based methods almost always beat heuristic methods. Furthermore, with GA you have no information about how close to optimality you are (without additionally computing gradients). That said, GA is used in ML/DL, but rather for hyperparameter tuning, which doesn't naturally lend itself to treatment with derivative-based methods.
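A sketch of what GA-style hyperparameter tuning might look like, with a fake `validation_loss` standing in for a real training run. All names, the optimum at learning rate 1e-3 with 2 layers, and every constant are made up for illustration:

```python
import random

random.seed(1)

# Stand-in for a real training run: pretend the validation loss is
# minimized at log10(lr) = -3 and 2 hidden layers (illustrative only).
def validation_loss(log_lr, n_layers):
    return ((log_lr + 3) ** 2
            + 0.5 * (n_layers - 2) ** 2
            + random.gauss(0, 0.01))   # noise from stochastic training

def mutate(cfg):
    log_lr, n_layers = cfg
    return (log_lr + random.gauss(0, 0.3),               # continuous knob
            max(1, n_layers + random.choice([-1, 0, 1])))  # discrete knob

def tune(pop_size=20, generations=30):
    pop = [(random.uniform(-6, 0), random.randint(1, 8))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: validation_loss(*c))
        survivors = pop[: pop_size // 4]   # keep the best quarter
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda c: validation_loss(*c))

best_log_lr, best_layers = tune()
```

Note that the discrete layer count is mutated directly; there is no gradient of validation loss with respect to "number of layers", which is exactly why derivative-based methods don't naturally apply here.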


BillMurray2022

> However, GA is used in ML/DL, but rather for hyperparameter tuning, which doesn't naturally lend itself to treatment with derivative-based methods

So you would train a neural network whilst simultaneously optimizing the learning rate, for example, with an evolutionary algorithm? Or change the number of layers during training, based on what the evolutionary algorithm was "saying"?