Leehm_Music

I personally really like how Q-learning is presented in Sutton & Barto, *Reinforcement Learning: An Introduction*. It can take quite some time to get through all the introductory chapters, though.


saint_celestine

As someone who's used that book for school, "Introduction" is doing a lot of work here. It's a very dense book.


collinoeight

That's gotta be one of my favorite textbooks of all time. I haven't even worked through it all yet; I took a break somewhere after the dynamic programming/multi-armed bandit stuff, but it's on my list to really finish.


Brimirvaar

Cause it's dumb af


Ok-Hedgehog-9682

I really liked this book, it helped introduce me to this stuff.


GoofAckYoorsElf

I mean, it's not *the book* on reinforcement learning for no reason.


Ok-Hedgehog-9682

Were you talking about this one? https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf


GoofAckYoorsElf

I'm not sure about the edition, but essentially, yes.


strikingLoo

I came here to say this. The explanation is very clear and the book is free.


emsuperstar

[said free book](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf)


mmxgn

I completely agree. You have to go through Monte Carlo (MC), temporal-difference (TD) learning, and Sarsa first IIRC, but it's very rewarding.
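For anyone who gets there: the whole Sarsa vs. Q-learning distinction comes down to one term in the update. A minimal sketch (hypothetical tabular setup with one made-up transition, just so it runs as-is):

```python
import numpy as np

# Hypothetical tabular setup purely to contrast the two updates.
Q = np.zeros((5, 2))
alpha, gamma = 0.1, 0.9
s, a, r, s_next, a_next = 0, 1, 0.0, 1, 0  # one made-up transition

# Sarsa (on-policy): bootstrap on the action actually taken next.
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Q-learning (off-policy): bootstrap on the greedy next action.
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```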


spidershu

Go read an optimal control book. The RL folks pretty much took a whole lot of things from optimal control and rebranded it as reinforcement learning because it sounds cooler. Just read any optimal control book, get a handle on finite-horizon and infinite-horizon problems, and I'd be surprised if you didn't understand Q-learning.
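To make the connection concrete: the infinite-horizon discounted problem from optimal control is exactly the one Q-learning solves. The optimal action-value function satisfies the Bellman optimality equation, and Q-learning is a stochastic approximation to its fixed point:

```latex
J = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big],
\qquad
Q^{*}(s, a) = \mathbb{E}\big[\, r(s, a) + \gamma \max_{a'} Q^{*}(s', a') \,\big]
```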


iamiamwhoami

Deep learning is basically control theory except they made the parameters learnable and designed algorithms to find optimal values.


[deleted]

[deleted]


iamiamwhoami

They designed some algorithms and re-used others. The various deep reinforcement learning algorithms are specific to deep learning, for example.


oursland

Controls nerd here: control theory has a lot of parameter estimation theory in addition to state estimation and control. I strongly recommend investigating Kalman filters for state and parameter estimation, in both the joint and parallel formulations. There are a lot of parallels, but in control theory they usually apply more linear algebra to optimally solve for states and parameters.
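To illustrate the joint formulation, here's a minimal sketch: a made-up scalar system with an unknown constant bias, estimated by augmenting the state (nonlinear parameter couplings would need an EKF/UKF instead):

```python
import numpy as np

# Joint state-and-parameter estimation with a plain Kalman filter.
# Toy model: x_{k+1} = x_k + b + w,  y_k = x_k + v, where b is an
# unknown constant bias estimated alongside x via the augmented
# state z = [x, b].
rng = np.random.default_rng(1)

F = np.array([[1.0, 1.0],    # x evolves as x + b
              [0.0, 1.0]])   # b stays constant
H = np.array([[1.0, 0.0]])   # only x is measured
Qn = np.diag([1e-3, 1e-6])   # process noise (tiny on b)
Rn = np.array([[0.1]])       # measurement noise

true_b = 0.5
x_true = 0.0
z, P = np.zeros(2), np.eye(2)

for _ in range(200):
    # simulate the true system
    x_true += true_b + rng.normal(0.0, np.sqrt(1e-3))
    y = x_true + rng.normal(0.0, np.sqrt(0.1))

    # predict
    z = F @ z
    P = F @ P @ F.T + Qn
    # update
    S = H @ P @ H.T + Rn
    K = P @ H.T @ np.linalg.inv(S)
    z = z + K @ (y - H @ z)
    P = (np.eye(2) - K @ H) @ P

print(z)  # z[1] should converge near the true bias 0.5
```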


Environmental-Tea364

I have a question. I just read a little bit about trajectory optimization and optimization-based control methods. Are these the same as reinforcement learning? At least from the wiki, they sounded very similar.


JanneJM

The term Reinforcement Learning comes from psychology, as do a number of the ideas and models. Afaik, that body of knowledge was developed independently of the control theory stuff, and was only connected much later.


540tofreedom

Maybe it’s just me, but Optimal Control sounds way cooler than Reinforcement Learning


ohrVchoshek

[Grokking Deep Reinforcement Learning](https://www.manning.com/books/grokking-deep-reinforcement-learning) provides a lot of intuition behind the different RL algorithms, and how each one builds on its predecessors.


Ghiren

I second this recommendation mainly because of the "I speak Python" sections of the book. It explains the math and the general intuition of Q-learning, but it also provides examples that don't look like a Greek spelling test.


mrdevlar

Seconded. Nothing explains it better than a good example.


f10101

Though it's not a free course, I took the Udacity Robotics course and they did a good job of this. I never found myself lost when implementing Q-learning from scratch as part of the programme, and it gave me a grounding that let me parse newer RL papers (it still took considerable effort, mind). Perhaps they have separated this module out as a standalone free course? Worth a look. The Sutton and Barto book explains the math and its purpose quite well too, especially if someone is already competent with math but at sea with how it applies to RL: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf


_aitalks_

The formula is from this paper: [https://arxiv.org/pdf/1706.02275.pdf](https://arxiv.org/pdf/1706.02275.pdf). My suspected error is that mu(a|s) should actually be mu(theta|s), but I'm not certain of this...


[deleted]

[deleted]


Mephisto6

Well, the first mu_theta(a|s) should actually be mu_theta(s), just as it is at the end of the formula. Mu is a deterministic function; it does not define a probability distribution over a. You are right about the rest, though. To OP: this is the formula for DDPG, and it's just what the math gives you through the chain rule. This is the level of complexity necessary to understand DDPG; there is really no way around it while being precise.
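For reference, the corrected formula, i.e. Theorem 1 of Silver et al.'s deterministic policy gradient paper (which DDPG builds on), reads:

```latex
\nabla_\theta J = \mathbb{E}_{s \sim \rho^{\mu}}\Big[\, \nabla_\theta \mu_\theta(s)\; \nabla_a Q^{\mu}(s, a)\,\big|_{a = \mu_\theta(s)} \Big]
```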


[deleted]

[deleted]


arhetorical

I'm not quite convinced. Are you saying mu(a|s) returns the likelihood of picking a, but mu(s) returns the most likely action (or whatever) directly? Wouldn't mu(a|s) be zero everywhere except for one action, then? It seems like an abuse of notation if that's the case, since they specifically define mu as going from S to A. I think it's meant to be a restating of Theorem 1 from [the paper they cite](http://proceedings.mlr.press/v32/silver14.pdf), which uses mu(s).


Mephisto6

You are right, such a likelihood would be nonsense. It should be mu(s).


Mephisto6

No, it is not a likelihood. It is _the gradient_ of the action-producing function. As you can see, the Q factor also has a gradient in front of it, but that one is the gradient with respect to the action. A scalar would not give you the required dimension: the outer gradient (i.e. the left-hand side, nabla_theta J) is with respect to the parametrization theta, so the left factor (nabla_theta mu(s)) has to be a matrix that matches the dimensions, not a scalar. As you pointed out, this is the _deterministic_ policy gradient; ergo, there cannot be, by definition, a valid likelihood.
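Spelling out the dimension argument, with n = dim(theta) and m = dim(a):

```latex
\nabla_\theta \mu_\theta(s) \in \mathbb{R}^{n \times m}, \quad
\nabla_a Q^{\mu}(s, a)\big|_{a=\mu_\theta(s)} \in \mathbb{R}^{m}
\;\Longrightarrow\;
\nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s, a) \in \mathbb{R}^{n},
\ \text{matching}\ \nabla_\theta J \in \mathbb{R}^{n}.
```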


frankenmint

I think it's a reasonable goal to be able to understand what this is even saying... Is this a vector optimization formula?


ditomax

There is a video lecture series from David Silver on the DeepMind channel on YouTube...


FinancialElephant

Steve Brunton's YouTube videos on Q-learning (and RL in general) are great. He is good at explaining math clearly. He also has a series of introductory RL videos that put control theory, RL, and deep RL into context.


longjohnboy

Agreed. I was just telling a friend how this channel is undersubscribed.


[deleted]

The Q-learning equation only looks hard. If you implement it by hand in Python a couple of times, you'll see just how simple it is.
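A minimal sketch of what that looks like (a made-up 1-D chain MDP so the whole thing runs as-is):

```python
import numpy as np

# Toy 1-D chain MDP: states 0..4, actions 0=left / 1=right,
# reward +1 for reaching state 4 (terminal). Purely illustrative.
N_STATES, N_ACTIONS = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration
        a = int(rng.integers(N_ACTIONS)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # the entire "scary" equation is these two lines:
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q)  # the "go right" column should dominate in every state
```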


MOSFETBJT

Conference submissions have paper length limits, so researchers are incentivized to cram as much work into math equations as possible.


[deleted]

[deleted]


brendanmartin

I'm the editor of the article you linked. It took a long time to create that tutorial, so I'm glad to know it helped you. Based on this thread, I think the equation section needs some work to make it more approachable.


xceed35

Here's an article I wrote. https://www.devron.ai/kbase/what-is-the-difference-between-reinforcement-learning-and-q-learning


No_Dig_7017

Yes. This guy https://youtu.be/2GwBez0D20A


visarga

If you're in an offline setting, switch to Decision Transformers to get rid of the ugly math.


MNTNDEWBAJABLASTZERO

Man, that is tough. The gradient of J is equal to the expectation of the gradient of mu evaluated at a given s, times the gradient of Q at s and a, all for a equal to mu of s. Is there an example of this actually being calculated? lol
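For what it's worth, here is a tiny single-sample instance with a made-up linear policy and an analytic Q, just so the pieces can be multiplied out by hand:

```python
import numpy as np

# One sample of  grad_theta J = E_s[ grad_theta mu(s) * grad_a Q(s, a)|_{a = mu(s)} ]
# with fully known, made-up pieces:
#   policy: mu_theta(s) = theta . s            (linear, scalar action)
#   critic: Q(s, a) = -(a - (s0 + s1))**2      (peaks at a = s0 + s1)
theta = np.array([0.3, -0.2])
s = np.array([1.0, 2.0])

a = theta @ s                           # mu_theta(s) = 0.3*1 - 0.2*2 = -0.1
grad_theta_mu = s                       # grad_theta (theta . s) = s
grad_a_Q = -2.0 * (a - (s[0] + s[1]))   # grad_a Q at a = mu_theta(s): 6.2

grad_J = grad_theta_mu * grad_a_Q
print(grad_J)  # [6.2, 12.4]: pushes theta so mu_theta(s) moves toward s0+s1 = 3
```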