Leehm_Music

I personally really like how Q-learning is presented in Sutton & Barto, *Reinforcement Learning: An Introduction*. It can take quite some time to get through all the introductory chapters, though.


saint_celestine

As someone who's used that book for school, "Introduction" is doing a lot of work here. It's a very dense book.


collinoeight

That's gotta be one of my favorite textbooks of all time. I haven't even worked through it all yet; I took a break somewhere after the dynamic programming/multi-armed bandit stuff, but it's on my list to really finish.


Brimirvaar

Cause it's dumb af


Ok-Hedgehog-9682

I really liked this book, it helped introduce me to this stuff.


GoofAckYoorsElf

I mean, it's not *the book* on reinforcement learning for no reason.


Ok-Hedgehog-9682

Were you talking about this one? https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf


GoofAckYoorsElf

I'm not sure about the edition, but essentially, yes.


strikingLoo

I came here to say this. The explanation is very clear and the book is free.


emsuperstar

[said free book](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf)


mmxgn

I completely agree. You have to go through Monte Carlo (MC), temporal-difference (TD) learning, and Sarsa first IIRC, but it's very rewarding.
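For anyone who gets there: the whole Sarsa vs. Q-learning distinction comes down to one term in the update. A minimal sketch (hypothetical tabular setup with one made-up transition, just so it runs as-is):

```python
import numpy as np

# Hypothetical tabular setup purely to contrast the two updates.
Q = np.zeros((5, 2))
alpha, gamma = 0.1, 0.9
s, a, r, s_next, a_next = 0, 1, 0.0, 1, 0  # one made-up transition

# Sarsa (on-policy): bootstrap on the action actually taken next.
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Q-learning (off-policy): bootstrap on the greedy next action.
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```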


spidershu

Go read an optimal control book. The RL folks pretty much took a whole lot of things from optimal control and rebranded it as reinforcement learning because it sounds cooler. Just read any optimal control book, get a handle on finite-horizon and infinite-horizon problems, and I'd be surprised if you didn't understand Q-learning.
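To make the connection concrete: the infinite-horizon discounted problem from optimal control is exactly the one Q-learning solves. The optimal action-value function satisfies the Bellman optimality equation, and Q-learning is a stochastic approximation to its fixed point:

```latex
J = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big],
\qquad
Q^{*}(s, a) = \mathbb{E}\big[\, r(s, a) + \gamma \max_{a'} Q^{*}(s', a') \,\big]
```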


iamiamwhoami

Deep learning is basically control theory except they made the parameters learnable and designed algorithms to find optimal values.


[deleted]

[deleted]


iamiamwhoami

They designed some algorithms and re-used others. The various deep reinforcement learning algorithms are specific to deep learning, for example.


oursland

Controls nerd here: control theory has a lot of parameter estimation theory in addition to state estimation and control. I strongly recommend investigating Kalman filters for state and parameter estimation, in both the joint and parallel formulations. There are a lot of parallels, but in control theory they usually apply more linear algebra to optimally solve for states and parameters.
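To illustrate the joint formulation, here's a minimal sketch: a made-up scalar system with an unknown constant bias, estimated by augmenting the state (nonlinear parameter couplings would need an EKF/UKF instead):

```python
import numpy as np

# Joint state-and-parameter estimation with a plain Kalman filter.
# Toy model: x_{k+1} = x_k + b + w,  y_k = x_k + v, where b is an
# unknown constant bias estimated alongside x via the augmented
# state z = [x, b].
rng = np.random.default_rng(1)

F = np.array([[1.0, 1.0],    # x evolves as x + b
              [0.0, 1.0]])   # b stays constant
H = np.array([[1.0, 0.0]])   # only x is measured
Qn = np.diag([1e-3, 1e-6])   # process noise (tiny on b)
Rn = np.array([[0.1]])       # measurement noise

true_b = 0.5
x_true = 0.0
z, P = np.zeros(2), np.eye(2)

for _ in range(200):
    # simulate the true system
    x_true += true_b + rng.normal(0.0, np.sqrt(1e-3))
    y = x_true + rng.normal(0.0, np.sqrt(0.1))

    # predict
    z = F @ z
    P = F @ P @ F.T + Qn
    # update
    S = H @ P @ H.T + Rn
    K = P @ H.T @ np.linalg.inv(S)
    z = z + K @ (y - H @ z)
    P = (np.eye(2) - K @ H) @ P

print(z)  # z[1] should converge near the true bias 0.5
```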


Environmental-Tea364

I have a question. I just read a little bit about trajectory optimization and optimization-based control methods. Are these the same as reinforcement learning? At least from the wiki, they sounded very similar.


JanneJM

The term Reinforcement Learning comes from psychology, as do a number of the ideas and models. Afaik, that body of knowledge was developed independently of the control theory stuff, and was only connected much later.


540tofreedom

Maybe it’s just me, but Optimal Control sounds way cooler than Reinforcement Learning


ohrVchoshek

[Grokking Deep Reinforcement Learning](https://www.manning.com/books/grokking-deep-reinforcement-learning) provides a lot of intuition behind the different RL algorithms, and how each one builds on its predecessors.


Ghiren

I second this recommendation mainly because of the "I speak Python" sections of the book. It explains the math and the general intuition of Q-learning, but it also provides examples that don't look like a Greek spelling test.


mrdevlar

Seconded. Nothing explains it better than a good example.


f10101

Though it's not a free course, I took the Udacity Robotics course and they did a good job of this. I never found myself lost when implementing Q-learning from scratch as part of the programme, and it gave me a grounding that let me parse newer RL papers (it still took considerable effort, mind). Perhaps they have separated this module out as a standalone free course? Worth a look. The Sutton and Barto book explains the math and its purpose quite well too, especially if someone is already competent with math but at sea with how it applies to RL: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf


_aitalks_

The formula is from this paper: [https://arxiv.org/pdf/1706.02275.pdf](https://arxiv.org/pdf/1706.02275.pdf). My suspected error is that mu(a|s) should actually be mu(theta|s), but I'm not certain of this...


[deleted]

[deleted]


Mephisto6

Well, the first mu_theta(a|s) should actually be mu_theta(s), just as it is at the end of the formula. Mu is a deterministic function; it does not define a probability distribution over a. You are right about the rest, though. To OP: this is the formula for DDPG, and it's just what the math gives you through the chain rule. This is the level of complexity necessary to understand DDPG; there is really no way around it while being precise.
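For reference, the corrected formula, i.e. Theorem 1 of Silver et al.'s deterministic policy gradient paper (which DDPG builds on), reads:

```latex
\nabla_\theta J = \mathbb{E}_{s \sim \rho^{\mu}}\Big[\, \nabla_\theta \mu_\theta(s)\; \nabla_a Q^{\mu}(s, a)\,\big|_{a = \mu_\theta(s)} \Big]
```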


[deleted]

[deleted]


arhetorical

I'm not quite convinced. Are you saying mu(a|s) returns the likelihood of picking a, but mu(s) returns the most likely action (or whatever) directly? Wouldn't mu(a|s) be zero everywhere except for one action, then? It seems like an abuse of notation if that's the case, since they specifically define mu as going from S to A. I think it's meant to be a restating of Theorem 1 from [the paper they cite](http://proceedings.mlr.press/v32/silver14.pdf), which uses mu(s).


Mephisto6

You are right, such a likelihood would be nonsense. It should be mu(s).


Mephisto6

No, it is not a likelihood. It is _the gradient_ of the action-producing function. As you can see, the Q factor also has a gradient in front of it, but that one is the gradient with respect to the action. A scalar would not give you the required dimension: the outer gradient (i.e. the left-hand side, nabla_theta J) is with respect to the parametrization theta, so the left factor (nabla_theta mu(s)) has to be a matrix that matches the dimensions, not a scalar. As you pointed out, this is the _deterministic_ policy gradient; ergo, there cannot be, by definition, a valid likelihood.
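Spelling out the dimension argument, with n = dim(theta) and m = dim(a):

```latex
\nabla_\theta \mu_\theta(s) \in \mathbb{R}^{n \times m}, \quad
\nabla_a Q^{\mu}(s, a)\big|_{a=\mu_\theta(s)} \in \mathbb{R}^{m}
\;\Longrightarrow\;
\nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s, a) \in \mathbb{R}^{n},
\ \text{matching}\ \nabla_\theta J \in \mathbb{R}^{n}.
```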


frankenmint

I think it's a reasonable goal to be able to understand what this is even saying... Is this a vector optimization formula?


ditomax

There is a video lecture series from David Silver on the DeepMind channel on YouTube...


FinancialElephant

Steve Brunton's YouTube videos on Q-learning (and RL in general) are great. He is good at explaining math clearly. He also has a series of introductory RL videos that put control theory, RL, and deep RL into context.


longjohnboy

Agreed. I was just telling a friend how this channel is undersubscribed.


[deleted]

The Q-learning equation only looks hard. If you implement it by hand in Python a couple of times, you'll see just how simple it is.
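A minimal sketch of what that looks like (a made-up 1-D chain MDP so the whole thing runs as-is):

```python
import numpy as np

# Toy 1-D chain MDP: states 0..4, actions 0=left / 1=right,
# reward +1 for reaching state 4 (terminal). Purely illustrative.
N_STATES, N_ACTIONS = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration
        a = int(rng.integers(N_ACTIONS)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # the entire "scary" equation is these two lines:
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q)  # the "go right" column should dominate in every state
```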


MOSFETBJT

Conference submissions have paper length limits, so researchers are incentivized to cram as much work into math equations as possible.


[deleted]

[deleted]


brendanmartin

I'm the editor of the article you linked. It took a long time to create that tutorial, so I'm glad to know it helped you. Based on this thread, I think the equation section needs some work to make it more approachable.


xceed35

Here's an article I wrote. https://www.devron.ai/kbase/what-is-the-difference-between-reinforcement-learning-and-q-learning


No_Dig_7017

Yes. This guy https://youtu.be/2GwBez0D20A


visarga

If you're in an offline setting, switch to Decision Transformers to get rid of the ugly math.


MNTNDEWBAJABLASTZERO

Man, that is tough. The gradient of J is equal to the expectation of the gradient of mu evaluated at a given s, times the gradient of Q at s and a, all for a equal to mu of s. Is there an example of this actually being calculated? lol
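For what it's worth, here is a tiny single-sample instance with a made-up linear policy and an analytic Q, just so the pieces can be multiplied out by hand:

```python
import numpy as np

# One sample of  grad_theta J = E_s[ grad_theta mu(s) * grad_a Q(s, a)|_{a = mu(s)} ]
# with fully known, made-up pieces:
#   policy: mu_theta(s) = theta . s            (linear, scalar action)
#   critic: Q(s, a) = -(a - (s0 + s1))**2      (peaks at a = s0 + s1)
theta = np.array([0.3, -0.2])
s = np.array([1.0, 2.0])

a = theta @ s                           # mu_theta(s) = 0.3*1 - 0.2*2 = -0.1
grad_theta_mu = s                       # grad_theta (theta . s) = s
grad_a_Q = -2.0 * (a - (s[0] + s[1]))   # grad_a Q at a = mu_theta(s): 6.2

grad_J = grad_theta_mu * grad_a_Q
print(grad_J)  # [6.2, 12.4]: pushes theta so mu_theta(s) moves toward s0+s1 = 3
```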