Reinforcement learning
1 key concepts
- states
- actions
- rewards
- discount factor $\gamma$
- return
- policy $\pi$
2 return
- definition: the sum of the rewards that the system gets, weighted by the discount factor
- compute:
- $R_i$: reward of state $i$
- $\gamma$: discount factor (usually a number close to 1); it makes the algorithm "impatient", because rewards received sooner are weighted more heavily than rewards received later
$\text{return} = R_1 + \gamma R_2 + \cdots + \gamma^{n-1} R_n$
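As a quick illustration (the reward sequence and $\gamma = 0.9$ below are made-up example values), the return can be computed like this:

```python
# A minimal sketch: compute the discounted return for a fixed reward sequence.
# The rewards and gamma are hypothetical example values.
def discounted_return(rewards, gamma):
    """return = R_1 + gamma*R_2 + gamma^2*R_3 + ... + gamma^(n-1)*R_n"""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

print(discounted_return([0, 0, 0, 100], gamma=0.9))  # 0.9**3 * 100 ≈ 72.9
```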
3 policy
a policy $\pi$ maps a state $s$ to some action $a$:
$\pi(s) = a$
the goal of reinforcement learning is to find a policy $\pi$ that maps every state $s$ to an action $a$ so as to maximize the return
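For a concrete (hypothetical) picture, a deterministic policy can be as simple as a lookup table from states to actions:

```python
# A minimal sketch of a deterministic policy pi(s) = a for a toy problem;
# the states and actions here are hypothetical.
policy = {0: "left", 1: "left", 2: "right", 3: "right"}

def pi(state):
    return policy[state]

print(pi(2))  # "right"
```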
4 state-action value function
1. definition
$Q(s, a)$ = the return if you
- start in state $s$
- take action $a$ once
- behave optimally after that
2. usage
- the best possible return from state $s$ is $\max_a Q(s, a)$
- the best possible action in state $s$ is the action $a$ that gives $\max_a Q(s, a)$
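A small (hypothetical) Q-table makes both uses concrete: the best return is the max over actions, and the best action is the argmax:

```python
# A minimal sketch: reading off max_a Q(s, a) and the best action from a
# hypothetical table of Q-values.
Q = {
    0: {"left": 50.0, "right": 12.5},
    1: {"left": 25.0, "right": 6.25},
}

state = 0
best_return = max(Q[state].values())           # max_a Q(s, a)
best_action = max(Q[state], key=Q[state].get)  # the a that achieves the max
print(best_return, best_action)                # 50.0 left
```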
5 Bellman equation
$s$: current state
$a$: current action
$s'$: state you get to after taking action $a$
$a'$: action that you take in state $s'$
$Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')$
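A minimal sketch of applying the Bellman equation repeatedly to compute $Q(s, a)$ on a tiny deterministic line world; all numbers (rewards at the two ends, $\gamma = 0.5$) are assumed example values, not from the original notes:

```python
# Q-value iteration on a toy 6-state line world (assumed example values).
# States 0 and 5 are terminal; the agent can move left or right.
R = [100, 0, 0, 0, 0, 40]                 # R(s): reward of each state
gamma = 0.5
actions = ["left", "right"]
terminal = {0, 5}

def step(s, a):                           # deterministic transitions
    return s - 1 if a == "left" else s + 1

# Q(s, a) = R(s) at terminal states; everything else starts at 0
Q = {(s, a): (R[s] if s in terminal else 0.0) for s in range(6) for a in actions}

for _ in range(50):                       # apply the Bellman update until it converges
    for s in range(1, 5):
        for a in actions:
            s_next = step(s, a)
            Q[(s, a)] = R[s] + gamma * max(Q[(s_next, ap)] for ap in actions)

print(Q[(1, "left")], Q[(1, "right")])    # 50.0 12.5
```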
6 Deep Q-Network
1. definition
use a neural network to learn $Q(s, a)$:
$x = (s, a)$
$y = R(s) + \gamma \max_{a'} Q(s', a')$
$f_{w,b}(x) \approx y$
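One possible sketch of such a network (the architecture, layer sizes, and the choice of PyTorch are assumptions, not from the original notes): the input $x$ concatenates the state with a one-hot encoding of the action, and the output is a single estimated Q-value.

```python
# A minimal sketch of f_{w,b}(x): input x = (s, one-hot a), output ≈ Q(s, a).
# The state/action dimensions and layer sizes are hypothetical.
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4

q_net = nn.Sequential(
    nn.Linear(state_dim + n_actions, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),                      # scalar Q(s, a) estimate
)

s = torch.randn(1, state_dim)              # a dummy state
a = torch.eye(n_actions)[0].unsqueeze(0)   # one-hot encoding of action 0
x = torch.cat([s, a], dim=1)               # x = (s, a)
print(q_net(x))                            # f_{w,b}(x) ≈ Q(s, a)
```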
2. steps
- initialize the neural network randomly as a guess of $Q(s, a)$
- repeat:
    - take actions in the environment, getting tuples $(s, a, R(s), s')$
    - store the $N$ most recent $(s, a, R(s), s')$ tuples (the replay buffer)
    - train the neural network:
        - create a training set of $N$ examples using $x = (s, a)$ and $y = R(s) + \gamma \max_{a'} Q(s', a')$
        - train $Q_{new}$ such that $Q_{new}(s, a) \approx y$
    - set $Q = Q_{new}$
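Putting these steps together, here is a heavily simplified sketch of the loop. It reuses the PyTorch network style from above, updates one network in place rather than fitting a fresh $Q_{new}$, and the `env`/`encode` interfaces plus all hyperparameters are hypothetical assumptions:

```python
# A minimal sketch of the learning loop. `env` (with reset/step) and
# `encode(s, a)` (returning a (1, input_dim) tensor) are hypothetical helpers.
import random
from collections import deque
import torch
import torch.nn as nn

def train_dqn(env, q_net, encode, n_actions, episodes=500,
              gamma=0.99, buffer_size=10_000, batch_size=64, lr=1e-3):
    buffer = deque(maxlen=buffer_size)             # N most recent (s, a, R(s), s') tuples
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:                            # take actions, store experience
            a = random.randrange(n_actions)        # (use ε-greedy in practice; see below)
            s_next, r, done = env.step(a)
            buffer.append((s, a, r, s_next, done))
            s = s_next
        if len(buffer) < batch_size:
            continue
        batch = random.sample(buffer, batch_size)  # mini-batch of stored tuples
        with torch.no_grad():                      # y = R(s) + γ max_a' Q(s', a')
            y = torch.tensor([
                r + (0.0 if d else gamma * max(q_net(encode(s2, ap)).item()
                                               for ap in range(n_actions)))
                for (_, _, r, s2, d) in batch])
        x = torch.cat([encode(s, a) for (s, a, _, _, _) in batch])
        loss = nn.functional.mse_loss(q_net(x).squeeze(-1), y)  # f_{w,b}(x) ≈ y
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return q_net
```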
3. optimizations: the refinements below (ε-greedy policy, mini-batches, soft updates) make the basic algorithm work better
4. $\epsilon$-greedy policy
- with probability $1 - \epsilon$, pick the action $a$ that maximizes $Q(s, a)$ (exploitation)
- with probability $\epsilon$, pick an action $a$ at random (exploration)
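A minimal sketch of this action-selection rule (the `q_values` helper returning the current estimates of $Q(s, a)$ is hypothetical):

```python
# ε-greedy action selection: exploit with probability 1 - ε, explore with ε.
import random

def epsilon_greedy_action(q_values, s, actions, epsilon=0.05):
    if random.random() < epsilon:
        return random.choice(actions)           # explore: random action
    q = q_values(s)                             # hypothetical: dict of Q(s, a) per action
    return max(actions, key=lambda a: q[a])     # exploit: argmax_a Q(s, a)
```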
5. mini-batch
use only a subset of the stored examples on each gradient descent step
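For example, mini-batching can look like this simple sketch (the batch size is an assumed value):

```python
# A minimal sketch of mini-batch training: each gradient-descent step uses
# only a small random subset of the stored examples.
import random

def minibatches(dataset, batch_size=64):
    data = list(dataset)
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]            # one gradient step per mini-batch
```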
6. soft update
instead of setting $Q = Q_{new}$ directly, update the parameters gradually:
$w = \alpha w_{new} + (1 - \alpha) w$
$b = \alpha b_{new} + (1 - \alpha) b$
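A minimal sketch of such a soft update, assuming PyTorch-style parameter tensors and a small mixing coefficient such as $\alpha = 0.01$:

```python
# Soft update: blend the newly trained parameters into the current network
# instead of replacing them outright. alpha is an assumed small value.
def soft_update(q_net, q_new, alpha=0.01):
    for p, p_new in zip(q_net.parameters(), q_new.parameters()):
        p.data.copy_(alpha * p_new.data + (1 - alpha) * p.data)
```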