2 RLSAC Algorithm


Policy Gradient Methods for Reinforcement Learning with Function SMSM-NIPS99.pdf
This paper is the foundation for the several papers read earlier.
**2 Policy Gradient with Approximation**

Theorem 2 (Policy Gradient with Function Approximation).
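The note leaves the theorem statement blank. For reference, here is a sketch of it as I recall it from the paper. The notation is taken from the paper rather than from these notes: \rho is the performance measure, d^\pi the state distribution under \pi, \pi(s,a) the policy with parameters \theta, and f_w the approximation of Q^\pi with parameters w. Check it against the PDF before relying on it.

```latex
% Condition 1: f_w is a local optimum of the fitting error (convergence condition)
\sum_s d^\pi(s) \sum_a \pi(s,a)\,\bigl[Q^\pi(s,a) - f_w(s,a)\bigr]\,
  \frac{\partial f_w(s,a)}{\partial w} = 0
% Condition 2: f_w is compatible with the policy parameterization
\frac{\partial f_w(s,a)}{\partial w}
  = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)}
% Conclusion: the gradient is exact with f_w in place of Q^\pi
\frac{\partial \rho}{\partial \theta}
  = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a)
```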




3 Application to Deriving Algorithms and Advantages
7p
the advantage function
The survey describes this unclearly; the explanation here reads more smoothly: "The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators." This is the baseline issue.
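The quoted claim about v can be checked numerically. The following is a minimal sketch, not the paper's experiment. It assumes a one-state problem (so v reduces to a scalar baseline), a softmax policy, and made-up action values `q`. All names (`softmax`, `grad_samples`, `theta`, `q`) are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_samples(theta, q, baseline, n, rng):
    """Draw n single-sample gradient estimates for a one-state problem."""
    pi = softmax(theta)
    actions = rng.choice(len(theta), size=n, p=pi)
    # For a softmax policy, grad log pi(a) = onehot(a) - pi.
    score = np.eye(len(theta))[actions] - pi
    # Each estimate is the score times (action value minus baseline).
    return score * (q[actions] - baseline)[:, None]

rng = np.random.default_rng(0)
theta = np.array([0.2, -0.1, 0.4])   # assumed policy parameters
q = np.array([10.0, 11.0, 12.0])     # assumed (made-up) action values
pi = softmax(theta)

# Exact gradient: sum_a dpi(a)/dtheta * q(a) = pi * (q - pi @ q).
exact = pi * (q - pi @ q)

no_base = grad_samples(theta, q, 0.0, 200_000, rng)
with_base = grad_samples(theta, q, pi @ q, 200_000, rng)

print("exact gradient      :", exact)
print("mean, no baseline   :", no_base.mean(axis=0))
print("mean, with baseline :", with_base.mean(axis=0))
print("total variance, no baseline   :", no_base.var(axis=0).sum())
print("total variance, with baseline :", with_base.var(axis=0).sum())
```

Both sample means should land near the exact gradient, while the total variance should be much smaller with the baseline. The baseline used here, the expected value `pi @ q`, is one common choice and is not claimed to be optimal.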

4 Convergence of Policy Iteration with Function Approximation