CliffWalkingEnv.py
PolicyIteration.py
ValueIteration.py
TruncatedPolicyIteration.py
dynamic_programming.py
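As a rough illustration of what the environment and policy-iteration files cover, here is a minimal Python sketch. It is not the author's code. It assumes the classic 4 × 12 cliff-walking layout: start at the bottom-left, goal at the bottom-right, and cliff cells between them on the bottom row. It also assumes a reward of -1 per step, -100 for stepping into the cliff, γ = 0.9, and state index `row * 12 + col`. The names `build_model`, `q_values`, and `policy_iteration` are hypothetical.

```python
import numpy as np

# Assumed layout (hypothetical re-implementation): 4 x 12 grid, start at bottom-left,
# goal at bottom-right, cliff cells in between on the bottom row.
NROW, NCOL = 4, 12
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # (d_row, d_col): up, down, left, right


def build_model(nrow=NROW, ncol=NCOL):
    """Return P[s][a] = (next_state, reward, done) for the deterministic grid."""
    P = {}
    for row in range(nrow):
        for col in range(ncol):
            s = row * ncol + col
            P[s] = []
            for d_row, d_col in ACTIONS:
                # Cliff and goal cells are absorbing: stay put with reward 0.
                if row == nrow - 1 and col > 0:
                    P[s].append((s, 0.0, True))
                    continue
                # Moves off the grid leave the agent where it is.
                nr = min(max(row + d_row, 0), nrow - 1)
                nc = min(max(col + d_col, 0), ncol - 1)
                done = nr == nrow - 1 and nc > 0
                reward = -100.0 if done and nc < ncol - 1 else -1.0
                P[s].append((nr * ncol + nc, reward, done))
    return P


def q_values(P, v, s, gamma):
    """One-step lookahead q(s, a) = r + gamma * v(s'), with no bootstrap after termination."""
    return np.array([r + (0.0 if done else gamma * v[ns]) for ns, r, done in P[s]])


def policy_iteration(P, gamma=0.9, theta=1e-6):
    n_states = len(P)
    pi = np.zeros(n_states, dtype=int)  # start from the deterministic policy "always up"
    v = np.zeros(n_states)
    while True:
        # Policy evaluation: in-place sweeps of the Bellman expectation equation.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_v = q_values(P, v, s, gamma)[pi[s]]
                delta = max(delta, abs(new_v - v[s]))
                v[s] = new_v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to the evaluated v.
        new_pi = np.array([int(np.argmax(q_values(P, v, s, gamma))) for s in range(n_states)])
        if np.array_equal(new_pi, pi):
            return pi, v
        pi = new_pi


if __name__ == "__main__":
    pi, v = policy_iteration(build_model())
    print(np.round(v.reshape(NROW, NCOL), 2))
```

Under these assumptions, the start state (index 36) should end up choosing "up". Its value should be -(1 - 0.9^13)/0.1 ≈ -7.458, the discounted cost of the 13-step path: up, along the row above the cliff, then down into the goal.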
Results of policy iteration, value iteration, and truncated policy iteration with 1, 10, and 100 iterations are shown below, in that order.
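Truncated policy iteration sits between the other two methods. Before each greedy improvement, it runs only `j_truncate` sweeps of the Bellman expectation update, starting from the previous value estimate. With `j_truncate = 1` it reduces to value iteration, and a very large `j_truncate` behaves like policy iteration. The sketch below is not the original TruncatedPolicyIteration.py. It assumes the classic 4 × 12 cliff-walking layout (reward -1 per step, -100 for the cliff, γ = 0.9, actions ordered up/down/left/right) and uses hypothetical names (`step`, `backup`, `truncated_policy_iteration`).

```python
import numpy as np

# Hypothetical cliff-walking setup (assumed, not the original files):
# 4 x 12 grid, reward -1 per step, -100 for stepping into the cliff, gamma = 0.9.
NROW, NCOL, GAMMA = 4, 12, 0.9
N_STATES = NROW * NCOL
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a):
    """Deterministic transition: return (next_state, reward, done)."""
    row, col = divmod(s, NCOL)
    if row == NROW - 1 and col > 0:  # cliff or goal: absorbing, reward 0
        return s, 0.0, True
    nr = min(max(row + MOVES[a][0], 0), NROW - 1)
    nc = min(max(col + MOVES[a][1], 0), NCOL - 1)
    done = nr == NROW - 1 and nc > 0
    reward = -100.0 if done and nc < NCOL - 1 else -1.0
    return nr * NCOL + nc, reward, done


def backup(v, s, a):
    """q(s, a) = r + gamma * v(s'), with no bootstrap after a terminal transition."""
    ns, reward, done = step(s, a)
    return reward + (0.0 if done else GAMMA * v[ns])


def truncated_policy_iteration(j_truncate, tol=1e-6, max_iter=10_000):
    v = np.zeros(N_STATES)
    for k in range(1, max_iter + 1):
        # Policy improvement: greedy policy with respect to the current estimate v.
        pi = np.array([int(np.argmax([backup(v, s, a) for a in range(len(MOVES))]))
                       for s in range(N_STATES)])
        v_old = v.copy()
        # Truncated policy evaluation: only j_truncate synchronous sweeps of
        # v <- r_pi + gamma * P_pi v, starting from the previous estimate.
        for _ in range(j_truncate):
            v = np.array([backup(v, s, pi[s]) for s in range(N_STATES)])
        if np.max(np.abs(v - v_old)) < tol:
            return pi, v, k
    return pi, v, max_iter


if __name__ == "__main__":
    start = (NROW - 1) * NCOL
    for j in (1, 10, 100):
        pi, v, k = truncated_policy_iteration(j)
        print(f"j_truncate={j:>3}: {k} outer iterations, v(start) = {v[start]:.4f}")
```

With these assumptions, all three settings should agree on the start-state value (about -7.458). A larger `j_truncate` spends more sweeps on evaluation per outer iteration. How many outer iterations each setting needs depends on the problem and on the initial estimate.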
The code above is mainly based on Chapter 4 of Hands-on Reinforcement Learning [1], with some changes based on David Silver's lectures [2] and Shiyu Zhao's Mathematical Foundations of Reinforcement Learning [3].
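[1] Hands-on Reinforcement Learning: https://hrl.boyuai.com/
[2] David Silver, reinforcement learning lectures: https://www.davidsilver.uk/teaching/
[3] Shiyu Zhao, Mathematical Foundations of Reinforcement Learning: https://github.com/MathFoundationRL/Book-Mathmatical-Foundation-of-Reinforcement-Learning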