# PPO Algorithm

[Learning reinforcement learning algorithms from scratch: PPO (bilibili)](https://www.bilibili.com/video/BV1iz421h7gb/?spm_id_from=333.337.search-card.all.click&vd_source=f553a12b04c16a678ddc0064cc04563c)

Key concepts: action space, policy, trajectory, return, Markov chain, Monte Carlo.

![image-20260203075054012](http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203075054012.png)

![image-20260203075357801](http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203075357801.png)

![image-20260203075438132](http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203075438132.png)

![image-20260203075557016](http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203075557016.png)

![image-20260203075828723](http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203075828723.png)

![image-20260203075933230](http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203075933230.png)
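
The terms listed above (policy, trajectory, return, Monte Carlo) fit together roughly as follows: a policy picks actions, running it produces a trajectory of states, actions, and rewards, the return is the sum of rewards along that trajectory, and a Monte Carlo estimate averages returns over many sampled trajectories. Below is a minimal sketch of that idea. The two-state environment, the uniformly random policy, the discount factor `gamma`, and all function names are assumptions made for illustration; none of them come from the video.

```python
import random

# Hypothetical toy environment (an assumption for illustration only):
# states are 0 and 1, the action space is {0, 1}, taking action a moves
# the agent to state a, and the reward is 1.0 when the new state is 1.
ACTIONS = [0, 1]


def step(state, action):
    """One environment transition; the next state depends only on the
    current state and action (the Markov property)."""
    next_state = action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def random_policy(state, rng):
    """Policy: maps a state to an action; here it is uniformly random."""
    return rng.choice(ACTIONS)


def sample_trajectory(policy, horizon, rng):
    """Roll out the policy and record (state, action, reward) tuples."""
    state = 0
    trajectory = []
    for _ in range(horizon):
        action = policy(state, rng)
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def discounted_return(trajectory, gamma=1.0):
    """Return of one trajectory: sum over t of gamma**t * reward_t."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(trajectory))


def monte_carlo_expected_return(policy, horizon, n_samples, gamma=1.0, seed=0):
    """Monte Carlo estimate: average the return over sampled trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += discounted_return(sample_trajectory(policy, horizon, rng), gamma)
    return total / n_samples


if __name__ == "__main__":
    estimate = monte_carlo_expected_return(random_policy, horizon=10, n_samples=10_000)
    print(f"Estimated expected return: {estimate:.3f}")
```

With `horizon=10` and `gamma=1.0`, each step earns reward 1 with probability 0.5 under the random policy, so the estimate should land close to 5. Policy gradient methods such as PPO adjust the policy to increase this expected return rather than only measuring it.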