# PPO algorithm
[Learning reinforcement learning algorithms from zero: PPO (Bilibili)](https://www.bilibili.com/video/BV1iz421h7gb/?spm_id_from=333.337.search-card.all.click&vd_source=f553a12b04c16a678ddc0064cc04563c)
- action space
- policy
- trajectory
- return
- Markov chain
- Monte Carlo
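Here is how these terms fit together. The action space is the set of actions an agent can choose from. A policy π(a|s) gives a distribution over those actions in each state. A trajectory τ = (s0, a0, r1, s1, a1, r2, ...) is one sampled episode. The return is the discounted sum of rewards G = r1 + γ·r2 + γ²·r3 + .... The Markov property means the next state depends only on the current state and action. A Monte Carlo estimate approximates the expected return by averaging the returns of many sampled trajectories.

The sketch below illustrates all of this on a hypothetical 5-state "line world" invented for illustration. The names `ACTIONS`, `step`, `policy`, `sample_trajectory`, `discounted_return` and `monte_carlo_value` do not come from the video, and this is not PPO itself.

```python
import random

# Hypothetical toy environment: states 0..4 on a line; reaching state 4 ends the episode.
ACTIONS = ["left", "right"]  # discrete action space


def step(state, action):
    """Markov transition: the next state depends only on (state, action)."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == 4 else 0.0  # reward only on reaching the goal
    done = next_state == 4
    return next_state, reward, done


def policy(state, rng, p_right=0.7):
    """Stochastic policy pi(a|s); for simplicity it ignores the state."""
    return "right" if rng.random() < p_right else "left"


def sample_trajectory(rng, start=0, max_steps=50):
    """Roll out one episode; each entry is (s_t, a_t, r_{t+1})."""
    traj = []
    state = start
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward, done = step(state, action)
        traj.append((state, action, reward))
        state = next_state
        if done:
            break
    return traj


def discounted_return(rewards, gamma=0.9):
    """G = r1 + gamma*r2 + gamma^2*r3 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def monte_carlo_value(num_episodes=1000, gamma=0.9, seed=0):
    """Monte Carlo estimate of the expected return from state 0 under `policy`."""
    rng = random.Random(seed)  # seeded so results are reproducible
    total = 0.0
    for _ in range(num_episodes):
        traj = sample_trajectory(rng)
        total += discounted_return([r for _, _, r in traj], gamma)
    return total / num_episodes


if __name__ == "__main__":
    print(monte_carlo_value(num_episodes=2000))
```

Because the reward only arrives on the fourth step at the earliest, every sampled return is at most γ³. The Monte Carlo average therefore stays below that bound. Its variance shrinks as `num_episodes` grows.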





