# PPO Algorithm
[Learning reinforcement learning algorithms from scratch: PPO (bilibili)](https://www.bilibili.com/video/BV1iz421h7gb/?spm_id_from=333.337.search-card.all.click&vd_source=f553a12b04c16a678ddc0064cc04563c)

<img src="http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203074417725.png" alt="image-20260203074417725" style="zoom: 67%;" />

<img src="http://tuchuang-cyy.oss-cn-beijing.aliyuncs.com/img/image-20260203074801825.png" alt="image-20260203074801825" style="zoom:80%;" />

- action space
- policy
- trajectory
- return
- Markov chain
- Monte Carlo
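
To make the terms trajectory, return, and Monte Carlo concrete, here is a minimal sketch. These notes do not define them, so everything in it is an assumption: the return is taken to be the discounted sum of a trajectory's rewards with a discount factor `gamma`, and `sample_trajectory_rewards` is a hypothetical toy rollout that stands in for a real environment and policy.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of one trajectory's rewards: sum over t of gamma**t * r_t."""
    g = 0.0
    # Work backwards so each step applies g = r_t + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sample_trajectory_rewards(rng, p_reward=0.5, horizon=5):
    """Hypothetical toy rollout: each step pays reward 1 with probability p_reward, else 0."""
    return [1.0 if rng.random() < p_reward else 0.0 for _ in range(horizon)]


def monte_carlo_expected_return(num_samples, gamma=0.99, seed=0):
    """Monte Carlo estimate: average the returns of independently sampled trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += discounted_return(sample_trajectory_rewards(rng), gamma)
    return total / num_samples


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(monte_carlo_expected_return(10_000, gamma=1.0))
```

With `gamma=1.0` the toy rollout has an exact expected return of `horizon * p_reward`, so the Monte Carlo estimate can be checked against a known value; a larger `num_samples` should bring the estimate closer to it.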