cyy_othermind/零碎的知识/ppo算法.md
2026-03-08 04:05:39 +08:00


PPO algorithm

Learning the PPO reinforcement learning algorithm from scratch (bilibili)

image-20260203074417725 image-20260203074801825

action space

policy

trajectory

return

Markov chain

Monte Carlo
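To tie the terms above together, here is a minimal Python sketch, not taken from the video: a toy policy samples actions from a two-element action space, a rollout produces a trajectory, the return is the discounted sum of that trajectory's rewards, and a Monte Carlo estimate averages the return over many sampled trajectories. Everything concrete here is an assumption made for illustration: the reward rule, the transition rule, the 0.7 action probability, the discount factor `gamma`, and all function names.

```python
import random

ACTION_SPACE = [0, 1]  # hypothetical action space with two actions


def policy(state, rng):
    """Toy stochastic policy: choose action 1 with probability 0.7 (state is ignored)."""
    return ACTION_SPACE[1] if rng.random() < 0.7 else ACTION_SPACE[0]


def sample_trajectory(rng, horizon=5):
    """Roll out one trajectory as a list of (state, action, reward) tuples."""
    trajectory = []
    state = 0
    for _ in range(horizon):
        action = policy(state, rng)
        reward = 1.0 if action == 1 else 0.0  # made-up reward rule
        trajectory.append((state, action, reward))
        state += 1  # made-up deterministic transition
    return trajectory


def discounted_return(trajectory, gamma=0.9):
    """Return of one trajectory: sum over t of gamma**t * reward_t."""
    g = 0.0
    # Walk backwards so each step applies g = reward + gamma * g.
    for _, _, reward in reversed(trajectory):
        g = reward + gamma * g
    return g


def monte_carlo_expected_return(num_samples=10_000, seed=0):
    """Estimate the expected return by averaging over sampled trajectories."""
    rng = random.Random(seed)
    total = sum(discounted_return(sample_trajectory(rng)) for _ in range(num_samples))
    return total / num_samples


if __name__ == "__main__":
    print(monte_carlo_expected_return())
```

For this toy setup the exact expected return is 0.7 × (1 + 0.9 + 0.9² + 0.9³ + 0.9⁴) ≈ 2.867, so the Monte Carlo estimate should land close to that value and get closer as `num_samples` grows.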

image-20260203075054012

image-20260203075357801

image-20260203075438132

image-20260203075557016

image-20260203075828723

image-20260203075933230