PyTorch implementations of the REINFORCE and DDPG algorithms.
All the code is in one file and is easy to follow.
REINFORCE: runs only in the CartPole-v0 environment; it does not learn well in Pendulum-v0.
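For readers new to REINFORCE, the core of the algorithm is the discounted return and the Monte Carlo policy-gradient loss. The following is a minimal, framework-free sketch and not the repository's code: the function names `discounted_returns` and `reinforce_loss` and the default `gamma` are assumptions made for illustration, and plain Python floats stand in for the PyTorch tensors a real implementation would use.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Policy-gradient loss: -sum_t log pi(a_t | s_t) * G_t.

    log_probs: log-probabilities of the actions actually taken.
    returns:   discounted returns for the same time steps.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a PyTorch version the log-probabilities would be tensors attached to the policy network, so that calling `backward()` on the loss yields the policy gradient.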
DDPG: runs only in Pendulum-v0, because DDPG is only suited to continuous-action tasks.
In Pendulum-v0:
TD-3 version: uses 2 critic networks with soft update.
Soft update version: uses the target-network update from the original DDPG paper.
Hard update version: uses the same target-network update as DQN, a hard update every C steps.
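The difference between the target-update schemes above can be sketched in a few lines. This is a hedged illustration, not the repository's code: the names `soft_update`, `hard_update`, and `twin_critic_target`, and the default values of `tau`, `every_c`, and `gamma`, are assumptions, and plain lists of floats stand in for network parameters.

```python
def soft_update(target, online, tau=0.001):
    """Soft (Polyak) update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def hard_update(target, online, step, every_c=100):
    """DQN-style update: copy the online parameters every `every_c` steps."""
    if step % every_c == 0:
        return list(online)
    return list(target)


def twin_critic_target(reward, q1_next, q2_next, done, gamma=0.99):
    """TD target with two critics: bootstrap from the smaller estimate.

    q1_next, q2_next: the two target critics' values for the next state.
    done:             True if the episode ended at this transition.
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)
```

The soft update moves the target network a small step on every call, while the hard update leaves it frozen between copies. Taking the minimum of the two critics is intended to reduce overestimation of the Q-value.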