ElegantRL is lightweight, efficient, and stable, designed for researchers and practitioners.
Lightweight: the core code is under 1,000 lines (see elegantrl/tutorial), using PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).
Efficient: performance is comparable with Ray RLlib.
Stable: as stable as Stable Baselines3.
Currently implemented model-free deep reinforcement learning (DRL) algorithms:
For DRL algorithms, please check out the educational webpage OpenAI Spinning Up.
Check out the ElegantRL documentation.
An agent in agent.py uses networks in net.py and is trained in run.py by interacting with an environment in env.py.
----- kernel files -----
----- utils files -----
As a high-level overview, the relations among the files are as follows. Initialize an environment in env.py and an agent in agent.py. The agent is constructed with Actor and Critic networks in net.py. In each training step in run.py, the agent interacts with the environment, generating transitions that are stored into a Replay Buffer. Then, the agent fetches transitions from the Replay Buffer to train its networks. After each update, an evaluator evaluates the agent's performance and saves the agent if the performance is good.
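The interaction pattern above can be sketched with toy stand-ins. The class and method names below mirror the description but are simplified placeholders, not ElegantRL's actual API: the "environment" is a one-step guessing game and the "network" is a lookup table, so the whole explore/update/evaluate cycle fits in a few lines.

```python
import random

random.seed(0)  # make the sketch deterministic

class ToyEnv:
    """One-step toy environment: reward is 1.0 when the action matches the state."""
    def reset(self):
        self.state = random.choice([0, 1])
        return self.state
    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        return self.reset(), reward, True  # next_state, reward, done

class ToyAgent:
    """Tabular stand-in for a network: one preferred action per state."""
    def __init__(self):
        self.policy = {0: 0, 1: 0}
    def explore_env(self, env, buffer, steps):
        # interact with the environment and store transitions in the buffer
        state = env.reset()
        for _ in range(steps):
            action = random.choice([0, 1])         # random exploration
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward))
            state = next_state
    def update_net(self, buffer):
        # "train": adopt any action that earned a reward in that state
        for state, action, reward in buffer:
            if reward > 0.0:
                self.policy[state] = action

def evaluate(agent, env, episodes=100):
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        _, reward, _ = env.step(agent.policy[state])
        total += reward
    return total / episodes

env, agent, buffer = ToyEnv(), ToyAgent(), []
agent.explore_env(env, buffer, steps=200)  # 1. collect transitions
agent.update_net(buffer)                   # 2. learn from the buffer
score = evaluate(agent, env)               # 3. evaluate the trained agent
```

After enough exploration the toy agent matches each state perfectly, illustrating the explore → update → evaluate cycle that run.py repeats until a stop condition is met.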
- `args`: the training arguments.
- `args.env = PreprocessEnv()`: creates an environment (in the OpenAI Gym format).
- `agent = agent.XXX()`: creates an agent for a DRL algorithm.
- `evaluator = Evaluator()`: evaluates and stores the trained model.
- `buffer = ReplayBuffer()`: stores the transitions.
- `agent.explore_env(…)`: the agent explores the environment within the target number of steps, generates transitions, and stores them into the ReplayBuffer.
- `agent.update_net(…)`: the agent uses a batch from the ReplayBuffer to update the network parameters.
- `evaluator.evaluate_save(…)`: evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a condition is met, e.g., reaching the target score or the maximum number of steps, or a manual break.
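The ReplayBuffer's role (fixed-capacity transition storage plus random batch sampling) can be sketched as a NumPy ring buffer. This is an illustrative minimal version only; ElegantRL's actual ReplayBuffer differs in layout and features.

```python
import numpy as np

class ReplayBuffer:
    """Minimal ring buffer: overwrites the oldest transitions when full."""
    def __init__(self, capacity, state_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.cursor = 0  # next slot to write
        self.size = 0    # number of stored transitions

    def append(self, state, action, reward):
        i = self.cursor
        self.states[i], self.actions[i], self.rewards[i] = state, action, reward
        self.cursor = (i + 1) % self.capacity  # wrap around when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # sample a random batch of stored transitions (with replacement)
        idx = np.random.randint(0, self.size, size=batch_size)
        return self.states[idx], self.actions[idx], self.rewards[idx]

buffer = ReplayBuffer(capacity=4, state_dim=2)
for t in range(6):  # 6 appends into capacity 4: the first 2 get overwritten
    buffer.append([t, t], action=t, reward=float(t))
states, actions, rewards = buffer.sample(batch_size=3)
```

`agent.update_net(…)` would call something like `sample(batch_size)` each gradient step, which is why old transitions can be reused many times.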
Results using ElegantRL
BipedalWalkerHardcore is a difficult task in a continuous action space. Only a few RL implementations can reach the target reward.
Check out a video on bilibili: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.
Necessary:
| Python 3.6+ |
| PyTorch 1.6+ |
Optional:
| NumPy 1.18+ | For ReplayBuffer. NumPy is installed along with PyTorch.
| gym 0.17.0 | For env. Gym provides tutorial envs for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.)
| pybullet 2.7+ | For env. We use PyBullet (free) as an alternative to MuJoCo (not free).
| box2d-py 2.3.8 | For gym. Use pip install Box2D (instead of box2d-py).
| matplotlib 3.2 | For plots. Used to plot the agent's performance during evaluation.
pip3 install gym==0.17.0 pybullet Box2D matplotlib