DI-engine CHANGELOG

2022.04.23(v0.3.1)
- env: polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- env: add GRF academic env and config (#281)
- env: update env interface of GRF (#258)
- env: update D4RL offline RL env and config (#285)
- env: polish PomdpAtariEnv (#254)
- algo: DREX algorithm (#218)
- feature: separate mq and parallel modules, add redis (#247)
- feature: rename env variables; fix attach_to parameter (#244)
- feature: env implementation check (#275)
- feature: adjust and set the max column number of tabulate in log (#296)
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (#253) (#292)
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (#245) (see the act-scale sketch after this list)
- fix: auto_reset=False and env_ref bug in env manager (#248)
- fix: data type and deepcopy bug in RND (#288)
- fix: share_memory bug and multi_mujoco env (#279)
- fix: some bugs in GTrXL (#276)
- fix: update gym_vector_env_manager and add more unittest (#241)
- fix: mdpolicy random collect bug (#293)
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (#284)
- style: upgrade mpire (#251)
- style: add GRF (Google Research Football) docker (#256)
- style: update policy and gail comment
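
For context on the act_scale fix above, a minimal sketch of action scaling for continuous control, built on the standard gym.ActionWrapper API; the wrapper name here is hypothetical and this is not DI-engine's implementation:

```python
import gym
import numpy as np

class ActScaleWrapper(gym.ActionWrapper):
    """Hypothetical stand-in for act_scale: maps a policy action in
    [-1, 1] to the env's [low, high] action range."""

    def action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return low + (np.asarray(action) + 1.0) * 0.5 * (high - low)

# Env id may vary by gym version (Pendulum-v0 on older releases).
env = ActScaleWrapper(gym.make('Pendulum-v1'))
```
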
2022.03.24(v0.3.0)
- env: add bitflip HER DQN benchmark (#192) (#193) (#197)
- env: slime volley league training demo (#229)
- algo: Gated TransformXL (GTrXL) algorithm (#136)
- algo: TD3 + VAE(HyAR) latent action algorithm (#152)
- algo: stochastic dueling network (#234)
- algo: use log prob instead of prob in ACER (#186)
- feature: support envpool env manager (#228)
- feature: add league main and other improvements in new framework (#177) (#214)
- feature: add pace controller middleware in new framework (#198)
- feature: add auto recover option in new framework (#242)
- feature: add k8s parser in new framework (#243)
- feature: support async event handler and logger (#213)
- feature: add grad norm calculator (#205)
- feature: add gym vector env manager (#147)
- feature: add train_iter and env_step in serial pipeline (#212) (see the pipeline sketch after this list)
- feature: add rich logger handler (#219) (#223) (#232)
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (#171) (#231) (#240)
- polish: MAPPO and MASAC smac config (#209) (#239)
- polish: QMIX smac config (#175)
- polish: R2D2 atari config (#181)
- polish: A2C atari config (#189)
- polish: GAIL box2d and mujoco config (#188)
- polish: ACER atari config (#180)
- polish: SQIL atari config (#230)
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: random_collect compatible to episode collector (#190)
- fix: remove default n_sample/n_episode value in policy config (#185)
- fix: PDQN model bug on gpu device (#220)
- fix: TREX algorithm CLI bug (#182)
- fix: DQfD JE computation bug and move to AdamW optimizer (#191)
- fix: pytest problem for parallel middleware (#211)
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (#199)
- test: SAC/SQN unittest (#207)
- test: CQL/R2D3/GAIL unittest (#201)
- test: NGU td unittest (#210)
- test: model wrapper unittest (#215)
- test: MAQAC model unittest (#226)
- style: add doc docker (#221)
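
As a usage note for the train_iter/env_step entry above, a minimal sketch of the serial entry point; the dizoo config module path matches the repo, but the max_train_iter/max_env_step keyword names are an assumption based on this release and may differ across versions:

```python
from ding.entry import serial_pipeline
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import (
    cartpole_dqn_config, cartpole_dqn_create_config,
)

# Stop training on whichever budget is hit first; the keyword names
# are assumptions based on the changelog entry above.
serial_pipeline(
    (cartpole_dqn_config, cartpole_dqn_create_config),
    seed=0,
    max_train_iter=10_000,
    max_env_step=100_000,
)
```
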
2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (#146)
- env: add delay reward mujoco env (#145)
- env: fix port conflict in gym_soccer (#139)
- algo: MASAC algorithm (#112)
- algo: TREX algorithm (#119) (#144)
- algo: H-PPO hybrid action space algorithm (#140)
- algo: residual link in R2D2 (#150)
- algo: gumbel softmax (#169) (see the sketch after this list)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (#142) (#166) (#168)
- feature: refactor buffer, separate algorithm and storage (#129)
- feature: CLI in new pipeline (ditask) (#160)
- feature: add multiprocess tblogger, fix circular reference problem (#156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (#154)
- fix: R2D3 abs priority problem (#158) (#161)
- fix: multi-discrete action space policies random action bug (#167)
- fix: doc generate bug with enum_tools (#155)
- style: more comments about R2D2 (#149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (#148)
- style: update zh doc link
- style: update kaggle tutorial link
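
For the gumbel softmax entry above, a self-contained sketch of the reparameterized sampling trick (PyTorch also ships an equivalent torch.nn.functional.gumbel_softmax); this is an illustration, not DI-engine's exact code:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # Sample Gumbel(0, 1) noise via -log(-log(U)), then take a
    # temperature-controlled softmax for a differentiable "sample".
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel) / tau, dim=-1)

probs = gumbel_softmax_sample(torch.randn(4, 6), tau=0.5)
```
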
2021.12.03(v0.2.2)
- env: apple key to door treasure env (#128)
- env: add bsuite memory benchmark (#138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (#57)
- algo: ICM exploration algorithm (#41)
- algo: MP-DQN hybrid action space algorithm (#131)
- algo: add loss statistics and polish r2d3 pong config (#126)
- feature: add renew env mechanism in env manager and update timeout mechanism (#127) (#134)
- fix: async subprocess env manager reset bug (#137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug (see the normalizer sketch after this list)
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list
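
Regarding the on-policy PPO value norm fix above, a common pattern is a running mean/std normalizer for value targets; this Welford-style sketch illustrates the general technique and is an assumption, not DI-engine's exact code:

```python
import numpy as np

class RunningMeanStd:
    """Tracks a running mean/variance so value targets can be normalized."""

    def __init__(self, eps: float = 1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, eps

    def update(self, x: np.ndarray) -> None:
        # Merge batch statistics into the running estimate (parallel Welford).
        batch_mean, batch_var, n = x.mean(), x.var(), x.size
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean += delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta ** 2 * self.count * n / total) / total
        self.count = total
```
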
2021.11.22(v0.2.1)
- env: gym-hybrid env (#86)
- env: gym-soccer (HFO) env (#94)
- env: Go-Bigger env baseline (#95)
- env: add the bipedalwalker config of sac and ppo (#121)
- algo: DQfD Imitation Learning algorithm (#48) (#98)
- algo: TD3BC offline RL algorithm (#88) (see the sketch after this list)
- algo: MBPO model-based RL algorithm (#113)
- algo: PADDPG hybrid action space algorithm (#109)
- algo: PDQN hybrid action space algorithm (#118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- algo: self-play training demo in slime_volley env (#23)
- algo: add example of GAIL entry + config for mujoco (#114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (#103)
- test: add unittest for dataset and evaluator (#107)
- test: add unittest for on-policy algorithm (#92)
- test: add unittest for ppo and td (MARL case) (#89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (#108)
- fix: pyyaml version bug (#99)
- fix: small fix on bsuite environment (#117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0
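
For the TD3BC entry above, the core idea (Fujimoto & Gu, 2021) is a behavior-cloning regularizer added to the TD3 actor loss; a minimal sketch of that loss, not DI-engine's exact implementation:

```python
import torch
import torch.nn.functional as F

def td3bc_actor_loss(q_value: torch.Tensor, pi_action: torch.Tensor,
                     data_action: torch.Tensor, alpha: float = 2.5) -> torch.Tensor:
    # lambda normalizes the Q term by its average magnitude, keeping the
    # behavior-cloning MSE term on a comparable scale across environments.
    lam = alpha / q_value.abs().mean().detach()
    return -lam * q_value.mean() + F.mse_loss(pi_action, data_action)
```
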
2021.9.30(v0.2.0)
- env: overcooked env (#20)
- env: procgen env (#26)
- env: modified predator env (#30)
- env: d4rl env (#37)
- env: imagenet dataset (#27)
- env: bsuite env (#58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (#25) (#44)
- algo: CQL algorithm (discrete/continuous) (#37) (#68)
- algo: MAPPO algorithm (#62)
- algo: WQMIX algorithm (#24)
- algo: D4PG algorithm (#76)
- algo: update multi-discrete policy (dqn, ppo, rainbow) (#51) (#72)
- feature: image classification training pipeline (#27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (#22)
- feature: add tb in naive buffer and modify tb in advanced buffer (#39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- feature: add hyper-parameter scheduler module (#38)
- feature: add plot function (#59)
- fix: acer bug and update atari result (#21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (#54)
- fix: r2d2 hidden state and obs arange bug (#36) (#52)
- fix: ppo bug when using dual_clip and adv > 0 (see the dual-clip sketch after this list)
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (#69)
- fix: formatted config no eval bug (#53)
- fix: catch statements that will never succeed and system proxy bug (#71) (#79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (#70) (#78) (#80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced
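
The dual_clip fix above concerns dual-clip PPO (Ye et al., 2020), where the extra lower bound c*A is only valid for negative advantages; a minimal sketch of the corrected objective, not DI-engine's exact code:

```python
import torch

def dual_clip_ppo_loss(ratio: torch.Tensor, adv: torch.Tensor,
                       clip_eps: float = 0.2, dual_clip: float = 3.0) -> torch.Tensor:
    # Standard PPO clipped surrogate.
    surr = torch.min(ratio * adv, ratio.clamp(1 - clip_eps, 1 + clip_eps) * adv)
    # The dual clip max(surr, c * adv) applies only when adv < 0;
    # applying it for adv > 0 was the bug fixed above.
    return -torch.where(adv < 0, torch.max(surr, dual_clip * adv), surr).mean()
```
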
2021.8.3(v0.1.1)
- env: selfplay/league demo (#12)
- env: pybullet env (#16)
- env: minigrid env (#13)
- env: atari enduro config (#11)
- algo: on policy PPO (#9)
- algo: ACER algorithm (#14)
- feature: polish experiment directory structure (#10)
- refactor: split doc to new repo (#4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (#7)
- style: code/comment statistics badge
- style: github CI workflow
2021.7.8(v0.1.0)