Adaptation of MuZero General for continuous action space environments like MuJoCo and PyBullet.
- Multi-dimensional continuous action spaces
- Fully connected network and Residual Network
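To make "multi-dimensional continuous action space" concrete, here is a minimal sketch of sampling one bounded action vector from a diagonal Gaussian policy with tanh squashing. This is a common technique for continuous control and is only an illustration: it is not taken from this repository, and the name `sample_action` and its parameters are hypothetical.

```python
import math
import random


def sample_action(means, log_stds, low, high, rng=random):
    """Sample one bounded action vector from a diagonal Gaussian policy.

    Hypothetical helper for illustration only; not part of muzero-general.
    Each action dimension is sampled independently, squashed with tanh,
    then rescaled to that dimension's [low, high] range.
    """
    action = []
    for mean, log_std, lo, hi in zip(means, log_stds, low, high):
        raw = rng.gauss(mean, math.exp(log_std))  # unbounded Gaussian sample
        squashed = math.tanh(raw)                 # squash into [-1, 1]
        action.append(lo + 0.5 * (squashed + 1.0) * (hi - lo))  # rescale
    return action


# Example: a 3-dimensional action with each component bounded to [-1, 1].
rng = random.Random(0)
action = sample_action(
    means=[0.0, 0.5, -0.5],
    log_stds=[-1.0, -1.0, -1.0],
    low=[-1.0, -1.0, -1.0],
    high=[1.0, 1.0, 1.0],
    rng=rng,
)
print(action)
```

The point of the sketch is that an action here is a vector of real numbers, one per actuator, rather than a single index into a discrete set of moves.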
Environments (test status noted where applicable):
- MuJoCo InvertedPendulum-v2 (Tested with the fully connected network)
- MuJoCo InvertedDoublePendulum-v2 (Tested with the fully connected network)
- MuJoCo Swimmer-v2 (Tested with the fully connected network)
- MuJoCo Hopper-v2
- MuJoCo Walker2d-v2
- PyBullet InvertedPendulumBulletEnv-v0 (Tested with the fully connected network)
- PyBullet InvertedDoublePendulumBulletEnv-v0 (Tested with the fully connected network)
- PyBullet HopperBulletEnv-v0
Tests were run on Ubuntu with 16 GB RAM, an Intel i7, and a GTX 1050 Ti Max-Q. For each tested environment, we check that the agent's performance improves and reaches a level showing that it has learned, but it does not always reach human level. In some environments, performance regresses after a certain amount of training. The proposed configurations are certainly not optimal, and hyperparameter optimization is not our focus for now. Any help is welcome.
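As a rough way to spot the kind of regression mentioned above, the following sketch smooths a list of episode rewards with a trailing moving average and flags a run whose latest smoothed value has dropped well below its best. It is an assumption-laden illustration, not part of this repository: the function names, the window size, and the 20% tolerance are all hypothetical, and the check only makes sense for non-empty reward lists whose best smoothed value is positive.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over fewer samples."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total / min(i + 1, window))
    return out


def has_regressed(rewards, window=3, tolerance=0.2):
    """Return True if the latest smoothed reward is more than `tolerance`
    (as a fraction) below the best smoothed reward seen so far.

    Hypothetical helper: assumes the best smoothed reward is positive.
    """
    if not rewards:
        return False
    smoothed = moving_average(rewards, window)
    best = max(smoothed)
    return best > 0 and smoothed[-1] < (1.0 - tolerance) * best


# Made-up reward curves for illustration (not measured results).
collapsing = [10, 50, 100, 100, 100, 40, 30, 20]
improving = [10, 20, 30, 40]
print(has_regressed(collapsing), has_regressed(improving))
```

The reward values would have to be exported from your own training logs; the sketch does not read TensorBoard files.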
git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general
git checkout continuous
pip install -r requirements.txt
For MuJoCo environments, MuJoCo must also be installed; follow its installation instructions.
python muzero.py
To visualize the training results, run in a new terminal:
tensorboard --logdir ./results
- Xuxi Yang
- Werner Duvaud
- Aurèle Hainaut
- Contributors
Please use this BibTeX entry if you want to cite this repository (master branch) in your publications:
@misc{muzero-general,
author = {Werner Duvaud and Aurèle Hainaut},
title = {MuZero General: Open Reimplementation of MuZero},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}
- GitHub Issues: For reporting bugs.
- Pull Requests: For submitting code contributions.
- Discord server: For discussions about development or any general questions.