Distributed Deep Reinforcement Learning (DRL) aims to leverage more
computational resources to train autonomous agents in less training time.
Despite recent progress in the field, reproducibility issues have not been
sufficiently explored. This paper first shows that the typical actor-learner
framework can have reproducibility issues even if hyperparameters are
controlled. We then introduce Cleanba, a new open-source platform for
distributed DRL that proposes a highly reproducible architecture. Cleanba
implements highly optimized distributed variants of PPO and IMPALA. Our Atari
experiments show that these variants can obtain equivalent or higher scores
than strong IMPALA baselines in moolib and torchbeast, as well as a strong
PPO baseline in CleanRL, while exhibiting 1) shorter training times and 2) more
reproducible learning curves across different hardware settings. Cleanba's
source code is available at \url{https://github.com/vwxyzjn/cleanba}.
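To illustrate the reproducibility issue the abstract refers to, here is a minimal sketch (not Cleanba's actual code, and the function and variable names are hypothetical): in a typical asynchronous actor-learner loop, the learner consumes rollouts in whatever order actors deliver them, so if actor speeds vary across hardware, batch composition differs between runs even with identical hyperparameters and data.

```python
# Hypothetical sketch of why an asynchronous actor-learner setup can be
# irreproducible: the learner batches rollouts in arrival order, and
# arrival order depends on hardware timing, not on hyperparameters.

def learner_batches(arrival_order, rollouts, batch_size=2):
    """Group rollouts into learner batches in the order actors deliver them."""
    batches, buf = [], []
    for actor_id in arrival_order:
        buf.append(rollouts[actor_id])
        if len(buf) == batch_size:
            batches.append(tuple(buf))
            buf = []
    return batches

rollouts = {0: "r0", 1: "r1", 2: "r2", 3: "r3"}

# Same rollout data, same hyperparameters -- only the (hardware-dependent)
# arrival order differs between the two simulated runs:
run_a = learner_batches([0, 1, 2, 3], rollouts)
run_b = learner_batches([1, 0, 3, 2], rollouts)

# Different batch compositions imply a different sequence of gradient
# updates, hence diverging learning curves across hardware settings.
assert run_a != run_b
```

A synchronous design, by contrast, fixes the mapping from rollouts to batches ahead of time, which is the kind of architectural choice that makes learning curves reproducible across hardware.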
Details
Title
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
Creators
Shengyi Huang
Jiayi Weng
Rujikorn Charakorn
Min Lin
Zhongwen Xu
Santiago Ontañón
Publication Details
arXiv.org
Resource Type
Preprint
Language
English
Academic Unit
Computer Science (Computing)
Other Identifier
991021869112604721