Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are
popular deep reinforcement learning algorithms that have been used for game AI
in recent years. A common understanding is that A2C and PPO are separate
algorithms because PPO's clipped objective appears significantly different from
A2C's objective. In this paper, however, we show that A2C is a special case of
PPO. We present theoretical justifications and a pseudocode analysis to
demonstrate why. To validate our claim, we conduct an empirical experiment
using Stable-baselines3, showing that A2C and PPO produce the exact same models
when other settings are controlled.
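The central step of the theoretical argument can be checked numerically. On PPO's first update over a freshly collected batch, θ = θ_old, so every probability ratio r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t) equals 1. The clip is therefore inactive, and ∇_θ r_t = ∇_θ π_θ(a_t|s_t) / π_θold(a_t|s_t) = ∇_θ log π_θ(a_t|s_t). As a result, PPO's surrogate gradient coincides with A2C's policy gradient. The sketch below is a minimal illustration under stated assumptions: a toy linear-softmax policy, random stand-in states and advantages, and hypothetical helper names (`a2c_policy_grad`, `ppo_policy_loss`, `compare_gradients`). It covers only the policy term, not the value or entropy terms, and it is not the paper's or Stable-baselines3's code. It also evaluates a hypothetical second epoch on the same batch to show that the equality depends on performing a single update.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's code): linear-softmax policy
# pi(.|s) = softmax(s @ theta), random states, actions and advantages.


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def a2c_policy_grad(theta, feats, actions, adv):
    """Analytic gradient of the A2C policy loss -mean(A_t * log pi(a_t|s_t))."""
    probs = softmax(feats @ theta)                # (N, n_actions)
    onehot = np.eye(theta.shape[1])[actions]      # (N, n_actions)
    # d log pi(a|s) / d theta = outer(s, onehot(a) - pi(.|s))
    weighted = adv[:, None] * (onehot - probs)
    return -(feats.T @ weighted) / len(actions)


def ppo_policy_loss(theta, theta_old, feats, actions, adv, clip_eps=0.2):
    """PPO clipped surrogate loss -mean(min(r*A, clip(r, 1-eps, 1+eps)*A))."""
    idx = np.arange(len(actions))
    p = softmax(feats @ theta)[idx, actions]
    p_old = softmax(feats @ theta_old)[idx, actions]
    ratio = p / p_old
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -np.mean(np.minimum(ratio * adv, clipped * adv))


def numerical_grad(f, theta, h=1e-6):
    # Central finite differences, one parameter entry at a time.
    grad = np.zeros_like(theta)
    for i in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad


def compare_gradients(seed=0, lr=0.5):
    rng = np.random.default_rng(seed)
    feats = rng.normal(size=(8, 4))           # 8 sampled states, 4 features each
    actions = rng.integers(0, 3, size=8)      # 3 discrete actions
    adv = rng.normal(size=8)                  # stand-in advantage estimates
    theta_old = rng.normal(scale=0.1, size=(4, 3))

    def ppo_grad(theta):
        return numerical_grad(
            lambda th: ppo_policy_loss(th, theta_old, feats, actions, adv), theta)

    # First update on the batch: theta == theta_old, so all ratios are 1
    # and PPO's gradient should equal A2C's policy gradient.
    g_a2c = a2c_policy_grad(theta_old, feats, actions, adv)
    g_ppo = ppo_grad(theta_old)

    # Hypothetical second epoch on the same batch: theta != theta_old, so the
    # PPO gradient is reweighted by the ratios (or zeroed where clipped).
    theta_new = theta_old - lr * g_ppo
    g_a2c_2 = a2c_policy_grad(theta_new, feats, actions, adv)
    g_ppo_2 = ppo_grad(theta_new)
    return g_a2c, g_ppo, g_a2c_2, g_ppo_2


if __name__ == "__main__":
    g_a2c, g_ppo, g_a2c_2, g_ppo_2 = compare_gradients()
    print("first update identical:", np.allclose(g_a2c, g_ppo, atol=1e-6))
    print("second epoch identical:", np.allclose(g_a2c_2, g_ppo_2, atol=1e-6))
```

Comparing an analytic A2C gradient against a finite-difference PPO gradient avoids deriving PPO's gradient by hand, so the match in the first update is not built in by construction.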
Details
Title: A2C is a special case of PPO
Creators: Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Computer Science (Computing)
Other Identifier: 991021869112004721