A2C is a special case of PPO

Shengyi Huang; Anssi Kanervisto; Antonin Raffin; Weixun Wang; Santiago Ontañón; Rousslan Fernand Julien Dossa

doi:10.48550/arxiv.2205.09123

Preprint

Open access

arXiv.org

18 May 2022

DOI: https://doi.org/10.48550/arxiv.2205.09123

Preprint (Author's original); arXiv.org - Non-exclusive license to distribute, Open

Abstract

Computer Science - Learning

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different from A2C's objective. In this paper, however, we show that A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using Stable-baselines3, showing that A2C and PPO produce the exact same models when other settings are controlled.
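
The abstract's central claim can be illustrated numerically. The sketch below is not the authors' Stable-baselines3 experiment; it is a toy check under stated assumptions: a state-independent categorical policy parameterized by logits, hand-picked actions and advantages, and hypothetical helper names (`a2c_loss`, `ppo_clip_loss`, `num_grad`). It compares the gradient of the A2C policy loss with the gradient of PPO's clipped surrogate on the first update after data collection, when the current policy still equals the data-collecting policy and the probability ratio is exactly 1.

```python
import numpy as np

def log_prob(theta, actions):
    # Log-probabilities of the taken actions under a softmax over logits theta.
    z = theta - theta.max()
    logp = z - np.log(np.exp(z).sum())
    return logp[actions]

def a2c_loss(theta, actions, adv):
    # A2C policy loss: negative mean of log pi(a) * advantage.
    return -(log_prob(theta, actions) * adv).mean()

def ppo_clip_loss(theta, theta_old, actions, adv, eps=0.2):
    # PPO clipped surrogate: negative mean of min(r * A, clip(r, 1-eps, 1+eps) * A).
    ratio = np.exp(log_prob(theta, actions) - log_prob(theta_old, actions))
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.minimum(ratio * adv, clipped * adv).mean()

def num_grad(f, theta, h=1e-6):
    # Central finite-difference gradient; h is small enough that the
    # ratio stays inside the clip range around 1.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = h
        g[i] = (f(theta + d) - f(theta - d)) / (2.0 * h)
    return g

# Toy data (assumed for illustration only).
theta = np.array([0.1, -0.2, 0.3, 0.0])
actions = np.array([0, 1, 2, 3, 0, 1])
adv = np.array([1.0, -0.5, 2.0, 0.3, -1.2, 0.7])

# On the first update, the old and current parameters coincide.
theta_old = theta.copy()

g_a2c = num_grad(lambda t: a2c_loss(t, actions, adv), theta)
g_ppo = num_grad(lambda t: ppo_clip_loss(t, theta_old, actions, adv), theta)

print("A2C gradient:", g_a2c)
print("PPO gradient:", g_ppo)
print("match:", np.allclose(g_a2c, g_ppo, atol=1e-6))
```

The two gradients agree because, at a ratio of 1, the clip is inactive and the derivative of the ratio equals the derivative of the log-probability. Once the parameters move away from `theta_old` (several epochs or minibatch updates on the same rollout), the ratio departs from 1 and the two objectives can diverge, which is presumably why the other settings have to be controlled.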

Details

Title: A2C is a special case of PPO
Creators: Shengyi Huang; Anssi Kanervisto; Antonin Raffin; Weixun Wang; Santiago Ontañón; Rousslan Fernand Julien Dossa
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Computer Science (Computing)
Other Identifier: 991021869112004721
