An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

Rousslan Fernand Julien Dossa; Shengyi Huang; Santiago Ontanon; Takashi Matsubara

doi:10.1109/ACCESS.2021.3106662

Journal article

Open access

Peer reviewed

IEEE access, v 9, pp 117981-117992

2021

DOI: https://doi.org/10.1109/ACCESS.2021.3106662

Files and links (2)

url: https://doi.org/10.1109/access.2021.3106662
Published, Version of Record (VoR), CC BY-NC-ND V4.0, Open

url: https://doi.org/10.1109/ACCESS.2021.3106662
Published, Version of Record (VoR), Open

Keywords: Artificial Intelligence; deep learning; Heuristic algorithms; Informatics; Licenses; Optimization; proximal policy optimization; Reinforcement learning; robot learning; robotics and automation; Task analysis; Tuning

Abstract

Code-level optimizations, low-level techniques used when implementing algorithms, have generally been considered tangential and often do not appear in the published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest that these optimizations are critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization, known as "early stopping", which is implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) divergence between the target policy and the current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant, KLE-Rollback, when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are: 1) the performance of PPO is sensitive to the number of update iterations per epoch (K); 2) early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate this sensitivity by dynamically adjusting the actual number of update iterations within an epoch; and 3) early stopping optimizations could serve as a convenient alternative to tuning K.
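
The early stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or the openai/spinningup implementation: the function names, the toy two-action policy, the step size, and the threshold value are all hypothetical, and the KL divergence is assumed to be measured from the policy held before the updates to the policy after each update. The `rollback` flag reflects one plausible reading of the conservative KLE-Rollback variant (discarding the update that crossed the threshold), which is an assumption because the abstract does not spell it out.

```python
import math


def mean_kl(old_probs, new_probs):
    """Mean KL(old || new) over a batch of categorical action distributions."""
    total = 0.0
    for p, q in zip(old_probs, new_probs):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return total / len(old_probs)


def update_with_early_stopping(policy, step_fn, probs_fn, K, target_kl, rollback=False):
    """Attempt up to K update iterations within one epoch.

    Stops as soon as the mean KL between the pre-update policy and the
    updated policy exceeds target_kl. Returns (policy, iterations_attempted).
    """
    old_probs = probs_fn(policy)  # action distributions before any update
    for i in range(K):
        candidate = step_fn(policy)  # one update iteration
        if mean_kl(old_probs, probs_fn(candidate)) > target_kl:
            # KLE-Stop keeps the update that crossed the threshold.
            # With rollback=True (assumed KLE-Rollback behaviour) it is discarded.
            return (policy if rollback else candidate), i + 1
        policy = candidate
    return policy, K


# Hypothetical toy policy: one logit theta for a two-action softmax.
def probs_fn(theta):
    p = 1.0 / (1.0 + math.exp(-theta))
    return [[p, 1.0 - p]]


# Hypothetical update: a fixed step stands in for a gradient step on the PPO loss.
def step_fn(theta):
    return theta + 0.5


stop_theta, stop_iters = update_with_early_stopping(
    0.0, step_fn, probs_fn, K=10, target_kl=0.05)
back_theta, back_iters = update_with_early_stopping(
    0.0, step_fn, probs_fn, K=10, target_kl=0.05, rollback=True)
print(stop_theta, stop_iters, back_theta, back_iters)
```

In this toy run the mean KL is roughly 0.03 after the first update and roughly 0.12 after the second, so both variants halt on the second of the ten allowed iterations. This is the sense in which the effective number of update iterations is adjusted dynamically rather than fixed at K.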

Metrics

10 Record Views

22 citations in Web of Science

18 citations in Scopus

Details

Title: An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization
Creators: Rousslan Fernand Julien Dossa - Kobe University
Shengyi Huang - Drexel University
Santiago Ontanon - Drexel University
Takashi Matsubara - Osaka University
Publication Details: IEEE access, v 9, pp 117981-117992
Publisher: IEEE
Grant note: JPMJMI20B8 / Japan Science and Technology Agency’s Mirai Program (JST-Mirai) (10.13039/501100002241)
Resource Type: Journal article
Language: English
Academic Unit: Computer Science
Web of Science ID: WOS:000761491600001
Scopus ID: 2-s2.0-85113849016
Other Identifier: 991019167736804721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration; International collaboration
Web of Science research areas: Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications
