
Integrating Potential-Based Reward Shaping into AlphaZero

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

AlphaZero achieves superhuman performance through pure self-play without human expertise, but its dependence on sparse terminal rewards limits learning efficiency. This paper investigates integrating potential-based reward shaping into AlphaZero to accelerate learning while preserving optimality. We address whether reward shaping improves sample efficiency without compromising final performance, and which integration methods prove most effective. We present two implementation approaches: search-time shaping and auxiliary network heads, each targeting different components of the learning process. Experimental evaluation on Othello provides initial evidence of benefits, with ongoing work on comprehensive performance characterization across diverse environments.
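The claim that shaping accelerates learning "while preserving optimality" depends on the shaping reward having the potential-based form F(s, s') = γΦ(s') − Φ(s). The Python below is a minimal sketch of that standard form, not the authors' implementation. It computes F and checks the telescoping property behind the optimality claim: if Φ is fixed to 0 at terminal states, the shaped return of any episode equals the unshaped return minus Φ(s_0). The potential `disc_potential`, the integer state encoding (standing in for something like an Othello disc difference), and all helper names are hypothetical illustrations. This record does not describe the paper's actual potential function.

```python
from typing import Callable, Sequence

# Hypothetical potential Phi. A "state" here is just an int standing in for,
# e.g., an Othello disc difference. This is NOT the potential used in the paper.
def disc_potential(state: int) -> float:
    return 0.1 * state


def shaping_term(phi: Callable[[int], float], s: int, s_next: int,
                 gamma: float, next_is_terminal: bool) -> float:
    """Potential-based shaping F(s, s') = gamma * Phi(s') - Phi(s).

    The potential of a terminal state is fixed to 0, the usual convention in
    episodic tasks, so the shaping terms telescope to -Phi(s_0).
    """
    phi_next = 0.0 if next_is_terminal else phi(s_next)
    return gamma * phi_next - phi(s)


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Unshaped discounted return sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def shaped_return(rewards: Sequence[float], states: Sequence[int],
                  phi: Callable[[int], float], gamma: float = 1.0) -> float:
    """Discounted return of the shaped rewards r_t + F(s_t, s_{t+1}).

    `states` has one more entry than `rewards`; states[-1] is terminal.
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("need exactly one more state than rewards")
    horizon = len(rewards)
    total = 0.0
    for t in range(horizon):
        f = shaping_term(phi, states[t], states[t + 1], gamma, t + 1 == horizon)
        total += gamma ** t * (rewards[t] + f)
    return total


if __name__ == "__main__":
    rewards = [0, 0, 0, 1]       # sparse terminal reward: only the last move pays off
    states = [1, 2, -1, 3, 10]   # hypothetical disc differences along one game
    for gamma in (1.0, 0.9):
        g = discounted_return(rewards, gamma)
        g_shaped = shaped_return(rewards, states, disc_potential, gamma)
        # Telescoping: g_shaped equals g - Phi(s_0), up to float rounding.
        print(gamma, g, g_shaped, g - disc_potential(states[0]))
```

Each printed row should show the shaped return matching the unshaped return minus Φ(s_0), up to floating-point rounding. Φ(s_0) does not depend on the actions taken, so subtracting it cannot change which policy is optimal. A well-chosen Φ only changes how early the intermediate rewards start guiding learning. The abstract does not specify how F enters search-time shaping or the auxiliary network heads, so the sketch stops at the shaping term itself.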
Original language: English
Title of host publication: ESANN 2026 - Conference Proceedings
Publication status: Accepted/In press - 14 Jan 2026
Event: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2026 - Crowne Plaza Bruges, Bruges, Belgium
Duration: 22 Apr 2026 to 24 Apr 2026
https://www.esann.org/

Conference

Conference: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2026
Abbreviated title: ESANN2026
Country/Territory: Belgium
City: Bruges
Period: 22/04/26 to 24/04/26
Internet address: https://www.esann.org/

Keywords

  • reinforcement learning (RL)
  • reward shaping
  • AlphaZero
  • machine learning
