Description
AlphaZero achieves superhuman performance through pure self-play without human expertise, but its dependence on sparse terminal rewards limits learning efficiency. This paper investigates integrating potential-based reward shaping into AlphaZero to accelerate learning while preserving optimality. We address whether reward shaping improves sample efficiency without compromising final performance, and which integration methods prove most effective. We present two implementation approaches: search-time shaping and auxiliary network heads, each targeting different components of the learning process. Experimental evaluation on Othello provides initial evidence of benefits, with ongoing work on comprehensive performance characterization across diverse environments.

| Period | 24 Apr 2026 |
|---|---|
| Event title | European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2026 |
| Event type | Conference |
| Location | Bruges, Belgium |
| Degree of recognition | International |
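
For readers unfamiliar with the term used in the description, potential-based reward shaping adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is a potential function over states. The sketch below is illustrative only and is not taken from the paper: the function names and the disc-difference potential for Othello are assumptions, and the sign convention (potential seen from the player encoded as +1) is one possible choice. It also shows why the shaping terms telescope along a trajectory, which is the property behind the usual optimality-preservation argument.

```python
from typing import Sequence


def shaping_reward(phi_s: float, phi_next: float, gamma: float = 1.0) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi_next - phi_s


def disc_difference_potential(board: Sequence[Sequence[int]]) -> float:
    """Hypothetical potential (an assumption, not the paper's choice):
    normalised disc difference, with cells encoded as +1, -1, or 0."""
    cells = sum(len(row) for row in board)
    return sum(sum(row) for row in board) / cells


def discounted_shaping_sum(potentials: Sequence[float], gamma: float) -> float:
    """Discounted sum of shaping terms along a trajectory of potentials.

    The sum telescopes to gamma**T * phi(s_T) - phi(s_0), so it depends
    only on the first and last states, not on the path between them.
    """
    return sum(
        (gamma ** t) * shaping_reward(potentials[t], potentials[t + 1], gamma)
        for t in range(len(potentials) - 1)
    )
```

Because the accumulated shaping depends only on the start and end potentials, the shaped and unshaped problems rank policies identically; how the term is injected into search or network targets is a separate design question that this sketch does not address.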
Related content
- Research output
  - Integrating Potential-Based Reward Shaping into AlphaZero
    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review