
Learning Robust Penetration Testing Policies under Partial Observability: A systematic evaluation

  • VUB University

Research output: Contribution to journal › Article › peer-review

Abstract

Penetration testing, the simulation of cyberattacks to identify security vulnerabilities, presents a sequential decision-making problem well suited to automation with reinforcement learning (RL). As in many applications of RL to real-world problems, partial observability presents a major challenge, because it invalidates the Markov property that Markov Decision Processes (MDPs) rely on. Partially Observable MDPs require history aggregation or belief state estimation to learn successful policies. We investigate stochastic, partially observable penetration testing scenarios over host networks of varying size, aiming to better reflect real-world complexity through more challenging and representative benchmarks. This approach leads to the development of more robust and transferable policies, which are crucial for ensuring reliable performance across diverse and unpredictable real-world environments. Using vanilla Proximal Policy Optimization (PPO) as a baseline, we compare a selection of PPO-based variants designed to mitigate partial observability, including frame-stacking, augmenting observations with historical information, and employing LSTM or TrXL architectures. We conduct a systematic empirical analysis of these algorithms across different host network sizes. We find that this task greatly benefits from history aggregation, which converges up to four times faster than the other approaches. Manual inspection of the policies learned by the algorithms reveals clear distinctions and provides insights that go beyond quantitative results.
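As a rough illustration of the simplest history mechanism named in the abstract, frame-stacking, the sketch below concatenates the k most recent observations into one fixed-length vector that a policy could consume. It is not the authors' implementation: the class name `FrameStack`, the zero-padding at episode start, and the flat list representation of observations are all assumptions made for this example.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper: keeps the k most recent observations."""

    def __init__(self, k, obs_size):
        self.k = k
        self.obs_size = obs_size
        # deque with maxlen drops the oldest frame automatically
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        """Start a new episode, padding missing history with zeros (assumption)."""
        self.frames.clear()
        for _ in range(self.k - 1):
            self.frames.append([0.0] * self.obs_size)
        self.frames.append(list(first_obs))
        return self.stacked()

    def step(self, obs):
        """Add the newest observation and return the stacked vector."""
        self.frames.append(list(obs))
        return self.stacked()

    def stacked(self):
        # Oldest frame first, newest last, flattened into one vector
        return [x for frame in self.frames for x in frame]
```

The stacked vector has length `k * obs_size` regardless of how far the episode has progressed, so it can be fed to an otherwise unchanged feed-forward policy. Recurrent (LSTM) or Transformer-XL policies instead learn their own summary of the history.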

Original language: English
Journal: Transactions on Machine Learning Research
Volume: 2026-April
Status: Published - 2026
