
On the Potential of LLMs for Offensive Security: Benchmarks vs. Operational Reality

  • Ruben Missotten
  • Vera Rimmer
  • Wim Mees
  • Lieven Desmet
  • KU Leuven
  • Royal Military Academy

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Large Language Models (LLMs), through their strong capabilities in code generation, reasoning, and tool use, have demonstrated promising results in security tasks involving vulnerability discovery and exploitation. However, evaluating their offensive potential in automating penetration testing - a more complex and multi-stage process - remains a critical research challenge. While existing evaluation frameworks effectively demonstrate LLM capabilities in isolated or simplified scenarios, they often do not extend toward the complexity of interconnected attack chains characteristic of real-world adversarial operations. In this analytical study, we examine the challenge of assessing the feasibility of LLM-powered automation across the full adversarial pipeline within realistic environments. We contribute an analysis of current benchmarks and associated environments, and highlight opportunities for methodological enhancements that would strengthen alignment between academic evaluations and operational realities.

Original language: English
Title of host publication: Proceedings - 2025 Annual Computer Security Applications Conference Workshops, ACSACW 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 420-427
Number of pages: 8
ISBN (Electronic): 9798331545369
Publication status: Published - 2025
Event: 2025 Annual Computer Security Applications Conference Workshops, ACSACW 2025 - Honolulu, United States
Duration: 8 Dec 2025 to 12 Dec 2025

Publication series

Name: Proceedings - 2025 Annual Computer Security Applications Conference Workshops, ACSACW 2025

Conference

Conference: 2025 Annual Computer Security Applications Conference Workshops, ACSACW 2025
Country/Territory: United States
City: Honolulu
Period: 8/12/25 to 12/12/25

Keywords

  • Benchmark
  • Cyber Kill Chain
  • LLM
  • MITRE ATT&CK
  • offensive security
  • penetration testing
  • red teaming
