Result: ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

Title:

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

Authors:

Imajuku, Yuki; Horie, Kohki; Iwata, Yoichi; Aoki, Kensho; Takahashi, Naohiro; Akiba, Takuya

Publication Year:

2025

Subject Terms:

Artificial Intelligence

Document Type:

Report Working Paper

Access URL:

http://arxiv.org/abs/2506.09050

Accession Number:

edsarx.2506.09050

Database:

arXiv

More Information

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of consistency across problems and long-horizon problem-solving capabilities. This highlights the need for this benchmark to foster future AI advancements.
Accepted at NeurIPS 2025 Datasets & Benchmarks Track
