Result: Model Checking Using Large Language Models—Evaluation and Future Directions.

Title:
Model Checking Using Large Language Models—Evaluation and Future Directions.
Source:
Electronics (2079-9292); Jan2025, Vol. 14 Issue 2, p401, 34p
Database:
Complementary Index

Large language models (LLMs) such as ChatGPT have recently risen in prominence, creating a need to analyze their strengths and limitations across tasks. The objective of this work was to evaluate the performance of LLMs for model checking, which is used extensively in critical tasks such as software and hardware verification. A set of problems was proposed as a benchmark, and three LLMs (GPT-4, Claude, and Gemini) were evaluated on their ability to solve these problems. The evaluation compared the responses of the three LLMs with the gold standard provided by model checking tools. The results illustrate the limitations of LLMs in these tasks and identify directions for future research. Specifically, the best overall performance (ratio of problems solved correctly) was 60%, indicating a high probability of reasoning errors by the LLMs, especially in more complex scenarios requiring many reasoning steps. The LLMs typically performed better when generating scripts to solve the problems than when solving them directly. [ABSTRACT FROM AUTHOR]
