Title:
Concurrency Bug Detection via Static Analysis and Large Language Models.
Authors:
Feng, Zuocheng; Chen, Yiming; Zhang, Kaiwen; Li, Xiaofeng; Liu, Guanjun (liuguanjun@tongji.edu.cn)
Source:
Future Internet. Dec 2025, Vol. 17 Issue 12, p578. 27p.
Database:
Library, Information Science & Technology Abstracts

Concurrency bugs arise from complex and improper synchronization of shared resources, which makes them particularly difficult to detect. Traditional static analysis relies heavily on expert knowledge and frequently fails when code is non-compilable, while large language models (LLMs) struggle with semantic sparsity, limited comprehension of concurrent semantics, and a tendency to hallucinate. To address these limitations, this study proposes ConSynergy, a novel framework that integrates the structural rigor of static analysis with the semantic reasoning capabilities of LLMs. Its core design decomposes concurrency bug detection into a four-stage pipeline of shared resource identification, concurrency-aware slicing, data-flow reasoning, and formal verification, which mitigates LLM hallucinations caused by insufficient program context. First, the framework identifies shared resources and applies a concurrency-aware program slicing technique to extract concurrency-related structural features precisely, thereby alleviating semantic sparsity. Second, to enhance the LLM's comprehension of concurrent semantics, we design a concurrency data-flow analysis based on Chain-of-Thought prompting. Third, the framework incorporates a Satisfiability Modulo Theories (SMT) solver to ensure the reliability of detection results, alongside an LLM-based iterative repair mechanism that dramatically reduces the dependency on code compilability. Extensive experiments on three mainstream concurrency bug datasets (DataRaceBench, the concurrency subset of Juliet, and DeepRace) demonstrate that ConSynergy achieves an average precision and recall of 80.0% and 87.1%, respectively, and outperforms state-of-the-art baselines by 10.9% to 68.2% in average F1 score, indicating significant potential for practical application. [ABSTRACT FROM AUTHOR]
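
The abstract itself contains no code; as a rough, hypothetical illustration of the kind of check an SMT-backed verification stage can perform, the sketch below encodes the classic lost-update data race on an unprotected shared counter as a satisfiability query. The choice of Z3's Python bindings, the variable names, and the interleaving encoding are assumptions made for this example and do not reflect ConSynergy's actual implementation.

    # Minimal illustrative sketch (assumption, not the paper's code): two threads
    # each execute  tmp = counter; counter = tmp + 1  with no lock. We ask an SMT
    # solver whether any unsynchronized interleaving can lose an update.
    from z3 import Ints, Solver, Or, sat

    c0, r1, r2, w1, w2, final = Ints("c0 r1 r2 w1 w2 final")

    s = Solver()
    s.add(c0 == 0)                          # initial counter value
    s.add(w1 == r1 + 1, w2 == r2 + 1)       # each thread writes its read value + 1
    # Without synchronization, each thread may read either the initial value or
    # the other thread's write; whichever write happens last becomes the result.
    s.add(Or(r1 == c0, r1 == w2), Or(r2 == c0, r2 == w1))
    s.add(Or(final == w1, final == w2))
    s.add(final != 2)                       # a lost update: result differs from 2

    print("race feasible" if s.check() == sat else "no race found")

If the solver reports the constraints satisfiable (here it does: both threads read 0 and the final value is 1), there exists an interleaving that loses an update. This is the style of machine-checked witness a verification stage can use to confirm or reject a candidate race reported by an LLM.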