Can LLM-Generated Misinformation Be Detected: A Study On Cyber Threat Intelligence
Given the increasing number and severity of cyberattacks, there has been a surge in cybersecurity information across mediums such as posts, news articles, reports, and other resources. Cyber Threat Intelligence (CTI) involves processing data from these cybersecurity sources, enabling professionals and organizations to gain valuable insights. However, with the rapid dissemination of cybersecurity information, the inclusion of fake CTI can lead to severe consequences, including data poisoning attacks. To address this challenge, we implement a three-step strategy: generating synthetic CTI, evaluating the quality of the generated CTI, and detecting fake CTI. Unlike other subdomains, such as fake COVID-19 news detection, there is currently no publicly available dataset tailored to fake CTI detection research. To fill this gap, we first establish a reliable ground-truth dataset by using domain-specific cybersecurity data to fine-tune a Large Language Model (LLM) for synthetic CTI generation. We then employ crowdsourcing techniques and advanced synthetic data verification methods to assess the quality of the generated dataset, introducing a novel evaluation methodology that combines quantitative and qualitative approaches. Our comprehensive evaluation reveals that human annotators, regardless of their computer science background, cannot distinguish the generated CTI from genuine CTI, demonstrating the effectiveness of our generation approach. We then benchmark various misinformation detection techniques against our ground-truth dataset to establish baseline performance metrics for identifying fake CTI. By adapting existing misinformation detection techniques to the context of fake CTI, we provide a foundation for future research in this critical field. To facilitate further work, we make our code, dataset, and experimental results publicly available on GitHub.
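To make the third step concrete, the minimal sketch below shows the kind of binary fake-vs-genuine CTI classification baseline that could be benchmarked against such a ground-truth dataset. It is an illustration only, not the paper's method: the TF-IDF plus logistic-regression pipeline, the placeholder CTI snippets, and all parameter choices are assumptions introduced here, not taken from the paper or its dataset.

```python
"""Illustrative fake-CTI detection baseline (assumed, not from the paper):
TF-IDF features with a logistic-regression classifier."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder corpus standing in for the real dataset.
# Label 0 = genuine CTI, label 1 = LLM-generated (fake) CTI.
texts = [
    "APT29 exploited CVE-2023-23397 to exfiltrate mailbox data.",
    "Threat actors deployed Cobalt Strike beacons via phishing emails.",
    "A new ransomware strain encrypts offline backups using quantum keys.",
    "Attackers bypassed every firewall on Earth with an undetectable worm.",
] * 25  # repeat the toy examples so the split has enough samples
labels = [0, 0, 1, 1] * 25

# Stratified split keeps the genuine/fake ratio in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Unigram and bigram TF-IDF features over the CTI snippets.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
clf = LogisticRegression(max_iter=1000)

clf.fit(vectorizer.fit_transform(X_train), y_train)
preds = clf.predict(vectorizer.transform(X_test))
print(classification_report(y_test, preds, target_names=["genuine", "fake"]))
```

A stronger baseline in the same spirit would fine-tune a transformer encoder on the same labels; the TF-IDF pipeline is chosen here only to make the train-and-evaluate loop concrete and self-contained.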