Can LLM-Generated Misinformation Be Detected: A Study On Cyber Threat Intelligence
Given the increasing number and severity of cyberattacks, there has been a surge in cybersecurity information across mediums such as posts, news articles, reports, and other resources. Cyber Threat Intelligence (CTI) involves processing data from these cybersecurity sources, enabling professionals and organizations to gain valuable insights. However, with the rapid dissemination of cybersecurity information, the inclusion of fake CTI can lead to severe consequences, including data poisoning attacks. To address this challenge, we implement a three-step strategy: generating synthetic CTI, evaluating the quality of the generated CTI, and detecting fake CTI. Unlike other subdomains, such as fake COVID-19 news detection, there is currently no publicly available dataset tailored to fake CTI detection research. To fill this gap, we first establish a reliable ground-truth dataset by using domain-specific cybersecurity data to fine-tune a Large Language Model (LLM) for synthetic CTI generation. We then employ crowdsourcing techniques and advanced synthetic data verification methods to assess the quality of the generated dataset, introducing a novel evaluation methodology that combines quantitative and qualitative approaches. Our comprehensive evaluation reveals that human annotators, regardless of their computer science background, cannot distinguish the generated CTI from genuine CTI, demonstrating the effectiveness of our generation approach. We then benchmark various misinformation detection techniques against our ground-truth dataset to establish baseline performance metrics for identifying fake CTI. By adapting existing misinformation detection techniques to the context of fake CTI, we provide a foundation for future research in this critical field. To facilitate further work, we make our code, dataset, and experimental results publicly available on GitHub.
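To make the third step concrete, the minimal sketch below shows the kind of binary fake-vs-genuine CTI classification baseline that could be benchmarked against such a ground-truth dataset. It is an illustration only, not the paper's method: the TF-IDF plus logistic-regression pipeline, the placeholder CTI snippets, and all parameter choices are assumptions introduced here, not taken from the paper or its dataset.

```python
"""Illustrative fake-CTI detection baseline (assumed, not from the paper):
TF-IDF features with a logistic-regression classifier."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder corpus standing in for the real dataset.
# Label 0 = genuine CTI, label 1 = LLM-generated (fake) CTI.
texts = [
    "APT29 exploited CVE-2023-23397 to exfiltrate mailbox data.",
    "Threat actors deployed Cobalt Strike beacons via phishing emails.",
    "A new ransomware strain encrypts offline backups using quantum keys.",
    "Attackers bypassed every firewall on Earth with an undetectable worm.",
] * 25  # repeat the toy examples so the split has enough samples
labels = [0, 0, 1, 1] * 25

# Stratified split keeps the genuine/fake ratio in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Unigram and bigram TF-IDF features over the CTI snippets.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
clf = LogisticRegression(max_iter=1000)

clf.fit(vectorizer.fit_transform(X_train), y_train)
preds = clf.predict(vectorizer.transform(X_test))
print(classification_report(y_test, preds, target_names=["genuine", "fake"]))
```

A stronger baseline in the same spirit would fine-tune a transformer encoder on the same labels; the TF-IDF pipeline is chosen here only to make the train-and-evaluate loop concrete and self-contained.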