An Open-Source Intelligence Analysis Method Based on Generative LLMs.
The authors propose a method integrating generative large language models (LLMs), XPath, and retrieval-augmented generation (RAG) for web page information extraction in open-source intelligence analysis. Key innovations include a dynamic templated prompting strategy and multi-granularity semantic retrieval. The dynamic templates generate domain-constrained prompts based on intelligence types (events/persons/organizations), enhancing entity extraction accuracy. The multi-granular retrieval establishes a document-paragraph-entity hierarchy optimized by the BERT-Topk algorithm for fragmented long-text information. By aligning entities with OpenKG, a three-dimensional attribute-relation-event network is constructed to strengthen complex event analysis. Experiments on ClueWeb22 and TAC-KBP2022 datasets show the extraction rate is 0.85 and the response accuracy is 0.78, outperforming traditional RAG by 18% to 31%. In practical applications, 92% key fact accuracy is achieved in event briefings with a total cost of only 12% of GPT-4. [ABSTRACT FROM AUTHOR]
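To address web page information extraction and question answering in open-source intelligence analysis, a method is proposed that combines generative large language models (LLMs), XPath, and retrieval-augmented generation (RAG), involving a dynamic templated prompting strategy and multi-granularity semantic retrieval. The dynamic templates generate prompts constrained by domain knowledge according to the intelligence type, improving entity extraction precision; the multi-granularity retrieval builds a three-level document-paragraph-entity hierarchy and uses the BERT-Topk algorithm to optimize information localization in long texts. Entities are aligned with the OpenKG knowledge base to build a three-dimensional attribute-relation-event network, strengthening logical analysis of complex events. On the ClueWeb22 and TAC-KBP2022 datasets the method achieves an extraction rate of 0.85 and an answer accuracy of 0.78, an improvement of 18% to 31% over traditional RAG. In practical use, key-fact accuracy in briefings on trending events reaches 92%, at an overall cost of only 12% of that of GPT-4. [ABSTRACT FROM AUTHOR]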
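The two ideas named in the abstract, type-specific prompt templates and top-k retrieval over a document-paragraph hierarchy, can be illustrated with a minimal sketch. This is not the authors' implementation. The template wording, the function names (`build_prompt`, `top_k_paragraphs`), and the sample data are hypothetical, and a simple bag-of-words cosine similarity stands in for the BERT-based scoring so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter

# Hypothetical templates keyed by intelligence type. The wording is
# illustrative and not taken from the paper.
TEMPLATES = {
    "event": "Extract the event name, date, location, and participants.\n{context}",
    "person": "Extract the person's name, role, and affiliations.\n{context}",
    "organization": "Extract the organization's name, type, and key members.\n{context}",
}


def build_prompt(intel_type: str, context: str) -> str:
    """Select the template for the intelligence type and fill in the context."""
    if intel_type not in TEMPLATES:
        raise ValueError(f"unknown intelligence type: {intel_type}")
    return TEMPLATES[intel_type].format(context=context)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_paragraphs(query: str, documents: dict[str, list[str]], k: int = 2):
    """Score every paragraph of every document and return the k best.

    Assumption: bag-of-words cosine replaces the BERT embedding similarity
    that the abstract's BERT-Topk step presumably uses.
    Returns a list of (score, doc_id, paragraph) tuples, best first.
    """
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(p))), doc_id, p)
        for doc_id, paragraphs in documents.items()
        for p in paragraphs
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical sample data for demonstration only.
    docs = {
        "d1": ["The summit was held in Geneva on Monday.", "Weather was mild."],
        "d2": ["Stock prices rose sharply."],
    }
    hits = top_k_paragraphs("Where was the summit held?", docs, k=1)
    print(build_prompt("event", hits[0][2]))
```

In a fuller pipeline the retrieved paragraph would first be located in the web page (for example with XPath) and the filled prompt would then be sent to the LLM. Both steps are omitted here because the abstract does not specify them.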
Copyright of Telecommunication Engineering is the property of Telecommunication Engineering and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)