自然语言处理技术下文本信息语义抽取方法 (Semantic extraction method for text information using natural language processing technology).
The transformation and expansion of multitasking place high demands on the ability to understand and analyze text information, yet complex text structures scale poorly and annotated data are scarce. To address this, a semantic extraction method for text information based on natural language processing (NLP) technology is proposed. The original text is first strictly cleaned and purified, then preprocessed through entity position localization, extraction of words neighboring each entity, sequence-length standardization, tokenization, and the addition of special tags. The multi-layer bidirectional Transformer structure of the BERT model then maps the text to a sequence of semantic word vectors, effectively extracting and representing the semantic information and entity relationships in the text and accommodating complex text structures. A BiGRU (bidirectional gated recurrent unit) model processes the vector sequence output by BERT, and a multi-head attention mechanism computes multiple sets of attention weights in parallel to capture the complex dependencies between words within a sentence. A Softmax classifier then classifies the output of the multi-head attention mechanism, iteratively annotating the relationship types between entities to achieve semantic extraction from the text. Experimental results show that the quality of the text data processed by the proposed method improves significantly, with an F1 score of up to 0.99 for semantic extraction of text information. Moreover, the method characterizes the various correlations between inputs and outputs in finer detail, effectively capturing the complex dependencies between words within a sentence and yielding better semantic extraction of text information. [ABSTRACT FROM AUTHOR]
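The BiGRU → multi-head attention → Softmax stage of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hidden size, number of heads, relation count, and mean-pooling step are all illustrative assumptions, and a random tensor stands in for the BERT word-vector sequence (BERT-base would actually emit 768-dimensional vectors).

```python
# Sketch of the described pipeline stage: (stand-in) BERT embeddings
# -> BiGRU -> multi-head self-attention -> Softmax relation classifier.
# All dimensions and the relation count are illustrative assumptions.
import torch
import torch.nn as nn

class RelationExtractor(nn.Module):
    def __init__(self, hidden=128, heads=4, num_relations=5):
        super().__init__()
        # BiGRU over the token-embedding sequence (doubles the feature size)
        self.bigru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # Multi-head attention: computes several attention weight sets in parallel
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Linear scores per relation type, normalized by Softmax
        self.classifier = nn.Linear(2 * hidden, num_relations)

    def forward(self, embeddings):               # (batch, seq_len, hidden)
        h, _ = self.bigru(embeddings)            # (batch, seq_len, 2*hidden)
        a, _ = self.attn(h, h, h)                # self-attention within the sentence
        logits = self.classifier(a.mean(dim=1))  # pool over tokens, then classify
        return torch.softmax(logits, dim=-1)     # relation-type probabilities

model = RelationExtractor()
probs = model(torch.randn(2, 16, 128))  # 2 sentences, 16 tokens, 128-dim vectors
print(probs.shape)                      # torch.Size([2, 5])
```

Each attention head attends to the sentence independently, which is how the multiple weight sets capture different word-to-word dependencies before the classifier assigns a relation type.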
Copyright of Modern Electronic Technology is the property of Shaanxi Electronic Magazine Publishing Co., Ltd. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)