Preprocessing framework for scholarly big data management.
Big data technologies have found applications in disparate domains. One of the largest sources of textual big data is scientific documents and papers. Scholarly big data has been used in numerous ways to develop innovative applications such as collaborator discovery, expert finding, and research management systems. With the evolution of machine and deep learning techniques, the efficacy of such applications has increased manifold. However, the biggest challenge in developing deep learning models for scholarly applications in cloud-based environments is the under-utilization of resources caused by the excessive time required for textual preprocessing. This paper presents a preprocessing pipeline that uses Spark for data ingestion and Spark ML for performing preprocessing tasks. The proposed approach is evaluated with the help of a case study, which uses LSTM-based text summarization to generate titles or summaries from abstracts of scholarly articles. Results indicate a substantial reduction in ingestion, preprocessing, and cumulative time for the proposed approach, which should translate into reduced development time and costs as well. [ABSTRACT FROM AUTHOR]
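To illustrate the kind of textual preprocessing the abstract refers to, the sketch below reproduces two standard steps (tokenization and stop-word removal) in plain Python. Spark ML exposes these as the `Tokenizer` and `StopWordsRemover` transformers; the helper functions and the small stop-word set here are hypothetical stand-ins, not the paper's implementation, since the actual pipeline runs on a Spark cluster.

```python
# Plain-Python sketch of preprocessing steps analogous to Spark ML's
# Tokenizer and StopWordsRemover stages. Function names and the stop-word
# list are illustrative assumptions, not the paper's code.

STOP_WORDS = {"the", "of", "for", "and", "a", "an", "in", "to", "is"}

def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace, as Spark ML's Tokenizer does by default.
    return text.lower().split()

def remove_stop_words(tokens: list[str]) -> list[str]:
    # Drop common function words, analogous to Spark ML's StopWordsRemover.
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(abstract: str) -> list[str]:
    return remove_stop_words(tokenize(abstract))

if __name__ == "__main__":
    print(preprocess("Preprocessing framework for scholarly big data management"))
    # → ['preprocessing', 'framework', 'scholarly', 'big', 'data', 'management']
```

In a Spark ML pipeline these two steps would be chained as stages of a `Pipeline` object, so they execute in parallel across the cluster's partitions rather than sequentially on one machine, which is the source of the time savings the paper reports.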
Copyright of Multimedia Tools & Applications is the property of Springer Nature. Users should refer to the original published version of the material for the full abstract.