DS4A: Deep Search System for Algorithms from Full-text Scholarly Big Data

Cited by: 9
Authors
Safder, Iqra [1]
Hassan, Saeed-Ul [1]
Affiliation
[1] Information Technology University, Lahore, Pakistan
Keywords
Algorithm search; Information retrieval; Full text; Deep learning; Bi-directional LSTM
DOI
10.1109/ICDMW.2018.00186
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While information retrieval systems have improved tremendously at finding relevant scientific literature, a gap remains in meeting users' ever-growing need to search full-text publications for specific metadata-related information. In this paper, we present a deep learning-based system that enhances search by classifying algorithm-specific metadata from full-text publications, such as accuracy, precision, and recall, along with further details such as the datasets the algorithms operate on and their time complexity. Specifically, in contrast to the traditional term frequency-inverse document frequency (TF-IDF) approach, which relies on frequent terms as in 'bag of words' models, we first generate a synopsis of each full-text document and then enrich it with full-text sentences classified as algorithm-specific metadata, improving algorithm-specific search. These sentences are classified by a deep learning-based bi-directional long short-term memory (LSTM) network. Our bi-directional LSTM model outperformed a Support Vector Machine (SVM) by 9.46%, reaching an F-measure of 0.81 in classifying 37,000 algorithm-specific metadata lines annotated by four human experts. Finally, we present a case study on 21,940 full-text publications downloaded from the ACL full-text repository (https://aclweb.org/) to show the advantages of a deep learning-based advanced search system over conventional TF-IDF-based (Lucene) text-retrieval systems.
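The abstract contrasts a plain TF-IDF index with one built from a document synopsis enriched by full-text sentences classified as algorithm-specific metadata. The Python sketch below illustrates that indexing idea only, under loudly stated assumptions: `METADATA_CUES` and `is_metadata_sentence` are hypothetical stand-ins for the trained bi-directional LSTM (they are not the authors' model), `tfidf_scores` uses a common smoothed IDF, log((1 + N) / (1 + df)) + 1, rather than Lucene's actual scoring, and the two papers are invented toy data.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL cue words: a stand-in for the paper's trained bi-directional
# LSTM sentence classifier, used here only to keep the sketch self-contained.
METADATA_CUES = {"accuracy", "precision", "recall", "f-measure", "dataset", "complexity"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric/hyphenated tokens."""
    return re.findall(r"[a-z0-9\-]+", text.lower())


def is_metadata_sentence(sentence: str) -> bool:
    """Placeholder classifier: flag a sentence that mentions any cue word."""
    return any(token in METADATA_CUES for token in tokenize(sentence))


def enrich(synopsis: str, sentences: list[str]) -> str:
    """Append the full-text sentences classified as metadata to the synopsis."""
    return " ".join([synopsis, *(s for s in sentences if is_metadata_sentence(s))])


def tfidf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document as the sum over query terms of tf * smoothed idf."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(counts)
    query_terms = set(tokenize(query))
    idf = {}
    for term in query_terms:
        df = sum(1 for c in counts.values() if term in c)
        idf[term] = math.log((1 + n_docs) / (1 + df)) + 1.0
    return {
        doc_id: sum(c[term] * idf[term] for term in query_terms)
        for doc_id, c in counts.items()
    }


# Toy corpus (invented for illustration): each paper is (synopsis, full-text sentences).
papers = {
    "p1": ("We propose a fast dependency parser.",
           ["The parser runs in linear time complexity.", "Related work is discussed next."]),
    "p2": ("We describe a corpus annotation tool.",
           ["The tool has a web interface.", "Annotators used it for two weeks."]),
}
plain = {pid: synopsis for pid, (synopsis, _) in papers.items()}
enriched = {pid: enrich(synopsis, sents) for pid, (synopsis, sents) in papers.items()}

query = "linear time complexity"
print(tfidf_scores(query, plain))     # neither synopsis mentions the query terms
print(tfidf_scores(query, enriched))  # p1 now outranks p2
```

In this toy run the query terms occur only in a full-text sentence that the placeholder classifier keeps, so only the enriched index ranks p1 above p2. A real system would swap `is_metadata_sentence` for a trained sentence classifier and the scoring function for a production engine such as Lucene.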
Pages: 1308-1315
Number of pages: 8
Related papers (48 in total)
  • [1] Big Data Full-Text Search Index Minimization Using Text Summarization
    Iqbal, Waheed
    Malik, Waqas Ilyas
    Bukhari, Faisal
    Almustafa, Khaled Mohamad
    Nawaz, Zubiar
    INFORMATION TECHNOLOGY AND CONTROL, 2021, 50(2): 375-389
  • [2] Algorithms of marine propulsion multimedia full-text system
    Ren Zhen
    Proceedings of the China Association for Science and Technology, 2006, 2(1): 147-150
  • [3] DIFTSAS: a DIstributed Full Text Search and Analysis System for Big Data
    Li, Bo
    Zhang, Jingjie
    Chen, Mingyu
    Zhang, JinChao
    Wang, Kunpeng
    Meng, Dan
    2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013), 2013: 1303-1309
  • [4] Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents
    Safder, Iqra
    Hassan, Saeed-Ul
    Visvizi, Anna
    Noraset, Thanapon
    Nawaz, Raheel
    Tuarob, Suppawong
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57(6)
  • [5] Full-text search engine with suffix index for massive heterogeneous data
    Xu, Wentao
    Chen, Haoyu
    Huan, Yidong
    Hu, Xuedong
    Nong, Ge
    INFORMATION SYSTEMS, 2022, 104
  • [6] Hardware Accelerator for Full-Text Search (HAFTS) with Succinct Data Structure
    Tanida, Naoki
    Inaba, Mary
    Hiraki, Kei
    Yoshino, Takeshi
    2009 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS, 2009: 155+
  • [7] Deep context of citations using machine-learning models in scholarly full-text articles
    Hassan, Saeed-Ul
    Imran, Mubashir
    Iqbal, Sehrish
    Aljohani, Naif Radi
    Nawaz, Raheel
    SCIENTOMETRICS, 2018, 117(3): 1645-1662
  • [8] Enhancing HDFS with a full-text search system for massive small files
    Xu, Wentao
    Zhao, Xin
    Lao, Bin
    Nong, Ge
    JOURNAL OF SUPERCOMPUTING, 2021, 77(7): 7149-7170