An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Cited by: 0
Authors
Ben Ayed, Alaidine [1]
Biskri, Ismail [2]
Meunier, Jean-Guy [3]
Affiliations
[1] Univ Quebec Montreal, Cognit Comp Sci, Montreal, PQ, Canada
[2] Univ Quebec Trois Rivieres, Comp Sci Dept, Computat Linguist & Artificial Intelligence, Trois Rivieres, PQ, Canada
[3] Univ Quebec Montreal, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Data and Knowledge Representation; Document Retrieval; Internet and Web Applications; Mono/Multi-Document Summarization; Relevance
DOI
10.4018/IJIRR.289950
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In the context of big data and the Industrial Revolution 4.0 era, enhancing document/information retrieval framework efficiency to handle the ever-growing volume of text data in an ever more digital world is a must. This article describes a double-stage system of document/information retrieval. First, a Lucene-based document retrieval tool is implemented, and a couple of query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is performed on top of the retrieved documents to create a short, accurate, and fluent extract of a longer retrieved single document (or a set of top retrieved documents). Obtained results show that using word embeddings is an excellent way to achieve higher precision rates and retrieve more accurate documents. Also, obtained summaries satisfy the retention and fidelity criteria of relevant summaries.
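The abstract mentions query expansion with word embeddings but gives no implementation details. The following is a minimal sketch of the general idea only, not the authors' method: each query term is expanded with its nearest neighbours by cosine similarity. The toy vectors, the function name `expand_query`, and the parameters `top_k` and `min_sim` are all assumptions made for illustration; a real system would load pretrained embeddings and pass the expanded terms to a Lucene query.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only (assumption).
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
    "fruit": [0.1, 0.3, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms, embeddings, top_k=2, min_sim=0.8):
    """Append up to top_k nearest neighbours of each known query term.

    Hypothetical helper: a neighbour is added only if its similarity is at
    least min_sim and it is not already in the expanded query.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            ((cosine(embeddings[term], vec), word)
             for word, vec in embeddings.items()
             if word != term),
            reverse=True,
        )
        for sim, word in neighbours[:top_k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


expanded_terms = expand_query(["car"], TOY_EMBEDDINGS)
# Join the terms into a simple boolean OR query string.
print(" OR ".join(expanded_terms))
```

The similarity threshold keeps unrelated terms out of the query, which matters for precision: expanding too aggressively tends to trade precision for recall.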
Pages: 14
Related papers
50 in total
  • [31] A Framework for Evaluating the End-to-End Trustworthiness
    Mohammadi, Nazila Gol
    Bandyszak, Torsten
    Weyer, Thorsten
    Kalogiros, Costas
    Kanakakis, Michalis
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 1, 2015, : 638 - 645
  • [32] Efficient and Robust LiDAR-Based End-to-End Navigation
    Liu, Zhijian
    Amini, Alexander
    Zhu, Sibo
    Karaman, Sertac
    Han, Song
    Rus, Daniela L.
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13247 - 13254
  • [33] An efficient end-to-end feature based system for SAR ATR
    Pham, QH
    Brosnan, TM
    Smith, MJT
    Mersereau, RM
    ALGORITHMS FOR SYNTHETIC APERTURE RADAR IMAGERY V, 1998, 3370 : 519 - 529
  • [34] Fast overlay tree based on efficient end-to-end measurements
    Jin, X
    Wang, YJ
    Chan, SHG
    ICC 2005: IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, VOLS 1-5, 2005, : 1319 - 1323
  • [35] END-TO-END ENERGY EFFICIENT COMMUNICATION
    Dittmann, Lars
    PROCEEDINGS OF 2011 INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY AND APPLICATION, ICCTA2011, 2011, : 323 - 327
  • [36] An End-to-End Framework for Evaluating Explainable Deep Models: Application to Historical Document Image Segmentation
    Brini, Iheb
    Mehri, Maroua
    Ingold, Rolf
    Ben Amara, Najoua Essoukri
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022, 2022, 13501 : 106 - 119
  • [37] A Historical Document Handwriting Transcription End-to-end System
    Romero, Veronica
    Bosch, Vicente
    Hernandez, Celio
    Vidal, Enrique
    Andreu Sanchez, Joan
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 149 - 157
  • [38] An End-to-End Security Approach for Digital Document Management
    Diego Munoz-Hernandez, Mario
    Morales-Sandoval, Miguel
    Juan Garcia-Hernandez, Jose
    COMPUTER JOURNAL, 2016, 59 (07): : 1076 - 1090
  • [39] DLAFormer: An End-to-End Transformer For Document Layout Analysis
    Wang, Jiawei
    Hu, Kai
    Huo, Qiang
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT IV, 2024, 14807 : 40 - 57
  • [40] An End-to-End Preprocessor Based on Adversiarial Learning for Mongolian Historical Document OCR
    Su, Xiangdong
    Xu, Huali
    Zhang, Yue
    Kang, Yanke
    Gao, Guanglai
    Batusiren
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2019, 11672 : 266 - 272