An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Authors
Ben Ayed, Alaidine [1]
Biskri, Ismail [2]
Meunier, Jean-Guy [3]
Affiliations
[1] Univ Quebec Montreal, Cognit Comp Sci, Montreal, PQ, Canada
[2] Univ Quebec Trois Rivieres, Comp Sci Dept, Computat Linguist & Artificial Intelligence, Trois Rivieres, PQ, Canada
[3] Univ Quebec Montreal, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Data and Knowledge Representation; Document Retrieval; Internet and Web Applications; Mono/Multi-Document Summarization; RELEVANCE;
DOI
10.4018/IJIRR.289950
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In the era of big data and Industry 4.0, document/information retrieval frameworks must become more efficient to handle the ever-growing volume of text data in an increasingly digital world. This article describes a two-stage document/information retrieval system. First, a Lucene-based document retrieval tool is implemented, and a couple of query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is applied to the retrieved documents to create a short, accurate, and fluent extract of a single longer retrieved document (or of a set of top retrieved documents). The results show that using word embeddings is an excellent way to achieve higher precision rates and retrieve more accurate documents. The summaries obtained also satisfy the retention and fidelity criteria for relevant summaries.
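The abstract does not give the details of the embedding-based query expansion, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny hand-made embedding table (`TOY_EMBEDDINGS`, invented for illustration), a hypothetical `expand_query` helper that adds each query term's nearest neighbours by cosine similarity, and a hypothetical `to_lucene_query` helper that joins the terms with `OR`, as accepted by Lucene's classic query parser.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors trained on a large corpus.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.05, 0.2, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.9):
    """Append up to k nearest neighbours of each known term.

    k and min_sim are illustrative defaults, not values from the article.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        scored = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word != term
        ]
        scored.sort(reverse=True)  # most similar first
        for sim, word in scored[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


def to_lucene_query(terms):
    """Join terms into a disjunctive query string (classic parser syntax)."""
    return " OR ".join(terms)


if __name__ == "__main__":
    print(to_lucene_query(expand_query(["car"], TOY_EMBEDDINGS)))
```

The similarity threshold keeps unrelated neighbours out of the expanded query, which matters because adding loosely related terms tends to raise recall at the cost of precision.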
Pages: 14