Active High-Recall Information Retrieval from Domain-Specific Text Corpora based on Query Documents

Cited by: 8
Authors
Chen, Sitong [1]
Mohd, Abidalrahman [1]
Nourashrafeddin, Seyednaser [1]
Milios, Evangelos [1]
Affiliation
[1] Dalhousie Univ, Fac Comp Sci, 6050 Univ Ave, Halifax, NS B3H 1W5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Recommendation Systems; Active Learning; Semantic Similarity; Information Retrieval;
DOI
10.1145/3209280.3209532
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a high-recall active document retrieval system for a class of applications that involve query documents (as opposed to key terms) and domain-specific document corpora. The output of the model is a list of documents retrieved on the basis of domain expert feedback collected during training. The model uses a modified Bag-of-Words (BoW) representation and a semantic ranking module based on Google n-grams. The core of the system is a binary document classification model trained through a continuous active learning strategy. In general, finding or constructing training data for this type of problem is very difficult, owing either to the confidentiality of the data or to the domain expert time needed to label it. Our experimental results on retrieving Calls for Papers based on a manuscript demonstrate the efficacy of the system for this application and its performance relative to other candidate models.
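The abstract describes a binary classifier over BoW features that is retrained through continuous active learning as a domain expert labels documents. The sketch below illustrates that general loop only, under stated assumptions: it uses plain word-count BoW rather than the paper's modified representation, omits the Google n-gram semantic ranking module entirely, uses a tiny hand-rolled logistic regression as the classifier, simulates the expert with a callback `is_relevant`, and treats a random sample of unlabeled documents as provisional negatives (a common CAL heuristic, not something the abstract states). The names `bow`, `train_logreg`, `score`, `cal_retrieve` and the toy corpus are hypothetical; this is not the authors' implementation.

```python
import math
import random
import re
from collections import Counter


def bow(text):
    """Plain Bag-of-Words: lower-cased word counts (not the paper's modified BoW)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score(model, x):
    """Linear score (log-odds of relevance) of a BoW vector."""
    w, b = model
    return b + sum(w[t] * v for t, v in x.items())


def train_logreg(examples, epochs=50, lr=0.1):
    """Tiny logistic regression on sparse BoW features, fit with plain SGD."""
    w, b = Counter(), 0.0
    for _ in range(epochs):
        for x, y in examples:
            z = max(-30.0, min(30.0, score((w, b), x)))  # clip to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # derivative of the log loss with respect to z
            b -= lr * g
            for t, v in x.items():
                w[t] -= lr * g * v
    return w, b


def cal_retrieve(query_doc, corpus, is_relevant, batch_size=2, budget=4, n_neg=3, seed=0):
    """Continuous active learning (CAL) sketch: retrain after every reviewed batch.

    query_doc   -- text of the query document (e.g. a manuscript), used as the seed positive
    corpus      -- dict mapping doc_id -> text
    is_relevant -- hypothetical stand-in for the domain expert's judgment on a doc_id
    Returns the doc_ids judged relevant, in the order they were reviewed.
    """
    rng = random.Random(seed)
    feats = {d: bow(t) for d, t in corpus.items()}
    seed_pos = (bow(query_doc), 1)
    labels = {}  # doc_id -> 1 (relevant) / 0 (not relevant), from the "expert"
    while len(labels) < min(budget, len(corpus)):
        unlabeled = [d for d in corpus if d not in labels]
        # Assumption: a random sample of unlabeled docs serves as provisional negatives.
        provisional = rng.sample(unlabeled, min(n_neg, len(unlabeled)))
        train = [seed_pos]
        train += [(feats[d], y) for d, y in labels.items()]
        train += [(feats[d], 0) for d in provisional]
        model = train_logreg(train)
        # Show the expert the top-scoring unlabeled docs, then loop to retrain.
        ranked = sorted(unlabeled, key=lambda d: score(model, feats[d]), reverse=True)
        for d in ranked[:batch_size]:
            if len(labels) >= budget:
                break
            labels[d] = int(is_relevant(d))
    return [d for d, y in labels.items() if y == 1]


# Hypothetical toy data: Call-for-Papers snippets and a manuscript as the query document.
corpus = {
    "cfp1": "call for papers on active learning for information retrieval",
    "cfp2": "workshop on document retrieval and relevance feedback",
    "cfp3": "conference on protein folding and molecular biology",
    "cfp4": "symposium on high recall retrieval with active learning",
    "cfp5": "call for papers on marine ecology field studies",
    "cfp6": "deep learning for medical imaging and radiology",
}
relevant = {"cfp1", "cfp2", "cfp4"}  # hypothetical expert judgments
manuscript = "we propose active learning for high recall document retrieval"
found = cal_retrieve(manuscript, corpus, lambda d: d in relevant, budget=4)
print(found)
```

Reviewing in small batches and retraining after each one is what distinguishes continuous active learning from training a classifier once on a fixed labeled set; a practical system would also need a stopping rule for deciding when recall is high enough to end the review.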
Pages: 10
Related papers (21 in total)
  • [1] High-Recall Information Retrieval from Linked Big Data
    Cuzzocrea, Alfredo
    Lee, Wookey
    Leung, Carson K.
    [J]. 39TH ANNUAL IEEE COMPUTERS, SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC 2015), VOL 2, 2015, : 712 - 717
  • [2] A concept-based information retrieval approach for engineering domain-specific technical documents
    Lin, Hsien-Tang
    Chi, Nai-Wen
    Hsieh, Shang-Hsien
    [J]. ADVANCED ENGINEERING INFORMATICS, 2012, 26 (02) : 349 - 360
  • [3] Mining ontological knowledge from domain-specific text documents
    Jiang, X
    Tan, AH
    [J]. FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 665 - 668
  • [4] Domain-Specific Semantic Retrieval of Institutional Repository Based on Query Extension
    Wu, Xu
    Li, Pengchong
    Xu, Jin
    Xie, Xiaqing
    [J]. TRUSTWORTHY COMPUTING AND SERVICES (ISCTCS 2014), 2015, 520 : 401 - 411
  • [5] Towards Question-based High-recall Information Retrieval: Locating the Last Few Relevant Documents for Technology-assisted Reviews
    Zou, Jie
    Kanoulas, Evangelos
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [6] Domain-specific information retrieval based on improved language model
    Kang, Kai
    Lin, Kunhui
    Zhou, Changle
    Guo, Feng
    [J]. FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2007, : 374 - +
  • [7] Analogy-based Matching Model for Domain-specific Information Retrieval
    Bounhas, Myriam
    Elayeb, Bilel
    [J]. PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 496 - 505
  • [8] Information Retrieval Approach based on Indexing Text Documents: Application to Biomedical Domain
    Boukhari, Kabil
    Omri, Mohamed Nazih
    [J]. 2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 2213 - 2220
  • [9] An Active Learning Approach to Recognizing Domain-Specific Queries From Query Log
    Ni, Weijian
    Liu, Tong
    Sun, Haohao
    Wei, Zhensheng
    [J]. WEB AND BIG DATA, APWEB-WAIM 2017, PT II, 2017, 10367 : 18 - 32
  • [10] Mining Infrequent High-Quality Phrases from Domain-Specific Corpora
    Wang, Li
    Zhu, Wei
    Jiang, Sihang
    Zhang, Sheng
    Wang, Keqiang
    Ni, Yuan
    Xie, Guotong
    Xiao, Yanghua
    [J]. CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 1535 - 1544