A systematic review and comparative analysis of cross-document coreference resolution methods and tools

Cited by: 24
Authors
Beheshti, Seyed-Mehdi-Reza [1]
Benatallah, Boualem [1]
Venugopal, Srikumar [1]
Ryu, Seung Hwan [1]
Motahari-Nezhad, Hamid Reza [2]
Wang, Wei [1]
Affiliations
[1] University of New South Wales, School of Computer Science and Engineering, Sydney, NSW, Australia
[2] IBM Almaden Research Center, San Jose, CA, USA
Keywords
Information extraction; Cross-document coreference resolution; Large datasets
DOI
10.1007/s00607-016-0490-0
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. Among the various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically upon cross-document coreference resolution (CDCR): the task of identifying entity mentions across information sources that refer to the same underlying entity. CDCR is the basis of knowledge acquisition and is at the heart of Web search, recommendations, and analytics. Real-time CDCR processing is very important and has various applications in discovering must-know information in real time for clients in finance, the public sector, news, and crisis management. Because CDCR is an emerging area of research and practice, the reported literature on its challenges and solutions is growing fast, but it is scattered due to the large problem space, the variety of applications, and datasets on the order of terabytes to petabytes. To fill this gap, we provide a systematic review of the state of the art in challenges and solutions for a CDCR process. We identify a set of quality attributes that have been frequently reported in the context of CDCR processes, to be used as a guide to identify important and outstanding issues for further investigation. Finally, we assess existing tools and techniques for CDCR subtasks and provide guidance on the selection of tools and algorithms.
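To make the CDCR task defined in the abstract concrete, the following is a minimal Python sketch of one simple baseline idea, not the method proposed or surveyed in this paper: mentions from different documents are linked when their names match exactly and the bag-of-words vectors of their surrounding context are similar under cosine similarity, and links are merged transitively into entity clusters with union-find. Everything here is an illustrative assumption: the `(doc_id, name, context)` mention tuples, whitespace tokenization, the hypothetical helper names `bag_of_words`, `cosine`, and `resolve`, and the 0.3 threshold.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens of a mention's context window."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve(mentions, threshold=0.3):
    """Cluster (doc_id, name, context) mentions into entities (hypothetical helper).

    Two mentions are linked when their lower-cased names match exactly
    (a crude filter) and their context vectors have cosine similarity
    >= threshold; links are merged transitively with union-find.
    Returns a sorted list of clusters, each a list of mention indices.
    """
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    vectors = [bag_of_words(context) for _, _, context in mentions]
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            same_name = mentions[i][1].lower() == mentions[j][1].lower()
            if same_name and cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two entity clusters

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data (invented for illustration): one name, two different real-world people.
mentions = [
    ("doc1", "Michael Jordan", "the basketball player won six NBA titles with the Bulls"),
    ("doc2", "Michael Jordan", "Jordan scored 30 points for the Bulls in the NBA finals"),
    ("doc3", "Michael Jordan", "the machine learning professor at Berkeley published a paper on graphical models"),
]
print(resolve(mentions))  # [[0, 1], [2]]
```

The two basketball mentions share context terms and are merged, while the Berkeley mention stays separate despite the identical name. Note that the all-pairs loop is quadratic in the number of mentions; at the terabyte-to-petabyte scale the abstract refers to, practical systems would need indexing or blocking and distributed processing rather than this naive comparison.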
Pages: 313-349
Number of pages: 37
Related papers (50 in total)
  • [1] A systematic review and comparative analysis of cross-document coreference resolution methods and tools
    Beheshti, Seyed-Mehdi-Reza
    Benatallah, Boualem
    Venugopal, Srikumar
    Ryu, Seung Hwan
    Motahari-Nezhad, Hamid Reza
    Wang, Wei
    [J]. Computing, 2017, 99 : 313 - 349
  • [2] An Overview of Cross-Document Coreference Resolution
    Keshtkaran, Aliakbar
    Yuhaniz, Siti Sophiayati
    Ibrahim, Suhaimi
    [J]. 2017 FIRST INTERNATIONAL CONFERENCE ON COMPUTER AND DRONE APPLICATIONS (ICONDA), 2017, : 43 - 48
  • [3] The Challenges of Cross-Document Coreference Resolution in Email
    Li, Xue
    Magliacane, Sara
    Groth, Paul
    [J]. PROCEEDINGS OF THE 11TH KNOWLEDGE CAPTURE CONFERENCE (K-CAP '21), 2021, : 273 - 276
  • [4] XCoref: Cross-document Coreference Resolution in the Wild
    Zhukova, Anastasia
    Hamborg, Felix
    Donnay, Karsten
    Gipp, Bela
    [J]. INFORMATION FOR A BETTER WORLD: SHAPING THE GLOBAL FUTURE, PT I, 2022, 13192 : 272 - 291
  • [5] Cross-document Coreference Resolution over Predicted Mentions
    Cattan, Arie
    Eirew, Alon
    Stanovsky, Gabriel
    Joshi, Mandar
    Dagan, Ido
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 5100 - 5107
  • [6] Realistic Evaluation Principles for Cross-document Coreference Resolution
    Cattan, Arie
    Eirew, Alon
    Stanovsky, Gabriel
    Joshi, Mandar
    Dagan, Ido
    [J]. 10TH CONFERENCE ON LEXICAL AND COMPUTATIONAL SEMANTICS (SEM 2021), 2021, : 143 - 151
  • [7] Cross-document transliterated personal name coreference resolution
    Wang, HF
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 11 - 20
  • [8] A methodology for cross-document coreference
    Bagga, A
    Biermann, AW
    [J]. PROCEEDINGS OF THE FIFTH JOINT CONFERENCE ON INFORMATION SCIENCES, VOLS 1 AND 2, 2000, : 207 - 210
  • [9] Cross-Document Coreference Resolution based on Automatic Text Summary
    Gao, Sanyuan
    Li, Si
    Xu, Weiran
    Guo, Jun
    [J]. THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010, : 306 - 309
  • [10] Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution
    Barhom, Shany
    Shwartz, Vered
    Eirew, Alon
    Bugert, Michael
    Reimers, Nils
    Dagan, Ido
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 4179 - 4189