INFOSYNC: Information Synchronization across Multilingual Semi-structured Tables

Authors
Khincha, Siddharth [1]
Jain, Chelsi [2]
Gupta, Vivek [3]
Kataria, Tushar [3]
Zhang, Shuo [4]
Affiliations
[1] IIT Guwahati, Gauhati, India
[2] CTAE, Udaipur, Rajasthan, India
[3] Univ Utah, Salt Lake City, UT 84112 USA
[4] Bloomberg, New York, NY USA
Keywords
WIKIPEDIA; BIAS
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information Synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized across languages. To address this problem, we introduce a new dataset INFOSYNC and a two-step method for tabular synchronization. INFOSYNC contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (~3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables. When evaluated on INFOSYNC, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
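The two steps named in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the function names `alignment_f1` and `unaligned_source_rows`, the English/German row keys, and the choice to score F1 over a single pooled set of row pairs are all assumptions made for illustration. The abstract does not say how the alignment is produced or how F1 is aggregated.

```python
def alignment_f1(predicted, gold):
    """Precision, recall and F1 over sets of aligned (source_key, target_key) pairs.

    Hypothetical scoring helper: treats each aligned row pair as one item.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def unaligned_source_rows(source_keys, alignment):
    """Source rows with no aligned partner: candidates for the update step."""
    aligned = {src for src, _ in alignment}
    return sorted(k for k in source_keys if k not in aligned)


# Toy infobox row keys (illustrative only, not taken from the dataset).
en_keys = ["born", "died", "occupation", "spouse"]
gold = {("born", "geboren"), ("occupation", "beruf"), ("spouse", "ehepartner")}
predicted = {("born", "geboren"), ("occupation", "beruf"), ("died", "ehepartner")}

precision, recall, f1 = alignment_f1(predicted, gold)
missing = unaligned_source_rows(en_keys, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}", missing)
```

Here two of the three predicted pairs are correct, so precision, recall and F1 all equal 2/3, and `died` is the only English row without a German counterpart under the gold alignment.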
Pages: 2536-2559
Page count: 24