INFOSYNC: Information Synchronization across Multilingual Semi-structured Tables

Authors
Khincha, Siddharth [1]
Jain, Chelsi [2]
Gupta, Vivek [3]
Kataria, Tushar [3]
Zhang, Shuo [4]
Affiliations
[1] IIT Guwahati, Gauhati, India
[2] CTAE, Udaipur, Rajasthan, India
[3] Univ Utah, Salt Lake City, UT 84112 USA
[4] Bloomberg, New York, NY USA
Source
Findings of the Association for Computational Linguistics, ACL 2023 | 2023
Keywords
Wikipedia; Bias
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized with their counterparts in other languages. To address this problem, we introduce a new dataset, INFOSYNC, and a two-step method for tabular synchronization. INFOSYNC contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (~3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update to fill in missing or outdated information in aligned tables across languages. When evaluated on INFOSYNC, information alignment achieves an F1 score of 87.91 (en ↔ non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
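As a rough illustration of how row alignment might be scored with F1 and how unaligned rows could be flagged as update candidates, here is a minimal Python sketch. It is not the authors' implementation: the toy infoboxes, the gold and predicted alignments, and the helper names `alignment_f1` and `missing_rows` are all hypothetical, and the record does not specify how the paper computes its F1 score.

```python
def alignment_f1(predicted, gold):
    """Return (precision, recall, F1) over sets of aligned row-key pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def missing_rows(source_table, alignment):
    """Return source row keys with no aligned counterpart (update candidates)."""
    aligned_source_keys = {src for src, _ in alignment}
    return [key for key in source_table if key not in aligned_source_keys]


# Hypothetical toy infoboxes (assumed data, not taken from INFOSYNC).
en = {"Born": "1 January 1950", "Occupation": "Writer", "Spouse": "Jane Doe"}
fr = {"Naissance": "1 janvier 1950", "Profession": "Écrivain"}

# Assumed gold alignment and an imperfect predicted alignment.
gold = [("Born", "Naissance"), ("Occupation", "Profession")]
pred = [("Born", "Naissance"), ("Spouse", "Profession")]

precision, recall, f1 = alignment_f1(pred, gold)
candidates = missing_rows(en, gold)
```

In this toy setup, rows of the English table that have no aligned French row are the ones an update step would consider carrying over.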
Pages: 2536-2559
Page count: 24