Graph integration of structured, semistructured and unstructured data for data journalism

Citations: 12
Authors
Anadiotis, Angelos Christos [1, 2]
Balalau, Oana [3]
Conceicao, Catarina [4, 5]
Galhardas, Helena [4, 5]
Haddad, Mhd Yamen [3]
Manolescu, Ioana [3]
Merabti, Tayeb [3]
You, Jingmao [3]
Affiliations
[1] Inst Polytech Paris, Ecole Polytech, Paris, France
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Inst Polytech Paris, INRIA, Paris, France
[4] Univ Lisbon, INESC ID, Lisbon, Portugal
[5] Univ Lisbon, IST, Lisbon, Portugal
Keywords
Data journalism; Heterogeneous data integration; Information extraction; Named entity recognition
DOI
10.1016/j.is.2021.101846
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Digital data is a gold mine for modern journalism. However, the datasets that interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) and semi-structured (JSON, XML, HTML) data to graphs (e.g., RDF) and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets, along the lines described above, into graphs: the challenges we faced in making such graphs useful and allowing their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments. (C) 2021 Elsevier Ltd. All rights reserved.
Pages: 16