Graph integration of structured, semistructured and unstructured data for data journalism

Cited by: 12
Authors
Anadiotis, Angelos Christos [1, 2]
Balalau, Oana [3]
Conceicao, Catarina [4, 5]
Galhardas, Helena [4, 5]
Haddad, Mhd Yamen [3]
Manolescu, Ioana [3]
Merabti, Tayeb [3]
You, Jingmao [3]
Affiliations
[1] Inst Polytech Paris, Ecole Polytech, Paris, France
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Inst Polytech Paris, INRIA, Paris, France
[4] Univ Lisbon, INESC ID, Lisbon, Portugal
[5] Univ Lisbon, IST, Lisbon, Portugal
Keywords
Data journalism; Heterogeneous data integration; Information extraction; Named entity recognition
DOI
10.1016/j.is.2021.101846
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Digital data is a gold mine for modern journalism. However, the datasets that interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) to semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets along the lines described above: the challenges we faced to make such graphs useful and to allow their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments. (C) 2021 Elsevier Ltd. All rights reserved.
Pages: 16