WikiAnalytics: Ad-hoc Querying of Highly Heterogeneous Structured Data

Cited by: 0
Authors
Balmin, Andrey [1]
Curtmola, Emiran [2]
Affiliations
[1] IBM Almaden Res Ctr, San Jose, CA 95120 USA
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
DOI
10.1109/ICDE.2010.5447751
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Searching and extracting meaningful information from highly heterogeneous datasets is a topic that has received considerable attention. However, existing solutions fall at two extremes. At one extreme, they are based on rigid, complex query languages (e.g., SQL, XQuery/XPath), which are hard to use without full schema knowledge and an expert user, and which require up-front data integration. At the other extreme, they employ keyword search queries over relational databases [3], [1], [10], [9], [2], [11] or over semistructured data [6], [12], [17], [15], which are too imprecise to specify the user's intent exactly [16]. To address these limitations, we propose an alternative search paradigm that derives tables of precise and complete results from a very sparse set of heterogeneous records. Our approach allows users to disambiguate search results by navigating along conceptual dimensions that describe the records. To this end, we cluster documents based on the fields and values that contain the query keywords, and we build a universal navigational lattice (UNL) over all such discovered clusters. Conceptually, the UNL encodes all possible ways to group the documents in the data corpus based on where the keywords hit. We describe WIKIANALYTICS, a system that facilitates data extraction from the Wikipedia infobox collection. WIKIANALYTICS provides a dynamic and intuitive interface that lets the average user explore the search results and construct homogeneous structured tables, which can be further queried and mashed up (e.g., filtered and aggregated) using conventional tools.
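The clustering idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' UNL construction, which the abstract does not specify. It assumes that each document is summarized by the set of (keyword, field) pairs where a keyword occurs in a field's value, and that each lattice node is a non-empty subset of such pairs holding every document that matches all of them. The toy records and the names `RECORDS`, `hit_signature`, and `build_lattice` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records standing in for sparse, infobox-like data.
RECORDS = {
    "doc1": {"name": "Paris", "country": "France"},
    "doc2": {"name": "Paris Hilton", "occupation": "actress"},
    "doc3": {"name": "Voltaire", "birth_place": "Paris"},
}


def hit_signature(fields, keywords):
    """Return the set of (keyword, field) pairs where a keyword hits a value."""
    return frozenset(
        (kw, field)
        for kw in keywords
        for field, value in fields.items()
        if kw.lower() in str(value).lower()
    )


def build_lattice(records, keywords):
    """Group documents by every non-empty subset of their keyword hits.

    Assumption of this sketch: a node is a set of (keyword, field) pairs,
    and it contains each document whose hits include all of those pairs.
    """
    lattice = defaultdict(set)
    for doc_id, fields in records.items():
        signature = sorted(hit_signature(fields, keywords))
        for size in range(1, len(signature) + 1):
            for subset in combinations(signature, size):
                lattice[frozenset(subset)].add(doc_id)
    return dict(lattice)


if __name__ == "__main__":
    for node, docs in build_lattice(RECORDS, ["paris", "france"]).items():
        print(sorted(node), "->", sorted(docs))
```

In this toy corpus, a user searching for "paris" could distinguish records where the keyword hits `name` from those where it hits `birth_place`, which mirrors the navigation along conceptual dimensions that the abstract describes. Enumerating all subsets grows exponentially with the number of hits per document, so the sketch is only suitable for small examples.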
Pages: 1145-1148
Page count: 4
Related papers
50 in total
  • [1] BINARY: A Framework for Big Data Integration for Ad-hoc Querying
    Eftekhari, Azadeh
    Zulkernine, Farhana
    Martin, Patrick
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 2746 - 2753
  • [2] DataCalc: Ad-hoc Analyses on Heterogeneous Data Sources
    Luong, Johannes
    Habich, Dirk
    Lehner, Wolfgang
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 463 - 468
  • [3] Ad-Hoc Querying of Semistar Data Ontologies Using Controlled Natural Language
    Barzdins, Janis
    Grasmanis, Mikus
    Rencis, Edgars
    Sostaks, Agris
    Barzdins, Juris
    DATABASES AND INFORMATION SYSTEMS IX, 2016, 291 : 3 - 16
  • [4] VMQL: A visual language for ad-hoc model querying
    Storrle, Harald
    JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2011, 22 (01): : 3 - 29
  • [5] On Keyword-Based Ad-Hoc Querying of Hospital Data Stored in Semistar Data Ontologies
    Rencis, Edgars
    CENTERIS 2018 - INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS / PROJMAN 2018 - INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT / HCIST 2018 - INTERNATIONAL CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERI, 2018, 138 : 27 - 32
  • [6] Topology control in heterogeneous ad-hoc networks
    Srivastava, G
    Boustead, P
    Chicharo, JF
    2004 12TH IEEE INTERNATIONAL CONFERENCE ON NETWORKS, VOLS 1 AND 2, PROCEEDINGS: UNITY IN DIVERSITY, 2004, : 665 - 670
  • [7] Ad-Hoc Data Processing in the Cloud
    Logothetis, Dionysios
    Yocum, Kenneth
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (02): : 1472 - 1475
  • [8] Key agreement for heterogeneous mobile ad-hoc groups
    Manulis, M. (mark.manulis@rub.de), IEEE Computer Society TCDP and TCPP; Fukuoka Institute of Technology, FIT, Japan (Institute of Electrical and Electronics Engineers Computer Society):
  • [9] Key agreement for heterogeneous mobile ad-hoc groups
    Manulis, M
    11TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS WORKSHOPS, VOL II, PROCEEDINGS, 2005, : 290 - 294
  • [10] Disruption tolerant networking for heterogeneous ad-hoc networks
    Fall, Kevin
    MILCOM 2005 - 2005 IEEE MILITARY COMMUNICATIONS CONFERENCE, VOLS 1-5, 2005, : 2195 - 2201