WikiAnalytics: Ad-hoc Querying of Highly Heterogeneous Structured Data

Authors
Balmin, Andrey [1]
Curtmola, Emiran [2]
Affiliations
[1] IBM Almaden Res Ctr, San Jose, CA 95120 USA
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
DOI
10.1109/ICDE.2010.5447751
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Searching and extracting meaningful information from highly heterogeneous datasets is a topic that has received considerable attention. Existing solutions, however, sit at two extremes. On one side are rigid, complex query languages (e.g., SQL, XQuery/XPath), which are hard to use without full knowledge of the schema or without expert users, and which require up-front data integration. On the other side are keyword search queries over relational databases [3], [1], [10], [9], [2], [11] and over semistructured data [6], [12], [17], [15], which are too imprecise to capture the user's intent exactly [16]. To address these limitations, we propose an alternative search paradigm that derives tables of precise and complete results from a very sparse set of heterogeneous records. Our approach lets users disambiguate search results by navigating along conceptual dimensions that describe the records. To this end, we cluster documents based on the fields and values that contain the query keywords, and we build a universal navigational lattice (UNL) over all such discovered clusters. Conceptually, the UNL encodes all possible ways to group the documents in the data corpus based on where the keywords hit. We describe WIKIANALYTICS, a system that facilitates data extraction from the Wikipedia infobox collection. WIKIANALYTICS provides a dynamic and intuitive interface that lets the average user explore the search results and construct homogeneous structured tables, which can then be further queried and mashed up (e.g., filtered and aggregated) using conventional tools.
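The abstract describes the clustering-and-lattice pipeline only at a high level. The following is a minimal sketch of one way it could work, not the WikiAnalytics implementation. It assumes that records are flat field-to-value dictionaries (like infoboxes), that a keyword "hits" a field when it occurs case-insensitively in the field name or value, and that lattice nodes are the per-record hit sets closed under intersection, as in formal concept analysis. The paper's UNL may be defined differently, and the names `hits`, `build_lattice`, `refinements`, and `to_table` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

Record = Dict[str, str]                     # one infobox-like record: field -> value
Hit = Tuple[str, str]                       # (keyword, field) where the keyword occurs
Lattice = Dict[FrozenSet[Hit], List[int]]   # node (set of hits) -> indices of its records


def hits(record: Record, keywords: List[str]) -> FrozenSet[Hit]:
    """Collect (keyword, field) pairs where the keyword occurs, case-insensitively,
    in the field name or its value (an assumed matching rule)."""
    found = set()
    for kw in keywords:
        needle = kw.lower()
        for field, value in record.items():
            if needle in field.lower() or needle in str(value).lower():
                found.add((kw, field))
    return frozenset(found)


def build_lattice(records: List[Record], keywords: List[str]) -> Lattice:
    """Hypothetical UNL-like lattice: nodes are the non-empty hit sets of the
    records plus all their intersections; a node's extent is every record
    whose hit set contains that node."""
    sigs = [hits(r, keywords) for r in records]
    intents = {s for s in sigs if s}
    changed = True
    while changed:  # close the node set under pairwise intersection
        changed = False
        for a, b in combinations(list(intents), 2):
            common = a & b
            if common and common not in intents:
                intents.add(common)
                changed = True
    return {n: [i for i, s in enumerate(sigs) if n <= s] for n in intents}


def refinements(lattice: Lattice, node: FrozenSet[Hit]) -> List[FrozenSet[Hit]]:
    """Immediate, more specific groupings: strict supersets of `node`
    with no other node strictly between them."""
    bigger = [n for n in lattice if node < n]
    return [n for n in bigger if not any(node < m < n for m in bigger)]


def to_table(records: List[Record], lattice: Lattice,
             node: FrozenSet[Hit], columns: List[str]) -> List[List[Optional[str]]]:
    """Project the records grouped under `node` onto the chosen columns;
    fields a record lacks become None."""
    return [[records[i].get(c) for c in columns] for i in lattice[node]]


if __name__ == "__main__":
    # Illustrative toy records, not data from the paper.
    records = [
        {"name": "Paris", "country": "France", "population": "2100000"},
        {"name": "Lyon", "country": "France", "population": "500000"},
        {"name": "France", "capital": "Paris", "currency": "Euro"},
        {"title": "Paris Match", "country": "France", "type": "magazine"},
    ]
    lattice = build_lattice(records, ["Paris", "France"])
    for node, extent in lattice.items():
        print(sorted(node), extent)
    root = frozenset({("France", "country")})
    print(refinements(lattice, root))
    print(to_table(records, lattice, root, ["name", "population"]))
```

With the toy records, the node {("France", "country")} groups records 0, 1, and 3. Its two refinements separate the city record, where "Paris" hits the `name` field, from the magazine record, where it hits `title`. The rows returned by `to_table` are an ordinary list of lists that conventional tools can filter or aggregate. The pairwise intersection loop is chosen for readability; dedicated formal concept analysis algorithms such as NextClosure would scale better on a large corpus.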
Pages: 1145-1148
Page count: 4