WikiAnalytics: Ad-hoc Querying of Highly Heterogeneous Structured Data

Cited by: 0
Authors
Balmin, Andrey [1]
Curtmola, Emiran [2]
Affiliations
[1] IBM Almaden Res Ctr, San Jose, CA 95120 USA
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
DOI
10.1109/ICDE.2010.5447751
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Searching and extracting meaningful information from highly heterogeneous datasets is a topic that has received considerable attention. Existing solutions, however, sit at two extremes. At one extreme are rigid, complex query languages (e.g., SQL, XQuery/XPath), which are hard to use without full knowledge of the schema or without an expert user, and which require up-front data integration. At the other extreme are keyword search queries over relational databases [3], [1], [10], [9], [2], [11] and over semistructured data [6], [12], [17], [15], which are too imprecise to capture exactly the user's intent [16]. To address these limitations, we propose an alternative search paradigm that derives tables of precise and complete results from a very sparse set of heterogeneous records. Our approach lets users disambiguate search results by navigating along conceptual dimensions that describe the records. To this end, we cluster documents based on the fields and values that contain the query keywords, and we build a universal navigational lattice (UNL) over all such discovered clusters. Conceptually, the UNL encodes all possible ways to group the documents in the data corpus based on where the keywords hit. We describe WIKIANALYTICS, a system that facilitates data extraction from the Wikipedia infobox collection. WIKIANALYTICS provides a dynamic and intuitive interface that lets the average user explore the search results and construct homogeneous structured tables, which can then be further queried and mashed up (e.g., filtered and aggregated) using conventional tools.
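The abstract describes the pipeline (find where keywords hit, cluster records by those hits, organize the clusters into a lattice, then project a chosen cluster into a homogeneous table) without giving an algorithm. The Python sketch below is one hypothetical reading of those steps, not the authors' implementation. It assumes that a "hit" is a case-insensitive substring match on a field's name or value, and it builds the lattice as a concept lattice (the distinct per-record hit sets closed under intersection, as in formal concept analysis), which may differ from the paper's UNL. The names RECORDS, keyword_hits, build_lattice, and to_table, as well as the sample infobox records, are all illustrative.

```python
from itertools import combinations

# Hypothetical sample records standing in for Wikipedia infoboxes
# (field name -> value). These are illustrative, not data from the paper.
RECORDS = {
    "Paris": {"type": "city", "country": "France", "population": 2148000},
    "Lyon": {"type": "city", "country": "France", "population": 513000},
    "Quebec City": {"type": "city", "country": "Canada"},
    "Tour Eiffel": {"type": "building", "location": "Paris, France", "height_m": 330},
}


def keyword_hits(record, keywords):
    """Return the (keyword, field) pairs where a keyword occurs in a field's name or value.

    Assumption: a "hit" is a case-insensitive substring match.
    """
    hits = set()
    for field, value in record.items():
        text = f"{field} {value}".lower()
        for kw in keywords:
            if kw.lower() in text:
                hits.add((kw, field))
    return frozenset(hits)


def build_lattice(records, keywords):
    """Map each lattice node (a set of hit pairs) to the sorted ids of the records it groups.

    Assumption: nodes are the distinct non-empty per-record hit sets, closed under
    intersection (a concept-lattice construction used as a stand-in for the UNL).
    """
    hit_sets = {rid: keyword_hits(rec, keywords) for rid, rec in records.items()}
    nodes = {h for h in hit_sets.values() if h}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that adding new meets is safe.
        for a, b in combinations(list(nodes), 2):
            meet = a & b
            if meet and meet not in nodes:
                nodes.add(meet)
                changed = True
    # A record belongs to a node's cluster if its hits include all of the node's pairs.
    return {
        node: sorted(rid for rid, h in hit_sets.items() if node <= h)
        for node in nodes
    }


def to_table(records, cluster, columns):
    """Project a cluster onto the chosen columns, yielding a homogeneous table (list of rows)."""
    return [{col: records[rid].get(col) for col in columns} for rid in cluster]


if __name__ == "__main__":
    lattice = build_lattice(RECORDS, ["France", "city"])
    for node, members in sorted(lattice.items(), key=lambda kv: len(kv[1]), reverse=True):
        print(sorted(node), members)
    # Navigate to the cluster where "France" hits `country` and "city" hits `type`.
    french_cities = lattice[frozenset({("France", "country"), ("city", "type")})]
    table = to_table(RECORDS, french_cities, ["country", "population"])
    print(table)
    # Aggregate the homogeneous table with ordinary tools (here, a plain sum).
    print(sum(row["population"] for row in table))  # prints 2661000
```

In this toy run, the keyword "France" is disambiguated by where it hits: the `country` cluster groups the two cities, while the `location` cluster isolates the landmark. Closing the hit sets under intersection keeps only groupings that some records actually share, rather than enumerating every subset of hit pairs, which would grow exponentially with the number of hits.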
Pages: 1145-1148
Number of pages: 4