Automatic extraction of useful facet hierarchies from text databases

Cited by: 41
Authors
Dakka, Wisam [1]
Ipeirotis, Panagiotis G. [2]
Affiliations
[1] Columbia Univ, Dept Comp Sci, 1214 Amsterdam Ave, New York, NY 10027 USA
[2] NYU, Dept Informat Operat & Management Sci, New York, NY 10012 USA
Source
2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3 | 2008
DOI
10.1109/ICDE.2008.4497455
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a powerful new paradigm that has proved to be a successful complement to keyword searching. Thus far, the identification of the facets either was a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for automatic extraction of facets useful for browsing text databases. In particular, we observe, through a pilot study, that facet terms rarely appear in text documents, showing that we need external resources to identify useful facet terms. For this, we first identify important phrases in each document. Then, we expand each phrase with "context" phrases using external resources, such as WordNet and Wikipedia, causing facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies, using the Amazon Mechanical Turk service, show that our techniques produce facets with high precision and recall that are superior to existing approaches and help users locate interesting items faster.
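The three steps named in the abstract (identify important phrases, expand them with context terms, compare term distributions) can be illustrated with a toy Python sketch. It is not the authors' implementation. The `CONTEXT` dictionary is a hypothetical stand-in for an external resource such as WordNet or Wikipedia, the phrase spotter is a plain substring match, and the raw document-frequency shift used for scoring is an assumption, since the abstract does not specify the comparison statistic. All function and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical stand-in for an external resource (e.g. WordNet or Wikipedia):
# maps an important phrase to the "context" terms it would be expanded with.
CONTEXT = {
    "jacques chirac": ["politician", "france", "president"],
    "angela merkel": ["politician", "germany", "chancellor"],
    "paris": ["city", "france"],
}

DOCS = [
    "Jacques Chirac spoke in Paris on Monday",
    "Angela Merkel met Jacques Chirac",
    "Angela Merkel visited Paris",
]


def important_phrases(doc, vocabulary):
    """Toy phrase spotter: known phrases that occur in the document text."""
    text = doc.lower()
    return [phrase for phrase in vocabulary if phrase in text]


def expand(doc, context):
    """Return the document's term set after adding context terms."""
    terms = set(doc.lower().split())
    for phrase in important_phrases(doc, context):
        terms.update(context[phrase])
    return terms


def document_frequency(term_sets):
    """Count in how many documents each term appears."""
    df = Counter()
    for terms in term_sets:
        df.update(terms)
    return df


def candidate_facet_terms(docs, context, min_shift=2):
    """Terms whose document frequency rises by at least min_shift after expansion."""
    original = document_frequency(set(d.lower().split()) for d in docs)
    expanded = document_frequency(expand(d, context) for d in docs)
    shifts = {term: expanded[term] - original[term] for term in expanded}
    return sorted(
        (term for term, shift in shifts.items() if shift >= min_shift),
        key=lambda term: (-shifts[term], term),
    )


if __name__ == "__main__":
    print(candidate_facet_terms(DOCS, CONTEXT))
```

In this toy collection, terms such as "politician" and "france" never occur in the original documents but become frequent after expansion, which mirrors the abstract's observation that facet terms rarely appear in the text itself.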
Pages: 466 / +
Page count: 2