Automatic extraction of useful facet hierarchies from text databases

Cited by: 41
Authors
Dakka, Wisam [1]
Ipeirotis, Panagiotis G. [2]
Affiliations
[1] Columbia Univ, Dept Comp Sci, 1214 Amsterdam Ave, New York, NY 10027 USA
[2] NYU, Dept Informat Operat & Management Sci, New York, NY 10012 USA
DOI
10.1109/ICDE.2008.4497455
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a powerful new paradigm that has proved to be a successful complement to keyword searching. Thus far, identifying the facets has either been a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for automatic extraction of facets useful for browsing text databases. In particular, we observe, through a pilot study, that facet terms rarely appear in text documents, showing that we need external resources to identify useful facet terms. To this end, we first identify important phrases in each document. Then, we expand each phrase with "context" phrases using external resources, such as WordNet and Wikipedia, causing facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies, using the Amazon Mechanical Turk service, show that our techniques produce facets with high precision and recall that are superior to existing approaches and help users locate interesting items faster.
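The three steps in the abstract (identify important phrases, expand them with context terms, compare term distributions) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONTEXT` dictionary is a hypothetical stand-in for lookups in WordNet or Wikipedia, the documents are assumed to be already reduced to their important phrases, and the scoring rule (increase in document frequency after expansion) is an assumed simplification of "comparing term distributions".

```python
from collections import Counter

# Hypothetical toy resource standing in for WordNet/Wikipedia lookups.
CONTEXT = {
    "jacques chirac": ["politician", "france"],
    "angela merkel": ["politician", "germany"],
    "paris": ["france", "city"],
}


def expand(doc):
    """Append the context terms of every phrase in the document."""
    expanded = list(doc)
    for phrase in doc:
        expanded.extend(CONTEXT.get(phrase, []))
    return expanded


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # set(): count each term once per document
    return df


def candidate_facet_terms(original_docs, expanded_docs, top_k=5):
    """Rank terms by how much their document frequency grew after expansion."""
    df_orig = document_frequency(original_docs)
    df_exp = document_frequency(expanded_docs)
    n = len(original_docs)
    shifts = {}
    for term, count in df_exp.items():
        shift = (count - df_orig.get(term, 0)) / n
        if shift > 0:  # keep only terms that became more frequent
            shifts[term] = shift
    # Sort by descending shift, breaking ties alphabetically.
    return sorted(shifts, key=lambda t: (-shifts[t], t))[:top_k]


# Each document is assumed to be a list of already-extracted important phrases.
original = [["jacques chirac", "paris"], ["angela merkel"], ["paris"]]
expanded = [expand(doc) for doc in original]
facets = candidate_facet_terms(original, expanded, top_k=3)
print(facets)
```

In this toy data, terms such as "politician" never occur in the original documents and only surface after expansion, which mirrors the pilot-study observation that facet terms rarely appear in the text itself.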
Pages: 466 / +
Page count: 2
Related papers
50 in total
  • [1] Automatic construction of ontology from text databases
    Zhong, N
    Yao, YY
    Kakemoto, Y
    DATA MINING II, 2000, 2 : 173 - 180
  • [2] Automatic Extraction of Causal Chains from Text
    Huminski, Aliaksandr
    Bin, Ng Yan
    LIBRES-LIBRARY AND INFORMATION SCIENCE RESEARCH ELECTRONIC JOURNAL, 2019, 29 (02): : 99 - 108
  • [3] Automatic extraction of hierarchical relations from text
    Wang, Ting
    Li, Yaoyong
    Bontcheva, Kalina
    Cunningham, Hamish
    Wang, Ji
    SEMANTIC WEB: RESEARCH AND APPLICATIONS, PROCEEDINGS, 2006, 4011 : 215 - 229
  • [4] Automatic Keyword Extraction From Dialogue Text
    Sali, Yusuf
    Erden, Mustafa
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [5] AUTOMATIC EXTRACTION OF FUNCTION KNOWLEDGE FROM TEXT
    Cheong, Hyunmin
    Li, Wei
    Cheung, Adrian
    Nogueira, Andy
    Iorio, Francesco
    INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2015, VOL 2A, 2016,
  • [6] Automatic extraction of collocations from Korean text
    Kim, S
    Yoon, J
    Song, MS
    COMPUTERS AND THE HUMANITIES, 2001, 35 (03): : 273 - 297
  • [7] Automatic extraction of angiogenesis bioprocess from text
    Wang, Xinglong
    McKendrick, Iain
    Barrett, Ian
    Dix, Ian
    French, Tim
    Tsujii, Jun'ichi
    Ananiadou, Sophia
    BIOINFORMATICS, 2011, 27 (19) : 2730 - 2737
  • [8] Automatic text extraction from color image
    Liu, WP
    Su, H
    Chi, CY
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2000, PTS 1-3, 2000, 4067 : 1544 - 1550
  • [9] Automatic Text Extraction from Arabic Newspapers
    Vasilopoulos, Nikos
    Wasfi, Yazan
    Kavallieratou, Ergina
    IMAGE ANALYSIS AND RECOGNITION (ICIAR 2018), 2018, 10882 : 505 - 510
  • [10] Automatic Relation Extraction from Text: A Survey
    Li, Kun
    Zhang, Junsheng
    Yao, Changqing
    Shi, Chongde
    2016 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI), 2016, : 83 - 86