Automatic extraction of useful facet hierarchies from text databases

Cited by: 41
Authors
Dakka, Wisam [1]
Ipeirotis, Panagiotis G. [2]
Affiliations
[1] Columbia Univ, Dept Comp Sci, 1214 Amsterdam Ave, New York, NY 10027 USA
[2] NYU, Dept Informat Operat & Management Sci, New York, NY 10012 USA
Source
2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3 | 2008
DOI
10.1109/ICDE.2008.4497455
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a powerful new paradigm that has proved to be a successful complement to keyword searching. Thus far, the identification of the facets was either a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for automatic extraction of facets useful for browsing text databases. In particular, we observe, through a pilot study, that facet terms rarely appear in text documents, showing that we need external resources to identify useful facet terms. For this, we first identify important phrases in each document. Then, we expand each phrase with "context" phrases using external resources, such as WordNet and Wikipedia, causing facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies, using the Amazon Mechanical Turk service, show that our techniques produce facets with high precision and recall that are superior to existing approaches and help users locate interesting items faster.
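The three-step pipeline described in the abstract (important phrases, context expansion, distribution comparison) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the important phrases of each document are assumed to be already extracted, the `CONTEXT` dictionary is a hypothetical stand-in for WordNet or Wikipedia lookups, the helper names (`expand`, `document_frequency`, `rank_facet_terms`) are hypothetical, and the comparison of term distributions is assumed to be a shift-weighted, smoothed log ratio of document frequencies, since the abstract does not name the statistic used.

```python
from collections import Counter
import math

# Hypothetical stand-in for external resources such as WordNet or Wikipedia:
# maps an important phrase to broader "context" phrases.
CONTEXT = {
    "jacques chirac": ["politician", "france", "president"],
    "nicolas sarkozy": ["politician", "france", "president"],
    "louvre": ["museum", "france", "paris"],
    "tour de france": ["cycling", "sports", "france"],
}

# Toy "original database": each document is its set of important phrases
# (assumed to be extracted already).
SAMPLE_DOCS = [
    {"jacques chirac", "louvre"},
    {"nicolas sarkozy"},
    {"tour de france"},
]


def expand(doc_terms, context):
    """Return the document's phrases plus the context phrases of each one."""
    expanded = set(doc_terms)
    for term in doc_terms:
        expanded.update(context.get(term, []))
    return expanded


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def rank_facet_terms(original, context, min_shift=1):
    """Rank terms whose document frequency grows after expansion."""
    expanded = [expand(doc, context) for doc in original]
    df_orig = document_frequency(original)
    df_exp = document_frequency(expanded)
    scores = {}
    for term, df_e in df_exp.items():
        df_o = df_orig.get(term, 0)
        shift = df_e - df_o
        if shift < min_shift:
            continue  # term did not gain enough documents from expansion
        # Assumed scoring: absolute shift times a smoothed log ratio, so terms
        # that are frequent only in the expanded database rank first.
        scores[term] = shift * math.log((df_e + 1) / (df_o + 1))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for term, score in rank_facet_terms(SAMPLE_DOCS, CONTEXT):
        print(f"{term}\t{score:.3f}")
```

In this toy run, "france" never occurs in the original documents but appears in every expanded one, so it ranks first, while the original phrases themselves gain nothing from expansion and are filtered out. This mirrors the pilot-study observation that facet terms rarely appear in the text itself.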
Pages: 466 / +
Page count: 2
Related papers
50 in total
  • [41] Automatic extraction of gene/protein biological functions from biomedical text
    Koike, A
    Niwa, Y
    Takagi, T
    BIOINFORMATICS, 2005, 21 (07) : 1227 - 1236
  • [42] Automatic Summarization and Keyword Extraction from Web Page or Text File
    You, Xiangdong
    2019 IEEE 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING TECHNOLOGY (CCET), 2019 : 154 - 158
  • [43] Automatic Text Generation via Text Extraction Based on Submodular
    Ai, Lisi
    Li, Na
    Zheng, Jianbing
    Gao, Ming
    WEB AND BIG DATA, 2017, 10612 : 237 - 246
  • [44] Automatic Feature Extraction and Text Recognition From Scanned Topographic Maps
    Pezeshk, Aria
    Tutwiler, Richard L.
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2011, 49 (12) : 5047 - 5063
  • [45] Automatic construction of navigable concept networks characterizing text databases
    Carpineto, C
    Romano, G
    TOPICS IN ARTIFICIAL INTELLIGENCE, 1995, 992 : 67 - 78
  • [46] Semi-Automatic Extraction of Triangular Facet Attitude Based on Edge Extraction Algorithm
    Lin N.
    Xu Y.
    Gao B.
    Weng X.
    Chen N.
    Diqiu Kexue - Zhongguo Dizhi Daxue Xuebao/Earth Science - Journal of China University of Geosciences, 2021, 46 (10) : 3753 - 3763
  • [47] A Bayesian method for the automatic extraction of meaningful clinical sequences from large clinical databases
    Shrestha, Aashara
    Zikos, Dimitrios
    Fegaras, Leonidas
    Blebea, John
    Sasso, Robert A.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2023, 233
  • [48] Automatic Extraction of Command Hierarchies for Adaptive Brain-Robot Interfacing
    Bryan, Matthew
    Nicoll, Griffin
    Thomas, Vibinash
    Chung, Mike
    Smith, Joshua R.
    Rao, Rajesh P. N.
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2012 : 3691 - 3697
  • [49] Automatic extraction of the fine category of person named entities from text corpora
    Nguyen, Tri-Thanh
    Shimazu, Akira
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (10) : 1542 - 1549
  • [50] Vulcan: Automatic extraction and analysis of cyber threat intelligence from unstructured text
    Jo, Hyeonseong
    Lee, Yongjae
    Shin, Seungwon
    COMPUTERS & SECURITY, 2022, 120