Mining Relevant Text Features for Retrieving Web Information

Cited by: 1
Authors
Pipanmekaporn, Luepol [1 ]
Kamolsantiroj, Suwatchai [1 ]
Affiliations
[1] King Mongkuts Univ Technol North Bangkok, Dept Comp & Informat Sci, Bangkok 10800, Thailand
Keywords
Feature Extraction; Feature Selection; Relevance Feedback and Text Mining;
DOI
10.1109/IIAI-AAI.2014.96
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
It is a big challenge to develop effective methods that can discover high-quality, useful features in text documents. Most existing information retrieval and text mining methods focus on term-based approaches, which often suffer from the problems of term variation and noise. This paper illustrates an innovative approach that discovers relevant knowledge to precisely describe text features for retrieving web information. In particular, it extracts precise text patterns by considering both relevant and irrelevant documents. The discovered patterns are then used to find accurate relevant features in a training set. The proposed approach has been evaluated by implementing a novel information filtering model and comparing it against state-of-the-art models. The experimental results obtained on the Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms the best baseline method.
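The abstract does not spell out how patterns are mined or pruned, so the following is only a minimal sketch of the general idea (mine term patterns from relevant documents, discard those that also appear in irrelevant documents, then derive term weights from the surviving patterns). The function names, the term-set definition of a pattern, the "never occurs in an irrelevant document" pruning rule, and the even weight split are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def mine_patterns(docs, min_support=2, max_len=2):
    """Count term sets (patterns) that occur in at least min_support documents."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for k in range(1, max_len + 1):
            counts.update(combinations(terms, k))
    return {p: c for p, c in counts.items() if c >= min_support}


def relevant_patterns(relevant, irrelevant, min_support=2, max_len=2):
    """Keep patterns frequent in relevant docs that never occur in irrelevant docs.

    Assumption: this strict pruning rule is illustrative only.
    """
    pos = mine_patterns(relevant, min_support, max_len)
    neg = mine_patterns(irrelevant, 1, max_len)
    return {p: c for p, c in pos.items() if p not in neg}


def term_weights(patterns):
    """Spread each pattern's support evenly over its terms (an assumed scheme)."""
    weights = Counter()
    for pattern, support in patterns.items():
        for term in pattern:
            weights[term] += support / len(pattern)
    return dict(weights)


def score(doc, weights):
    """Score a document by summing the weights of the distinct terms it contains."""
    return sum(weights.get(t, 0.0) for t in set(doc.lower().split()))


# Hypothetical toy feedback sets.
relevant = ["stock market crash", "stock market rally", "market news"]
irrelevant = ["fish market", "news today"]

patterns = relevant_patterns(relevant, irrelevant)
weights = term_weights(patterns)
```

In this toy example the single term "market" is pruned because it also appears in an irrelevant document, while the pair ("market", "stock") survives, so "market" still receives some weight through the pattern it belongs to. That is the kind of effect a pattern-based approach aims for over purely term-based weighting.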
Pages: 447-452
Page count: 6