Concept-Based Document Classification Using Wikipedia and Value Function

Times cited: 8
Authors
Malo, Pekka [1]
Sinha, Ankur [1]
Wallenius, Jyrki [1]
Korhonen, Pekka [1]
Affiliation
[1] Aalto Univ, Sch Econ, Dept Business Technol, FI-00076 Aalto, Finland
Keywords
MULTIOBJECTIVE EVOLUTIONARY ALGORITHMS; INFORMATION-RETRIEVAL; PROGRESSIVE ALGORITHM; BOOLEAN QUERIES; ONTOLOGY; SEARCH; SYSTEM
DOI
10.1002/asi.21596
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this article, we propose a new concept-based method for document classification. The conceptual knowledge associated with words is drawn from Wikipedia. The aim is to exploit the abundant semantic-relatedness information available in Wikipedia within an efficient value function-based query learning algorithm. The procedure learns the value function by solving a simple linear programming problem formulated from the training documents. Learning proceeds as a step-wise iterative process that builds the value function from an appropriate set of concepts (dimensions) selected from a larger collection of concepts. Once formulated, the value function is used to classify a document as relevant or irrelevant, and the value it assigns to each document can further be used to rank documents by relevance. The efficacy of the procedure is evaluated on Reuters newswire documents, and an extensive comparison with other frameworks is performed. The results are promising.
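The final two steps described in the abstract (thresholding the value function to separate relevant from irrelevant documents, then ranking documents by their value) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each document is assumed to be a dictionary of hypothetical Wikipedia concept scores, the value function is assumed to be linear in those scores, and the weights are assumed to have already been produced by the paper's linear-programming and concept-selection steps, which are not reproduced here. The helper names (`value`, `fit_threshold`, `classify`, `rank`), the concept names, and the midpoint threshold rule are all illustrative choices.

```python
from typing import Dict, List, Tuple

# Hypothetical document representation: Wikipedia concept title -> relatedness
# score. Concept names and numbers below are illustrative, not from the paper.
Doc = Dict[str, float]
Weights = Dict[str, float]


def value(doc: Doc, weights: Weights) -> float:
    """Assumed linear value function V(d) = sum of w_c * x_{d,c} over selected concepts.

    Concepts absent from a document contribute 0; concepts absent from the
    weight vector (i.e. not selected) are ignored.
    """
    return sum(w * doc.get(concept, 0.0) for concept, w in weights.items())


def fit_threshold(relevant: List[Doc], irrelevant: List[Doc], weights: Weights) -> float:
    """Hypothetical cut-off: midway between the lowest-valued relevant and the
    highest-valued irrelevant training document. If the training sets overlap
    in value, some training documents will end up on the wrong side."""
    lowest_relevant = min(value(d, weights) for d in relevant)
    highest_irrelevant = max(value(d, weights) for d in irrelevant)
    return (lowest_relevant + highest_irrelevant) / 2


def classify(doc: Doc, weights: Weights, threshold: float) -> bool:
    """Label a document relevant when its value reaches the threshold."""
    return value(doc, weights) >= threshold


def rank(docs: Dict[str, Doc], weights: Weights) -> List[Tuple[str, float]]:
    """Sort documents by decreasing value (most relevant first)."""
    scored = [(name, value(doc, weights)) for name, doc in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Assumed output of the learning step: weights on two selected concepts.
weights: Weights = {"Crude oil": 0.6, "OPEC": 0.4}
docs: Dict[str, Doc] = {
    "d1": {"Crude oil": 0.9, "OPEC": 0.8},
    "d2": {"Crude oil": 0.2, "Football": 0.9},
    "d3": {"OPEC": 0.7},
}

# Treat d1 and d3 as relevant training examples and d2 as irrelevant.
threshold = fit_threshold([docs["d1"], docs["d3"]], [docs["d2"]], weights)
print([name for name, _ in rank(docs, weights)])  # ['d1', 'd3', 'd2']
print({name: classify(doc, weights, threshold) for name, doc in docs.items()})
# {'d1': True, 'd2': False, 'd3': True}
```

In this toy example the cut-off lands near 0.2, between the values of d2 (about 0.12) and d3 (about 0.28). The midpoint rule is only one illustrative choice; any threshold strictly between those two boundary values separates this small training set.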
Pages: 2496-2511
Page count: 16
Related papers (50 in total)
  • [1] Short Text Classification using Wikipedia Concept based Document Representation
    Wang, Xiang
    Chen, Ruhua
    Jia, Yan
    Zhou, Bin
    [J]. 2013 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS (ITA), 2013: 471 - 474
  • [2] Improved concept-based query expansion using Wikipedia
    Yuvarani, M.
    Iyengar, N. Ch. S. N.
    Kannan, A.
    [J]. INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2013, 11 (01) : 26 - 41
  • [3] CONCEPT-BASED CLASSIFICATION FOR MULTI-DOCUMENT SUMMARIZATION
    Celikyilmaz, Asli
    Hakkani-Tür, Dilek
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 5540 - 5543
  • [4] Interactive Query Expansion using Concept-Based Directions Finder Based on Wikipedia
    Meiyappan, Yuvarani
    Iyengar, Sriman Narayana
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2013, 10 (06) : 571 - 578
  • [5] Corpus-level and Concept-based Explanations for Interpretable Document Classification
    Shi, Tian
    Zhang, Xuchao
    Wang, Ping
    Reddy, Chandan K.
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (03)
  • [6] Concept-based Document Models using Explicit Semantic Analysis
    Luo, Jing
    Meng, Bo
    Tu, Xinhui
    Liu, Maofu
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2012), 2012: 338 - 342
  • [7] Using WordNet for Concept-Based Document Indexing in Information Retrieval
    Boubekeur, Fatiha
    Boughanem, Mohand
    Tamine, Lynda
    Daoud, Mariam
    [J]. SEMAPRO 2010: THE FOURTH INTERNATIONAL CONFERENCE ON ADVANCES IN SEMANTIC PROCESSING, 2010: 151 - 157
  • [8] Concept-based document recommendations for CiteSeer authors
    Chandrasekaran, Karman
    Gauch, Susan
    Lakkaraju, Praveen
    Luong, Hiep Phuc
    [J]. ADAPTIVE HYPERMEDIA AND ADAPTIVE WEB-BASED SYSTEMS, 2008, 5149 : 83 - +
  • [9] Learning a concept-based document similarity measure
    Huang, Lan
    Milne, David
    Frank, Eibe
    Witten, Ian H.
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2012, 63 (08) : 1593 - 1608
  • [10] Predicting software defect type using concept-based classification
    Patil, Sangameshwar
    Ravindran, B.
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2020, 25 : 1341 - 1378