Concept-Based Document Classification Using Wikipedia and Value Function

Cited by: 8
Authors
Malo, Pekka [1]
Sinha, Ankur [1]
Wallenius, Jyrki [1]
Korhonen, Pekka [1]
Affiliations
[1] Aalto Univ, Sch Econ, Dept Business Technol, FI-00076 Aalto, Finland
Keywords
MULTIOBJECTIVE EVOLUTIONARY ALGORITHMS; INFORMATION-RETRIEVAL; PROGRESSIVE ALGORITHM; BOOLEAN QUERIES; ONTOLOGY; SEARCH; SYSTEM;
DOI
10.1002/asi.21596
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this article, we propose a new concept-based method for document classification. The conceptual knowledge associated with the words is drawn from Wikipedia. The purpose is to utilize the abundant semantic relatedness information available in Wikipedia in an efficient value function-based query learning algorithm. The procedure learns the value function by solving a simple linear programming problem formulated using the training documents. The learning involves a step-wise iterative process that helps in generating a value function with an appropriate set of concepts (dimensions) chosen from a collection of concepts. Once the value function is formulated, it is utilized to make a decision between relevance and irrelevance. The value assigned to a particular document from the value function can be further used to rank the documents according to their relevance. Reuters newswire documents have been used to evaluate the efficacy of the procedure. An extensive comparison with other frameworks has been performed. The results are promising.
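The decision and ranking step described in the abstract can be illustrated with a minimal sketch. It assumes a linear value function over Wikipedia concepts and a fixed relevance threshold. The concept names, weights, threshold, and documents below are hypothetical and are not taken from the article. The linear-programming step that would learn the weights is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical concept weights. In the article these would come from the
# linear-programming learning step, which this sketch does not implement.
WEIGHTS: Dict[str, float] = {
    "Finance": 0.6,
    "Stock market": 0.3,
    "Football": -0.4,
}
THRESHOLD = 0.25  # assumed cut-off between relevant and irrelevant


def value(doc: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Assumed linear value function: weighted sum of concept relatedness scores."""
    return sum(w * doc.get(concept, 0.0) for concept, w in weights.items())


def is_relevant(doc: Dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Classify a document as relevant when its value reaches the threshold."""
    return value(doc) >= threshold


def rank(docs: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Order documents from highest to lowest value."""
    scored = ((name, value(doc)) for name, doc in docs.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, each mapped to concept relatedness scores in [0, 1].
DOCS: Dict[str, Dict[str, float]] = {
    "d1": {"Finance": 0.9, "Stock market": 0.5},
    "d2": {"Football": 0.8, "Finance": 0.1},
    "d3": {"Stock market": 0.7},
}

if __name__ == "__main__":
    for name, score in rank(DOCS):
        label = "relevant" if is_relevant(DOCS[name]) else "irrelevant"
        print(f"{name}: {score:.2f} ({label})")
```

The same score serves both purposes in this sketch: comparing it with the threshold gives the binary decision, and sorting by it gives the ranking.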
Pages: 2496-2511
Number of pages: 16