IRText: An Item Response Theory-Based Approach for Text Categorization

Cited by: 0
Author
Onder Coban
Affiliation
[1] Adiyaman University, Department of Computer Engineering
Keywords
Item response theory; Text categorization; Term weighting; Feature selection
DOI
Not available
Abstract
Text categorization (TC) is a machine learning task that assigns a text to one of a set of predefined categories. In a nutshell, texts are converted into numerical feature vectors in which each feature is associated with a weight value. A classifier is then trained on the vectorized texts and used to classify previously unseen documents. Feature selection (FS) is optionally applied to achieve better classification accuracy with fewer features. Item response theory (IRT), on the other hand, is a set of statistical models designed to understand persons based on their responses to questions, under the assumption that the response to a given item is a function of both person and item properties. Although many studies have sought to understand, explore, and improve methods in each field, no previous study has aimed to combine the strengths of the two. This study therefore proposes an IRT-based approach that uses the IRT score of a feature in both term weighting and FS, which are important intermediate steps of TC. The efficiency of the proposed approach is measured on two well-known benchmark datasets by comparing it with two traditional counterparts. Experimental results show that the IRT-based approach can be used for text FS and that there is room for further improvement. To the best of our knowledge, this study is the first to adapt IRT to classical TC.
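The pipeline outlined in the abstract (vectorize, weight terms, select features) and the IRT idea that a response depends on both person and item properties can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method proposed in the paper: the abstract does not say which IRT model is used or how a feature's IRT score is computed, so the two-parameter logistic (2PL) model is shown as one common IRT model, TF-IDF stands in for a traditional term-weighting scheme, and document frequency is a placeholder feature score. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def irt_2pl(theta, a, b):
    """2PL item response function (an assumed model, not named in the abstract).

    theta: person ability, a: item discrimination, b: item difficulty.
    Returns the probability of a positive response.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tf_idf_vectors(docs):
    """Turn tokenized documents into {term: weight} dicts using plain TF-IDF."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors, df


def select_top_k(scores, k):
    """Keep the k highest-scoring features; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


# Toy corpus (hypothetical), already tokenized.
docs = [
    "cheap pills buy now".split(),
    "buy cheap watches now".split(),
    "meeting agenda for monday".split(),
    "monday project meeting notes".split(),
]

vectors, df = tf_idf_vectors(docs)

# Placeholder score: document frequency. A per-feature IRT-derived score
# would be substituted here; how to compute it is not given in the abstract.
selected = select_top_k(dict(df), k=4)

# Drop every feature that was not selected before training a classifier.
reduced = [{t: w for t, w in v.items() if t in selected} for v in vectors]
```

The scoring function is kept separate from the selection step so that a different per-feature score can be swapped in without touching the rest of the pipeline.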
Pages: 9423 - 9439
Page count: 16
Related papers
50 in total
  • [32] Structure of competence: Need for theory-based methods to test theory-based questions - Response
    Greenspan, S
    McGrew, KS
    RESEARCH IN DEVELOPMENTAL DISABILITIES, 1996, 17 (02) : 145 - 152
  • [33] Keyword extraction strategy for item banks text categorization
    Nuntiyagul, Atorn
    Naruedomkul, Kanlaya
    Cercone, Nick
    Wongsawang, Damras
    COMPUTATIONAL INTELLIGENCE, 2007, 23 (01) : 28 - 44
  • [34] A theory-based approach to market transformation
    Blumstein, C
    Goldstone, S
    Lutzenhiser, L
    ENERGY POLICY, 2000, 28 (02) : 137 - 144
  • [35] PKIP: Feature selection in text categorization for item banks
    Nuntiyagul, A
    Naruedomkul, K
    Cercone, N
    Wongsawang, D
    ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 212 - 216
  • [36] A theory-based approach to pretesting advertising
    Percy, L
    Rossiter, JR
    MEASURING ADVERTISING EFFECTIVENESS, 1997, : 267 - 281
  • [38] Performance of depression rating scales in patients with chronic kidney disease: an item response theory-based analysis
    Toups, Marisa
    Carmody, Thomas
    Trivedi, Madhukar H.
    Rush, A. John
    Hedayati, S. Susan
    GENERAL HOSPITAL PSYCHIATRY, 2016, 42 : 60 - 66
  • [39] Recovering "lack of words" in text categorization for item banks
    Nuntiyagul, A
    Cercone, N
    Naruedomkul, K
    PROCEEDINGS OF THE 29TH ANNUAL INTERNATIONAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE, WORKSHOPS AND FAST ABSTRACTS, 2005, : 31 - 32
  • [40] An Item Response Theory-Based Assessment of the Pain Assessment Checklist for Seniors With Limited Ability to Communicate (PACSLAC)
    Pannerden, Stephanie C. van Nispen Tot
    Candel, Math J. J. M.
    Zwakhalen, Sandra M. G.
    Hamers, Jan P. H.
    Curfs, Leopold M. G.
    Berger, Martijn P. F.
    JOURNAL OF PAIN, 2009, 10 (08) : 844 - 853