IRText: An Item Response Theory-Based Approach for Text Categorization

Cited by: 0
Author
Onder Coban
Affiliation
[1] Adiyaman University, Department of Computer Engineering
Keywords
Item response theory; Text categorization; Term weighting; Feature selection
DOI
Not available
Abstract
Text categorization (TC) is a machine learning task that aims to assign a text to one of a set of predefined categories. In short, texts are converted into numerical feature vectors in which each feature is associated with a weight value. A classifier is then trained on the vectorized texts and used to classify previously unseen documents. Feature selection (FS) can optionally be applied to achieve better classification accuracy with fewer features. Item response theory (IRT), on the other hand, is a set of statistical models designed to characterize persons based on their responses to questions (items), assuming that the response to a given item is a function of both person and item properties. Although many studies have been devoted to understanding, exploring, and improving methods in each of these fields, no previous study has aimed at combining their strengths. Accordingly, this study proposes an IRT-based approach that uses the IRT score of a feature in both term weighting and FS, two important intermediate steps of TC. The proposed approach is evaluated on two well-known benchmark datasets and compared with two traditional counterparts. Experimental results show that the IRT-based approach can be used for text FS and that there is room for further improvement. To the best of our knowledge, this study is the first attempt to adapt IRT to classical TC.
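The abstract does not say which IRT model or estimation procedure produces a feature's "IRT score". The sketch below is only an illustration under assumptions of our own, not the paper's method. It treats documents as persons and terms as items, takes the binary response to be whether a term occurs in a document, and fits a two-parameter logistic (2PL) model, P(x_ij = 1 | theta_i) = 1 / (1 + exp(-a_j * (theta_i - b_j))), by L2-regularized joint maximum likelihood with plain gradient ascent. It then ranks terms by the estimated discrimination a_j. The function names (`prob_2pl`, `item_information`, `log_likelihood`, `fit_2pl`), the learning rate, the regularization strength, and the ranking rule are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical setup: rows = documents ("persons"), columns = terms ("items"),
# entry 1 if the term occurs in the document, else 0.
Matrix = List[List[int]]


def prob_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: P(response = 1 | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def log_likelihood(x: Matrix, theta: List[float], a: List[float], b: List[float]) -> float:
    """Bernoulli log-likelihood of the binary matrix under the 2PL model."""
    total = 0.0
    for i, row in enumerate(x):
        for j, xij in enumerate(row):
            p = prob_2pl(theta[i], a[j], b[j])
            total += math.log(p) if xij else math.log(1.0 - p)
    return total


def fit_2pl(
    x: Matrix, lr: float = 0.05, epochs: int = 300, l2: float = 0.1
) -> Tuple[List[float], List[float], List[float]]:
    """Joint MLE of (theta, a, b) by gradient ascent on an L2-penalized
    log-likelihood; the penalty (hypothetical choice) pulls theta and b
    toward 0 and a toward 1, which keeps the estimates bounded."""
    n_docs, n_terms = len(x), len(x[0])
    theta = [0.0] * n_docs
    a = [1.0] * n_terms
    b = [0.0] * n_terms
    for _ in range(epochs):
        # Start each gradient with the derivative of the penalty term.
        g_theta = [-l2 * t for t in theta]
        g_a = [-l2 * (aj - 1.0) for aj in a]
        g_b = [-l2 * bj for bj in b]
        for i, row in enumerate(x):
            for j, xij in enumerate(row):
                # Residual = dLL/dz for z = a_j * (theta_i - b_j).
                r = xij - prob_2pl(theta[i], a[j], b[j])
                g_theta[i] += r * a[j]
                g_a[j] += r * (theta[i] - b[j])
                g_b[j] -= r * a[j]
        # Simultaneous update of all parameters.
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        a = [v + lr * g for v, g in zip(a, g_a)]
        b = [v + lr * g for v, g in zip(b, g_b)]
    return theta, a, b


if __name__ == "__main__":
    # Toy document-term presence matrix (4 documents, 3 terms).
    docs = [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 1],
    ]
    theta, a, b = fit_2pl(docs)
    # Hypothetical scoring rule: rank terms by estimated discrimination a_j.
    ranking = sorted(range(len(a)), key=lambda j: a[j], reverse=True)
    print("terms ranked by discrimination:", ranking)
```

Ranking by discrimination is only one conceivable scoring rule; item information or difficulty could serve instead. Multiplying such a score into a term-frequency weight would be an equally hypothetical way to use it for term weighting. Joint maximum likelihood is also known to give biased estimates when there are few items, and the L2 penalty here only partly mitigates this.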
Pages: 9423-9439
Page count: 16
Related papers (50 in total)
  • [21] Development and testing of item response theory-based item banks and short forms for eye, skin and lung problems in sarcoidosis
    Victorson, David E.
    Choi, Seung
    Judson, Marc A.
    Cella, David
    QUALITY OF LIFE RESEARCH, 2014, 23 (04) : 1301 - 1313
  • [22] A fuzzy-based approach for text representation in text categorization
    Doan, S
    FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005, : 1008 - 1013
  • [23] Item response theory approach to ethnocentrism
    Monaghan, Conal
    Bizumic, Boris
    FRONTIERS IN POLITICAL SCIENCE, 2023, 5
  • [24] Item Response Theory - A First Approach
    Nunes, Sandra
    Oliveira, Teresa
    Oliveira, Amilcar
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2016 (ICNAAM-2016), 2017, 1863
  • [25] A Spectral Approach to Item Response Theory
    Nguyen, Duc
    Zhang, Anderson Y.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [26] An Item Response Theory-Based Scoring of the South Oaks Gambling Screen-Revised Adolescents
    Anselmi, Pasquale
    Colledani, Daiana
    Andreotti, Alessandra
    Robusto, Egidio
    Fabbris, Luigi
    Vian, Paolo
    Genetti, Bruno
    Mortali, Claudia
    Minutillo, Adele
    Mastrobattista, Luisa
    Pacifici, Roberta
    ASSESSMENT, 2022, 29 (07) : 1381 - 1391
  • [27] Developing Item Response Theory-Based Short Forms to Measure the Social Impact of Burn Injuries
    Marino, Molly E.
    Dore, Emily C.
    Ni, Pengsheng
    Ryan, Colleen M.
    Schneider, Jeffrey C.
    Acton, Amy
    Jette, Alan M.
    Kazis, Lewis E.
    ARCHIVES OF PHYSICAL MEDICINE AND REHABILITATION, 2018, 99 (03) : 521 - 528
  • [28] Item response theory-based validation of a short form of the Eating Behavior Scale for Japanese adults
    Tayama, Jun
    Ogawa, Sayaka
    Takeoka, Atsushi
    Kobayashi, Masakazu
    Shirabe, Susumu
    MEDICINE, 2017, 96 (42)
  • [29] Ranking products through online opinions: A text analysis and regret theory-based approach
    Chen, Kejia
    Zheng, Jingjing
    Jin, Jian
    APPLIED SOFT COMPUTING, 2024, 158
  • [30] Text Categorization Based on Fuzzy Soft Set Theory
    Handaga, Bana
    Deris, Mustafa Mat
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT IV, 2012, 7336 : 340 - 352