IRText: An Item Response Theory-Based Approach for Text Categorization

Author
Onder Coban
Affiliation
[1] Adiyaman University, Department of Computer Engineering
Keywords
Item response theory; Text categorization; Term weighting; Feature selection
DOI
Not available
Abstract
Text categorization (TC) is a machine learning task that assigns a text to one of a set of predefined categories. In a nutshell, texts are converted into numerical feature vectors in which each feature is associated with a weight value. A classifier is then trained on the vectorized texts and used to classify previously unseen documents. Feature selection (FS) is optionally applied to achieve better classification accuracy with fewer features. Item response theory (IRT), on the other hand, is a set of statistical models designed to understand persons based on their responses to questions, under the assumption that responses to a given item are a function of both person and item properties. Even though many studies are devoted to understanding, exploring, and improving methods in each of these fields, no previous study has aimed to combine their strengths. As such, this study proposes an IRT-based approach that uses the IRT score of a feature in both term weighting and FS, which are important intermediate steps of TC. The efficiency of the proposed approach is measured on two well-known benchmark datasets by comparing it with two traditional peers. Experimental results show that the IRT-based approach can be used for text FS and that there is room for further improvement. To the best of our knowledge, this study is the first of its kind that tries to adapt IRT to classical TC.
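The abstract's description of IRT, in which a response depends on both person and item properties, can be illustrated with the standard two-parameter logistic (2PL) model. This is only a minimal sketch of a textbook IRT model. The abstract does not say which IRT model or scoring scheme IRText uses, and the names `irf_2pl`, `theta`, `a`, and `b`, as well as the numeric values, are illustrative assumptions rather than anything taken from the paper.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a positive response under the textbook 2PL model.

    theta: person property (ability)
    a:     item property (discrimination)
    b:     item property (difficulty)
    All names and values here are illustrative, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# When the person's ability equals the item's difficulty,
# the logistic exponent is zero and the probability is one half.
p_equal = irf_2pl(theta=0.5, a=1.2, b=0.5)

# For a fixed item with positive discrimination,
# the probability increases with the person's ability.
p_low = irf_2pl(theta=-1.0, a=1.2, b=0.5)
p_high = irf_2pl(theta=2.0, a=1.2, b=0.5)
```

How IRText maps documents and terms onto persons and items, and how it turns fitted item parameters into a feature score, is not stated in the abstract, so the sketch stops at the response function itself.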
Pages: 9423-9439
Number of pages: 16
Source
Coban, Onder. IRText: An Item Response Theory-Based Approach for Text Categorization [J]. Arabian Journal for Science and Engineering, 2022, 47 (08): 9423-9439