Automatic classification of academic documents using text mining techniques

Cited by: 0
Authors
Nunez, Haydemar [1]
Ramos, Esmeralda [1]
Affiliations
[1] Cent Univ Venezuela, Fac Ciencias, Escuela Computac, Lab Inteligencia Artificial, Caracas, Venezuela
Keywords
Text mining; classification models; K-nearest neighbor algorithm; document categorization
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This work presents an automatic classifier of undergraduate final projects based on text mining. The dataset, comprising documents from four professional categories, was represented by means of the vector space model with different indexing metrics. In addition, several dimensionality reduction techniques were applied to the word space. The classification model was built with the K-nearest neighbor algorithm. Using 10-fold cross-validation, we obtained a predictive accuracy of 82%. However, accuracy reached 95% when the classifier recommended up to two categories per document, which takes the interdisciplinary nature of the documents into account. The classifier was integrated into an application for the automatic assignment of reviewers, which selects reviewers from among the teachers who belong to the recommended areas.
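The pipeline described in the abstract (vector space representation, K-nearest neighbor classification, recommendation of up to two categories) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed TF-IDF weighting, the cosine similarity, the similarity-weighted vote, the value `k=3`, and the toy documents and category names are all assumptions chosen for illustration. Dimensionality reduction and 10-fold cross-validation are omitted for brevity.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training texts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def recommend(text, train_vecs, train_labels, idf, k=3, top=2):
    """Rank categories by the summed similarity of the k nearest neighbours
    and return at most `top` of them (categories with zero score are dropped)."""
    query = tfidf_vector(text, idf)
    scored = [(cosine(query, vec), label) for vec, label in zip(train_vecs, train_labels)]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for sim, label in nearest:
        votes[label] += sim
    ranked = sorted(votes.items(), key=lambda pair: pair[1], reverse=True)
    return [label for label, score in ranked[:top] if score > 0]

# Hypothetical toy corpus; the four category names are invented for this example.
train = [
    ("relational database query optimization index", "databases"),
    ("sql transaction database schema design", "databases"),
    ("neural network training classification learning", "ai"),
    ("machine learning classification text mining", "ai"),
    ("tcp routing network protocol latency", "networks"),
    ("wireless network protocol packet routing", "networks"),
    ("rendering shader graphics mesh texture", "graphics"),
    ("graphics rendering ray tracing lighting", "graphics"),
]
texts = [text for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(texts)
train_vecs = [tfidf_vector(text, idf) for text in texts]

query = "text classification with machine learning stored in a database"
print(recommend(query, train_vecs, labels, idf))         # ['ai', 'databases']
print(recommend(query, train_vecs, labels, idf, top=1))  # ['ai']
```

Returning two ranked categories instead of one mirrors the idea in the abstract: an interdisciplinary document counts as correctly handled if its true area appears among the recommendations, which is how a two-category recommendation can score higher than a single prediction.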
Pages: 7