Question classification based on Bloom's taxonomy cognitive domain using modified TF-IDF and word2vec

Cited by: 44

Authors:

Mohammed, Manal [1,2]

Omar, Nazlia [1]

Affiliations:

[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, CAIT, Bangi, Selangor, Malaysia

[2] Hadhramout Univ, Fac Adm Sci, Management Informat Syst Dept, Al Mukalla, Yemen

Source:

PLOS ONE | 2020 / Volume 15 / Issue 03

Keywords:

DOI:

10.1371/journal.pone.0230442

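Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];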

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

The assessment of examination questions is crucial in educational institutes, since examination is one of the most common methods of evaluating students' achievement in a specific course. There is therefore a need to construct balanced, high-quality exams that cover different cognitive levels. Many lecturers rely on the cognitive domain of Bloom's taxonomy, a popular framework developed for assessing students' intellectual abilities and skills. Several works have been proposed to automatically classify questions in accordance with Bloom's taxonomy, but most of them classify questions from a single specific domain. As a result, there is a lack of techniques for classifying questions that come from multiple domains. The aim of this paper is to present a model for classifying exam questions from several areas based on Bloom's taxonomy. This study proposes a method for classifying questions automatically by extracting two features, TFPOS-IDF and word2vec. The first feature calculates the term frequency-inverse document frequency based on part of speech, in order to assign a suitable weight to the essential words in the question. The second feature, pre-trained word2vec, was used to boost the classification process. The combination of these features was then fed into three different classifiers: K-Nearest Neighbour, Logistic Regression, and Support Vector Machine. The experiments used two datasets. The first dataset contained 141 questions, while the second contained 600 questions. On the first dataset, the three classifiers achieved average weighted F1-measures of 71.1%, 82.3% and 83.7%, respectively. On the second dataset, they achieved average weighted F1-measures of 85.4%, 89.4% and 89.7%, respectively. The findings of this study show that the proposed method is effective for classifying questions from multiple domains based on Bloom's taxonomy.
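The feature combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the TFPOS-IDF formula, so the part-of-speech weights (`POS_WEIGHT`), the toy tag lookup (`TOY_POS`), the 2-dimensional stand-in embeddings (`TOY_VECTORS`), the smoothed IDF, the example questions and their labels are all assumptions made for illustration. A 1-nearest-neighbour rule stands in for the three classifiers named in the abstract.

```python
import math
from collections import Counter

# Assumed POS weights: the abstract only says the weight depends on part of
# speech; these numbers are illustrative, not taken from the paper.
POS_WEIGHT = {"VERB": 3.0, "NOUN": 2.0, "OTHER": 1.0}

# Toy lookup standing in for a real POS tagger (assumption).
TOY_POS = {"define": "VERB", "list": "VERB", "compare": "VERB", "design": "VERB",
           "term": "NOUN", "algorithm": "NOUN", "steps": "NOUN",
           "method": "NOUN", "methods": "NOUN", "system": "NOUN"}

# Toy 2-d vectors standing in for pre-trained word2vec embeddings (assumption).
TOY_VECTORS = {"define": [1.0, 0.0], "list": [0.9, 0.1],
               "compare": [0.2, 0.8], "design": [0.0, 1.0]}
DIM = 2

def tokenize(question):
    """Lowercase and strip basic punctuation from each whitespace token."""
    return [w.strip("?.,").lower() for w in question.split()]

def fit_idf(questions):
    """Build the vocabulary and a smoothed IDF table from training questions."""
    docs = [tokenize(q) for q in questions]
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    return vocab, idf

def featurize(question, vocab, idf):
    """Concatenate POS-weighted TF-IDF with the mean of the word vectors."""
    tokens = tokenize(question)
    weighted = Counter()
    for w in tokens:
        # Each occurrence counts for its POS weight instead of 1.
        weighted[w] += POS_WEIGHT[TOY_POS.get(w, "OTHER")]
    total = sum(weighted.values()) or 1.0
    tfidf = [weighted[w] / total * idf[w] for w in vocab]
    vecs = [TOY_VECTORS[w] for w in tokens if w in TOY_VECTORS]
    mean = [sum(c) / len(vecs) for c in zip(*vecs)] if vecs else [0.0] * DIM
    return tfidf + mean

def predict_1nn(question, train_x, train_y, vocab, idf):
    """Return the label of the closest training question (Euclidean distance)."""
    x = featurize(question, vocab, idf)
    dists = [math.dist(x, t) for t in train_x]
    return train_y[dists.index(min(dists))]

# Hypothetical example questions and Bloom's-level labels.
questions = ["Define the term algorithm.", "List the steps of the method.",
             "Compare the two methods.", "Design a system."]
labels = ["Knowledge", "Knowledge", "Analysis", "Synthesis"]

vocab, idf = fit_idf(questions)
train_x = [featurize(q, vocab, idf) for q in questions]
```

In this sketch the verb "define" receives a larger weight than the function word "the" in the same question, which is the intent the abstract attributes to the part-of-speech weighting. A realistic version would replace the toy lookups with an actual tagger and pre-trained embeddings.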


Pages: 21
