Automated labeling of PDF mathematical exercises with word N-grams VSM classification

Authors
Yamauchi, Taisei [1]
Flanagan, Brendan [2]
Nakamoto, Ryosuke [1]
Dai, Yiling [3]
Takami, Kyosuke [4]
Ogata, Hiroaki [3]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
[2] Kyoto Univ, Inst Liberal Arts & Sci, Ctr Innovat Res & Educ Data Sci, Kyoto, Japan
[3] Kyoto Univ, Acad Ctr Comp & Media Studies, Kyoto, Japan
[4] Natl Inst Educ Policy Res, Educ Data Sci Ctr, Tokyo, Japan
Keywords
Automatic labeling; Word n-gram; Random forest; Incomplete text classification; Word embedding; Mathematical education; Mathematical education in Japan; LANGUAGE; RECOGNITION; ANALYTICS;
DOI
10.1186/s40561-023-00271-9
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
In recent years, smart learning environments have become central to modern education, supporting students and instructors through tools based on prediction and recommendation models. These models often rely on learning material metadata, such as the knowledge contained in an exercise, which is usually labeled by domain experts; this labeling is costly and difficult to scale. Automated labeling can ease the workload on experts, as seen in previous studies that applied automatic classification algorithms to research papers and Japanese mathematical exercises. However, these studies did not address fine-grained labeling. In addition, as such systems come into wider use, paper materials are converted to PDF, and text extracted from PDF can be incomplete. Previous research has placed little emphasis on labeling incomplete mathematical sentences. This study aims to achieve precise automated classification even from incomplete text inputs. To tackle these challenges, we propose a mathematical exercise labeling algorithm based on word n-grams that can handle detailed labels even for incomplete sentences, and we compare it with a state-of-the-art word embedding method. The experimental results show that unigram (mono-gram) features with Random Forest models achieved the best performance, with macro F-measures of 92.50% and 61.28% for the 24-class and 297-class labeling tasks, respectively. The contribution of this research is to show that the proposed method, based on traditional simple n-grams, can find context-independent similarities in incomplete sentences and outperforms state-of-the-art word embedding methods in specific tasks such as classifying short and incomplete texts.
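To make the terms in the abstract concrete, the following is a minimal sketch of word n-gram vector space features and the macro F-measure, not the authors' implementation. It assumes whitespace-separated tokens (Japanese text would first need a morphological tokenizer, which is not shown), and the function names `word_ngrams`, `ngram_vector`, `cosine_similarity`, and `macro_f1` are hypothetical. The Random Forest classifier itself is omitted.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n=1):
    """Return the word n-grams of a text (assumes whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vector(text, n=1):
    """Sparse vector space representation: n-gram -> count."""
    return Counter(word_ngrams(text, n))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro F-measure)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because a unigram vector only counts which words occur, a sentence that lost some words during PDF extraction still shares most of its non-zero dimensions with the complete version, which is one plausible reading of the "context-independent similarities" mentioned in the abstract.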
Pages: 30
Journal
Smart Learning Environments, 10