A Machine Learning-Based Readability Model for Gujarati Texts

Cited by: 0
Author
Bhogayata, Chandrakant K. [1]
Affiliation
[1] Maharaja Krishnakumarsinhji Bhavnagar Univ, Bhavnagar, Gujarat, India
Keywords
Readability model; readability rating and level of education; interrater agreement; model comparison; Gujarati texts
DOI
10.1145/3637826
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study aims to develop a machine learning-based model to predict the readability of Gujarati texts. The dataset comprised 50 prose passages from Gujarati literature. Fourteen lexical and syntactic readability features were extracted from the dataset using a unigram part-of-speech tagger (a machine learning algorithm) and three Python scripts. Two samples of native Gujarati-speaking secondary and higher education students rated the readability of the texts on a 10-point scale from "easy" to "difficult", with interrater agreement. After dimensionality reduction, seven text features served as the independent variables and the mean readability rating as the dependent variable for training the readability model. Because the students' level of education and gender were related to their readability ratings, four readability models (for school students, university students, male students, and female students) were trained with backward stepwise multiple linear regression, a supervised machine learning algorithm. The trained models are comparable across the rater groups. The best model is the university students' readability rating model, which was cross-validated. It explains 91% of the variance in readability ratings at training and 88% at cross-validation, its effect size is large, and its statistical power is high.
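The training procedure named in the abstract (backward stepwise multiple linear regression followed by cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the data are synthetic (50 rows and 7 features, mirroring only the sizes given in the abstract), the elimination criterion is adjusted R^2 (the abstract does not state which criterion was used), and the function names `backward_stepwise` and `cross_validated_r2` are hypothetical. It is not the author's code or data.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Predictions from an intercept-first coefficient vector."""
    return beta[0] + X @ beta[1:]

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit on the given feature columns."""
    n, p = X.shape
    resid = y - predict(fit_ols(X, y), X)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def backward_stepwise(X, y):
    """Start from all features; repeatedly drop the one whose removal
    gives the highest adjusted R^2, stopping when any removal lowers it.
    (Assumed criterion; p-value thresholds are a common alternative.)"""
    keep = list(range(X.shape[1]))
    best = adjusted_r2(X[:, keep], y)
    while len(keep) > 1:
        scores = [(adjusted_r2(X[:, [k for k in keep if k != j]], y), j)
                  for j in keep]
        score, drop = max(scores)
        if score < best:
            break
        best = score
        keep.remove(drop)
    return keep

def cross_validated_r2(X, y, folds=5, seed=0):
    """R^2 computed from out-of-fold predictions (k-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty_like(y)
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        pred[test] = predict(fit_ols(X[train], y[train]), X[test])
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in: 50 "passages", 7 "features", a mean rating driven
# by features 0 and 3 plus noise. These numbers are invented.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 7))
y = 5.0 + 1.5 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=50)

selected = backward_stepwise(X, y)
cv_r2 = cross_validated_r2(X[:, selected], y)
print("selected feature indices:", selected)
print("cross-validated R^2:", round(cv_r2, 2))
```

Note that this sketch selects features on the full data set before cross-validating, which tends to make the cross-validated R^2 optimistic; repeating the selection inside each fold would give a stricter estimate.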
Pages: 32
Related papers
50 in total
  • [1] A Machine Learning–Based Readability Model for Gujarati Texts
    Near Valiya College, Flat No. 1, Hina Apartment, Vidyanagar, Gujarat State, Bhavnagar
    364002, India
    [J]. ACM Trans. Asian Low Res. Lang. Inf. Process., 2
  • [2] A Machine Learning-Based Model to Evaluate Readability and Assess Grade Level for the Web Pages
    Pantula, Muralidhar
    Kuppusamy, K. S.
    [J]. COMPUTER JOURNAL, 2022, 65 (04): 831 - 842
  • [3] A study of readability of texts in Bangla through machine learning approaches
    Sinha M.
    Basu A.
    [J]. Education and Information Technologies, 2016, 21 (5) : 1071 - 1094
  • [4] Machine Learning-Based EDFA Gain Model
    You, Yuren
    Jiang, Zhiping
    Janz, Christopher
    [J]. 2018 EUROPEAN CONFERENCE ON OPTICAL COMMUNICATION (ECOC), 2018,
  • [5] Machine Learning-Based Recommendation Trust Model for Machine-to-Machine Communication
    Eziama, Elvin
    Jaimes, Luz M. S.
    James, Agajo
    Nwizege, Kenneth Sorle
    Balador, Ali
    Tepe, Kemal
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2018,
  • [6] A Hybrid Machine Learning-Based Model for Indoor Propagation
    Seretis, Aristeidis
    Sarris, Costas D.
    [J]. 2022 16TH EUROPEAN CONFERENCE ON ANTENNAS AND PROPAGATION (EUCAP), 2022,
  • [7] On the interpretability of machine learning-based model for predicting hypertension
    Elshawi, Radwa
    Al-Mallah, Mouaz H.
    Sakr, Sherif
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2019, 19 (1)
  • [8] A machine learning-based recommendation model for bipartite networks
    Kart, Ozge
    Ulucay, Oguzhan
    Bingol, Berkay
    Isik, Zerrin
    [J]. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 553
  • [9] A machine learning-based model for the quantification of mental conflict
    Naoki, Honda
    Konaka, Yuki
    [J]. NATURE COMPUTATIONAL SCIENCE, 2023, 3 (5): 370 - 371
  • [10] A Machine Learning-Based Global Atmospheric Forecast Model
    Szunyogh, Istvan
    Arcomano, Troy
    Pathak, Jaideep
    Wikner, Alexander
    Hunt, Brian
    Ott, Edward
    [J]. GEOPHYSICAL RESEARCH LETTERS, 2020, 47 (09)