A Machine Learning-Based Readability Model for Gujarati Texts

Cited by: 0

Author:

Bhogayata, Chandrakant K. [1]

Affiliation:

[1] Maharaja Krishnakumarsinhji Bhavnagar Univ, Bhavnagar, Gujarat, India

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2024, Vol. 23, Issue 02

Keywords:

Readability model; readability rating and level of education; interrater agreement; model comparison; Gujarati texts;

DOI:

10.1145/3637826

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study develops a machine learning-based model to predict the readability of Gujarati texts. The dataset comprised 50 prose passages from Gujarati literature. Fourteen lexical and syntactic text features were extracted from the dataset using a unigram part-of-speech tagger (a machine learning algorithm) and three Python scripts. Two samples of native Gujarati-speaking students in secondary and higher education rated the readability of the texts on a 10-point scale from "easy" to "difficult", and interrater agreement was assessed. After dimensionality reduction, seven text features served as the independent variables and the mean readability rating as the dependent variable for training the readability model. Because the students' level of education and gender were related to their readability ratings, four readability models (for school students, university students, male students, and female students) were trained with backward stepwise multiple linear regression, a supervised machine learning algorithm. The trained models are comparable across the rater groups. The best model is the one trained on the university students' readability ratings. This model is cross-validated: it explains 91% of the variance in readability ratings at training and 88% at cross-validation, and its effect size is large and its statistical power is high.
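
As an illustration of the backward stepwise multiple linear regression named in the abstract, the following is a minimal sketch and not the author's implementation. It assumes adjusted R^2 as the elimination criterion (the abstract does not state which criterion was used) and runs on synthetic stand-in data shaped like the study (50 passages, 7 features). The helper names `adjusted_r2` and `backward_stepwise` are hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit ordinary least squares with an intercept and return adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def backward_stepwise(X, y):
    """Start from all features; drop one at a time while adjusted R^2 rises.

    Assumption: adjusted R^2 is the stopping criterion. Other common
    choices (p-values, AIC) would give a different but analogous loop.
    """
    selected = list(range(X.shape[1]))
    best = adjusted_r2(X[:, selected], y)
    while len(selected) > 1:
        # Score every candidate model that has exactly one feature removed.
        scores = [
            (adjusted_r2(X[:, [j for j in selected if j != drop]], y), drop)
            for drop in selected
        ]
        score, drop = max(scores)
        if score <= best:        # no removal improves the fit: stop
            break
        best = score
        selected.remove(drop)
    return selected, best

# Synthetic stand-in data (NOT the study's data): 50 "passages", 7 "features",
# with only features 0 and 3 actually driving the simulated rating.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7))
y = 5.0 + 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=50)

selected, score = backward_stepwise(X, y)
print("kept feature indices:", selected)
print("adjusted R^2:", round(score, 3))
```

On real data, `X` would hold the extracted text features per passage and `y` the mean readability rating for one rater group; a held-out or k-fold split would then be needed to reproduce the kind of cross-validation the abstract reports.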


Pages: 32
