Text readability, complexity metrics and the importance of words

Cited by: 4
Authors
Lopez-Anguita, Rocío [1]
Montejo-Raez, Arturo [1]
Martinez-Santiago, Fernando J. [1]
Carlos Diaz-Galiano, Manuel [1]
Affiliation
[1] Univ Jaen, Ctr Estudios Avanzados TIC, Jaen, Spain
Keywords
Readability; text complexity; language modelling
DOI
10.26342/2018-61-11
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This article describes our study on identifying the recommended reader age for texts written for children. The texts were evaluated with 12 complexity metrics proposed by different authors. Using these metrics as features, we trained several automatic classifiers and measured, through cross-validation, how well they predict the recommended reader level. The results were compared with the classification performance obtained with other document models, such as word embeddings and TF.IDF vectors. We conclude that the facet most relevant to identifying the recommended reader age is not lexical or syntactic complexity; instead, it is strongly related to the vocabulary involved.
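The abstract contrasts two ways of representing a text for reader-age classification: complexity metrics used as features, and vocabulary-based document models such as TF.IDF vectors, each scored with cross-validation. The following plain-Python sketch shows the shape of such a comparison under clearly stated assumptions. The three surface features (`mean_sentence_len`, `mean_word_len`, `type_token_ratio`) are hypothetical stand-ins, since the abstract does not name the 12 metrics; a nearest-centroid classifier replaces whatever classifiers the authors trained; all function names are hypothetical; and the toy corpus and age bands are invented. It is not the authors' implementation, and its scores say nothing about their results.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Crude lowercase word tokenizer; a real study would use a proper NLP pipeline.
    return re.findall(r"\w+", text.lower())

def raw_complexity(text):
    # Hypothetical stand-ins for the paper's 12 complexity metrics (not named in the abstract).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    n = max(len(words), 1)
    return {
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        "mean_word_len": sum(map(len, words)) / n,
        "type_token_ratio": len(set(words)) / n,
    }

def complexity_featurizer(train_texts):
    # Fit z-score standardization on the training fold only (no test leakage).
    rows = [raw_complexity(t) for t in train_texts]
    stats = {}
    for key in rows[0]:
        vals = [r[key] for r in rows]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((x - mean) ** 2 for x in vals) / len(vals)) or 1.0
        stats[key] = (mean, std)

    def vectorize(text):
        raw = raw_complexity(text)
        return {k: (raw[k] - m) / s for k, (m, s) in stats.items()}
    return vectorize

def tfidf_featurizer(train_texts):
    # One common IDF variant, log(N / df) + 1, estimated on the training fold;
    # words unseen in training are dropped from test vectors.
    n_docs = len(train_texts)
    df = Counter(w for t in train_texts for w in set(tokenize(t)))
    idf = {w: math.log(n_docs / c) + 1.0 for w, c in df.items()}

    def vectorize(text):
        tf = Counter(tokenize(text))
        vec = {w: c * idf[w] for w, c in tf.items() if w in idf}
        norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
        return {w: x / norm for w, x in vec.items()}  # L2-normalized
    return vectorize

def distance(u, v):
    # Euclidean distance between sparse vectors stored as dicts.
    keys = u.keys() | v.keys()
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def centroids(vectors, labels):
    # Mean vector per label: the "training" step of a nearest-centroid classifier,
    # swapped in here for illustration, not the authors' choice of classifier.
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, y in zip(vectors, labels):
        row = sums[y]
        for k, x in vec.items():
            row[k] += x
    return {y: {k: s / counts[y] for k, s in row.items()} for y, row in sums.items()}

def cross_validate(texts, labels, featurizer, k=3):
    # Mean accuracy over k interleaved folds (fold f holds items with index % k == f).
    if k < 2 or len(texts) < k:
        raise ValueError("need k >= 2 and at least k texts")
    accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(len(texts)) if i % k == fold]
        train_idx = [i for i in range(len(texts)) if i % k != fold]
        vectorize = featurizer([texts[i] for i in train_idx])
        cents = centroids([vectorize(texts[i]) for i in train_idx],
                          [labels[i] for i in train_idx])
        correct = 0
        for i in test_idx:
            vec = vectorize(texts[i])
            predicted = min(cents, key=lambda y: distance(vec, cents[y]))
            correct += predicted == labels[i]
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

if __name__ == "__main__":
    # Invented toy corpus and age bands, for illustration only.
    texts = ["The cat sat. The dog ran.", "The sun is hot. We play.",
             "Photosynthesis converts light energy into chemical energy.",
             "Mitochondria regulate cellular respiration in eukaryotic cells."]
    labels = ["6-8", "6-8", "12-14", "12-14"]
    for name, feat in [("complexity", complexity_featurizer), ("tf.idf", tfidf_featurizer)]:
        print(name, cross_validate(texts, labels, feat, k=2))
```

Each featurizer is fitted inside the fold, so test texts never influence the IDF weights or the standardization statistics; fitting on the whole corpus would leak information and inflate cross-validated accuracy. The interleaved fold assignment is a simplification: a stratified split would guarantee that every training fold contains every age band.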
Pages: 101–108
Page count: 8