Making choices in Russian: pros and cons of statistical methods for rival forms
Выбор вариантных форм в русском языке: плюсы и минусы различных моделей статистического анализа

Authors
R. Harald Baayen
Anna Endresen
Laura A. Janda
Anastasia Makarova
Tore Nesset
Affiliations
[1] University of Tübingen
[2] University of Tromsø
Keywords
Random Forest; Classification Tree; Variable Importance; Forest Model; Rival Form;
DOI
10.1007/s11185-013-9118-6
Abstract
Sometimes languages present speakers with choices among rival forms, such as the Russian forms ostrič’ vs. obstrič’ ‘cut hair’ and proniknuv vs. pronikši ‘having penetrated’. The choice of a given form is often influenced by various considerations involving the meaning and the environment (syntax, morphology, phonology). Understanding the behavior of rival forms is crucial to understanding the form-meaning relationship of language, yet this topic has not received as much attention as it deserves. Given the variety of factors that can influence the choice of rival forms, it is necessary to use statistical models in order to accurately discover which factors are significant and to what extent. The traditional model for this kind of data is logistic regression, but recently two new models, called ‘tree & forest’ and ‘naive discriminative learning’, have emerged as alternatives. We compare the performance of logistic regression against the two new models on the basis of four datasets reflecting rival forms in Russian. We find that the three models generally provide converging analyses, with complementary advantages. After identifying the significant factors for each dataset, we show that different sets of rival forms occupy different regions in a space defined by variance in meaning and environment.
Pages: 253-291
Page count: 38