Using character N-grams to explore diachronic change in medieval English

Cited by: 1
Authors
Buckley, Kevin [1]
Vogel, Carl [2]
Affiliations
[1] Newcastle Univ, Sch Modern Languages, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Dublin, Sch Comp Sci & Stat, Trinity Ctr Comp & Language Studies, Computat Linguist Grp, Trinity Coll Dublin, Dublin, Ireland
Funding
Science Foundation Ireland;
Keywords
computational linguistics; history of English; language contact; diachronic linguistics; character N-grams;
DOI
10.1515/flih-2019-0012
Chinese Library Classification number
H0 [Linguistics];
Subject classification codes
030303 ; 0501 ; 050102 ;
Abstract
This paper applies character N-grams to the study of diachronic linguistic variation in a historical language. The period selected for this initial exploratory study is medieval English, a well-studied period of great linguistic variation and language contact, so the efficacy of computational techniques can be examined by comparison with the wealth of thorough scholarship on medieval linguistic variation. Frequency profiles of character N-gram features were generated for several epochs in the history of English, and a measure of language distance was employed to quantify the similarity between English at different stages in its history. This yielded a quantification of internal change in English. Furthermore, similarity between English and other medieval languages was measured across time, allowing the well-known period of contact between English and Anglo-Norman French to be measured. This methodology is compared to traditional lexicostatistical methods and shown to derive the same patterns as those derived from expert-created feature lists (i.e. Swadesh lists). Character N-gram profiles proved to be a flexible and useful method for studying diachronic variation, allowing relevant features of change to be highlighted. This method may complement traditional qualitative examinations.
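The general idea of comparing character N-gram frequency profiles can be sketched as follows. This is a minimal illustration only: the abstract does not name the distance measure, the value of N, or the preprocessing the authors used, so the trigram size, the lowercasing and whitespace normalization, the cosine distance, and the function names `ngram_profile` and `cosine_distance` are all assumptions made here. The two sample strings are short toy snippets, not the paper's corpus.

```python
import math
from collections import Counter
from typing import Dict


def ngram_profile(text: str, n: int = 3) -> Dict[str, float]:
    """Return relative frequencies of overlapping character n-grams.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space; the paper's own preprocessing may differ.
    """
    normalized = " ".join(text.lower().split())
    counts = Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def cosine_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine distance between two profiles (0 = same direction, 1 = no overlap).

    Assumption: cosine distance stands in for the unspecified
    "measure of language distance" mentioned in the abstract.
    """
    dot = sum(freq * q.get(gram, 0.0) for gram, freq in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Toy snippets standing in for texts from two epochs.
    old_english = "hwæt we gardena in geardagum"
    middle_english = "whan that aprille with his shoures soote"
    distance = cosine_distance(
        ngram_profile(old_english), ngram_profile(middle_english)
    )
    print(f"trigram cosine distance: {distance:.3f}")
```

In practice each profile would be built from a much larger sample per epoch, and distances would be computed between every pair of epochs (or between English and another language at each epoch) to trace change over time.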
Pages: 249-299
Page count: 51