Politics and the German language: Testing Orwell's hypothesis using the Google N-Gram corpus

Cited by: 4
Author
Caruana-Galizia, Paul [1]
Affiliation
[1] Humboldt Univ, Inst Econ Hist, Spandauer Str 1, D-10178 Berlin, Germany
Keywords
BOOKS
DOI
10.1093/llc/fqv011
Chinese Library Classification number
C [Social Sciences, General]
Discipline classification code
03; 0303
Abstract
Understanding the relationship between political regimes and language, while a popular theme among historians and linguists, is empirically difficult. This study suggests a preliminary empirical framework. A quantitative analysis of the Google Books 1-Gram German corpus from 1870 to 1945 provides empirical evidence consistent with George Orwell's hypothesis that everyday language deteriorates under dictatorships through the inversion of words' underlying meanings. More specifically, this article examines six non-technical, non-Nazi words: Demokratie (democracy), Freiheit (freedom), Frieden (peace), Herrlichkeit (glory), Gerechtigkeit (justice), and Heldentum (heroism). These words are (1) highly correlated with explicitly Nazi words; (2) negatively correlated with Germany's level of democracy; and (3) negatively correlated with the count of riots, anti-government protests, and government crises, implying that these words were not used as a form of protest. The use of these words increased sharply under the Nazi government, which banned all publications that were critical of the government. These correlations cannot tell us whether the relationship is causal, and we cannot be sure whether the corpus under study is truly representative. Replicating this empirical framework on other corpora, pushing the period under study further back in time, or using 2-gram data sets can all help assess this study's findings.
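The kind of correlation the abstract describes can be illustrated with a minimal Python sketch. It is not the article's actual procedure: the function names `relative_frequency` and `pearson`, the use of Pearson's coefficient, and every number below are assumptions made for illustration only. The values are invented placeholders, not data from the Google Books corpus or from any democracy index.

```python
from math import sqrt


def relative_frequency(counts, totals):
    """Divide each yearly word count by that year's total token count."""
    if len(counts) != len(totals):
        raise ValueError("counts and totals must cover the same years")
    return [c / t for c, t in zip(counts, totals)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented placeholder values for four hypothetical years (NOT real data).
yearly_totals = [1000, 1200, 1500, 1800]   # all 1-gram tokens per year
word_counts = [10, 15, 24, 36]             # occurrences of one target word
democracy_score = [6.0, 5.0, 2.0, -9.0]    # hypothetical democracy index

word_freq = relative_frequency(word_counts, yearly_totals)
r = pearson(word_freq, democracy_score)
print(f"correlation with democracy score: {r:.3f}")
```

Normalising by the yearly token total matters because the number of books in the corpus varies from year to year, so raw counts alone would mostly track corpus size. As the abstract itself cautions, a coefficient of this kind says nothing about causation.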
Pages: 441-456
Page count: 16