Using the Google N-Gram corpus to measure cultural complexity

Cited by: 28
Author
Juola, Patrick [1]
Affiliation
[1] Duquesne Univ, Pittsburgh, PA 15219 USA
Source
LITERARY AND LINGUISTIC COMPUTING | 2013 / Vol. 28 / Issue 04
Funding
US National Science Foundation (NSF)
DOI
10.1093/llc/fqt017
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Empirical studies of broad-ranging aspects of culture, such as 'cultural complexity', are often extremely difficult. Following the model of Michel et al. (Michel, J.-B., Shen, Y. K., Aiden, A. P. et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176-82), and using a set of techniques originally developed to measure the complexity of language, we propose a text-based analysis of a large corpus of topic-uncontrolled text to determine how cultural complexity varies over time within a single culture. Using the Google Books American 2Gram corpus, we are able to show that (as predicted from the cumulative nature of culture) US culture has been steadily increasing in complexity, even when (for economic reasons) the amount of actual discourse, as measured by publication volume, decreases. We discuss several implications of this novel analysis technique, as well as its implications for discussion of the meaning of 'culture.'
Pages: 668-675
Page count: 8