Using the Google N-Gram corpus to measure cultural complexity

Cited by: 28

Authors:

Juola, Patrick [1]

Affiliations:

[1] Duquesne Univ, Pittsburgh, PA 15219 USA

Source:

LITERARY AND LINGUISTIC COMPUTING | 2013, Vol. 28, No. 4

Funding:

US National Science Foundation

DOI:

10.1093/llc/fqt017

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

Empirical studies of broad-ranging aspects of culture, such as 'cultural complexities', are often extremely difficult. Following the model of Michel et al. (Michel, J.-B., Shen, Y. K., Aiden, A. P. et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176-82), and using a set of techniques originally developed to measure the complexity of language, we propose a text-based analysis of a large corpus of topic-uncontrolled text to determine how cultural complexity varies over time within a single culture. Using the Google Books American 2Gram corpus, we are able to show that (as predicted from the cumulative nature of culture) US culture has been steadily increasing in complexity, even when (for economic reasons) the amount of actual discourse, as measured by publication volume, decreases. We discuss several implications of this novel analysis technique, as well as its implications for discussion of the meaning of 'culture'.


Pages: 668-675

Page count: 8
