Mapping Lexical Dialect Variation in British English Using Twitter

Cited by: 29
Authors
Grieve, Jack [1]
Montgomery, Chris [2]
Nini, Andrea [3]
Murakami, Akira [1]
Guo, Diansheng [4]
Affiliations
[1] Univ Birmingham, Dept English Language & Linguist, Birmingham, W Midlands, England
[2] Univ Sheffield, Sch English, Sheffield, S Yorkshire, England
[3] Univ Manchester, Dept Linguist & English Language, Manchester, Lancs, England
[4] Univ South Carolina, Dept Geog, Columbia, SC 29208 USA
Funding
Economic and Social Research Council (UK); Arts and Humanities Research Council (UK)
Keywords
dialectology; social media; Twitter; British English; big data; lexical variation; spatial analysis; sociolinguistics
DOI
10.3389/frai.2019.00011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There is a growing trend in regional dialectology to analyse large corpora of social media data, but it is unclear if the results of these studies can be generalized to language as a whole. To assess the generalizability of Twitter dialect maps, this paper presents the first systematic comparison of regional lexical variation in Twitter corpora and traditional survey data. We compare the regional patterns found in 139 lexical dialect maps based on a 1.8 billion word corpus of geolocated UK Twitter data and the BBC Voices dialect survey. A spatial analysis of these 139 map pairs finds a broad alignment between these two data sources, offering evidence that both approaches to data collection allow for the same basic underlying regional patterns to be identified. We argue that these results license the use of Twitter corpora for general inquiries into regional lexical variation and change.
Pages: 18