Crowdsourcing the character of a place: Character-level convolutional networks for multilingual geographic text classification

Times cited: 12
Authors
Adams, Benjamin [1 ]
McKenzie, Grant [2 ]
Affiliations
[1] Univ Canterbury, Dept Geog, Private Bag 4800, Christchurch 8020, New Zealand
[2] Univ Maryland, Dept Geog Sci, College Pk, MD 20742 USA
Keywords
NEURAL-NETWORKS; WEB;
DOI
10.1111/tgis.12317
Chinese Library Classification (CLC) number
P9 [Physical geography]; K9 [Geography];
Discipline classification code
0705; 070501;
Abstract
This article presents a new character-level convolutional neural network model that can classify multilingual text written in any character set that can be encoded with UTF-8, a standard and widely used variable-width character encoding built on 8-bit code units. For geographic classification of text, we demonstrate that this approach is competitive with state-of-the-art word-based text classification methods. The model was tested on four crowdsourced data sets made up of Wikipedia articles, online travel blogs, Geonames toponyms, and Twitter posts. Unlike word-based methods, which require data cleaning and pre-processing, the proposed model works for any language without modification and with classification accuracy comparable to existing methods. Using a synthetic data set with introduced character-level errors, we show that it is more robust to noise than word-level classification algorithms. The results indicate that UTF-8 character-level convolutional neural networks are a promising technique for georeferencing noisy text, such as that found in colloquial social media posts and in texts scanned with optical character recognition. However, word-based methods currently require less computation time to train, so they are preferable for classifying well-formatted, cleaned texts in a single language.
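The abstract's central idea, feeding raw UTF-8 bytes rather than words into a convolutional network, can be illustrated with a minimal pure-Python sketch. It is not the authors' model: the input length (MAX_LEN), the byte-to-index mapping, the filter shapes, and the noise model in the hypothetical add_char_noise helper are all assumptions made for illustration, and the filters are untrained random weights. It shows three pieces: encoding text in any script to a fixed-length byte sequence, injecting synthetic character-level errors, and one convolution layer with ReLU and max-over-time pooling.

```python
import random
from typing import List

# Hypothetical settings for illustration only; the abstract does not state
# the paper's input length, filter widths, or number of layers.
MAX_LEN = 64   # fixed input length in UTF-8 bytes (assumption)
PAD = 0        # index reserved for padding (assumption)
VOCAB = 257    # 256 possible byte values + 1 padding index


def utf8_byte_indices(text: str, max_len: int = MAX_LEN) -> List[int]:
    """Encode text as UTF-8 and map each byte b to index b + 1 (0 = padding).

    Any script works unchanged, because every Unicode string has a UTF-8
    byte sequence. Truncation may split a multi-byte character; a
    byte-level model simply sees the partial sequence.
    """
    raw = text.encode("utf-8")[:max_len]
    indices = [b + 1 for b in raw]
    return indices + [PAD] * (max_len - len(indices))


def add_char_noise(text: str, rate: float, rng: random.Random) -> str:
    """Hypothetical noise model: with probability `rate`, each character is
    substituted (by a character drawn from the same text), deleted, or
    duplicated. The paper's actual error model may differ."""
    out = []
    for ch in text:
        if rng.random() < rate:
            op = rng.choice(("sub", "del", "dup"))
            if op == "sub":
                out.append(rng.choice(text))
            elif op == "dup":
                out.extend((ch, ch))
            # op == "del": append nothing, dropping the character
        else:
            out.append(ch)
    return "".join(out)


def conv_max_features(indices: List[int],
                      filters: List[List[List[float]]]) -> List[float]:
    """1-D convolution over one-hot byte vectors, then ReLU and max-over-time.

    filters[f][j][v] is the weight of filter f at offset j for byte index v.
    A dot product with a one-hot vector is a table lookup, so each window
    score is the sum of k looked-up weights.
    """
    feats = []
    for w in filters:
        k = len(w)
        best = 0.0  # max(0, max_i s_i) equals max_i ReLU(s_i)
        for i in range(len(indices) - k + 1):
            s = sum(w[j][indices[i + j]] for j in range(k))
            best = max(best, s)
        feats.append(best)
    return feats


def random_filters(n: int, k: int, rng: random.Random) -> List[List[List[float]]]:
    """Untrained Gaussian weights, only to make the demo runnable."""
    return [[[rng.gauss(0.0, 0.1) for _ in range(VOCAB)] for _ in range(k)]
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    filters = random_filters(4, 3, rng)
    for place in ["Christchurch", "College Park", "東京"]:
        noisy = add_char_noise(place, 0.1, rng)
        print(noisy, conv_max_features(utf8_byte_indices(noisy), filters))
```

Because the input is bytes, the same encoder handles Latin, CJK, or any other script without tokenization or language-specific pre-processing. A trained model would presumably stack several such layers and learn the weights with a deep-learning framework rather than sampling them.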
Pages: 394-408
Page count: 15
Related papers
50 in total
  • [21] A Character-level Convolutional Neural Network with Dynamic Input Length for Thai Text Categorization
    Koomsubha, Thanabhat
    Vateekul, Peerapon
    [J]. 2017 9TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST), 2017, : 101 - 105
  • [22] Enhanced character-level deep convolutional neural networks for cardiovascular disease prediction
    Zhang, Zhichang
    Qiu, Yanlong
    Yang, Xiaoli
    Zhang, Minyu
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (Suppl 3)
  • [24] Application of the character-level statistical method in text categorization
    Yang, Zhen
    Nie, Xiangfei
    Xu, Weiran
    Guo, Jun
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 1412 - 1417
  • [25] An Efficient Character-Level and Word-Level Feature Fusion Method for Chinese Text Classification
    Jin Wenzhen
    Zhu Hong
    Yang Guocai
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON MACHINE VISION AND INFORMATION TECHNOLOGY (CMVIT 2019), 2019, 1229
  • [26] End-to-End Text Classification via Image-based Embedding using Character-level Networks
    Kitada, Shunsuke
    Kotani, Ryunosuke
    Iyatomi, Hitoshi
    [J]. 2018 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2018
  • [27] A Novel Joint Character Categorization and Localization Approach for Character-Level Scene Text Recognition
    Qi, Xianbiao
    Chen, Yihao
    Xiao, Rong
    Li, Chun-Guang
    Zou, Qin
    Cui, Shuguang
    [J]. 2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDARW), VOL 5, 2019, : 83 - 90
  • [28] Automatically Classifying Chinese Judgment Documents Using Character-Level Convolutional Neural Networks
    Zhou, Xiaosong
    Li, Chuanyi
    Ge, Jidong
    Li, Zhongjin
    Zhou, Xiaoyu
    Luo, Bin
    [J]. PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2018, 11013 : 430 - 437
  • [29] MOJI: Character-level convolutional neural networks for Malicious Obfuscated JavaScript Inspection
    Ishida, Minato
    Kaneko, Naoshi
    Sumi, Kazuhiko
    [J]. APPLIED SOFT COMPUTING, 2023, 137
  • [30] Cyberbullying detection in social media text based on character-level convolutional neural network with shortcuts
    Lu, Nijia
    Wu, Guohua
    Zhang, Zhen
    Zheng, Yitao
    Ren, Yizhi
    Choo, Kim-Kwang Raymond
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (23)