Crowdsourcing the character of a place: Character-level convolutional networks for multilingual geographic text classification

Cited by: 11
Authors
Adams, Benjamin [1]
McKenzie, Grant [2]
Affiliations
[1] Univ Canterbury, Dept Geog, Private Bag 4800, Christchurch 8020, New Zealand
[2] Univ Maryland, Dept Geog Sci, College Pk, MD 20742 USA
Keywords
NEURAL-NETWORKS; WEB;
DOI
10.1111/tgis.12317
Chinese Library Classification number
P9 [Physical geography]; K9 [Geography]
Discipline classification code
0705; 070501
Abstract
This article presents a new character-level convolutional neural network model that can classify multilingual text written in any character set that can be encoded with UTF-8, a standard and widely used character encoding built on 8-bit code units. For geographic classification of text, we demonstrate that this approach is competitive with state-of-the-art word-based text classification methods. The model was tested on four crowdsourced data sets made up of Wikipedia articles, online travel blogs, Geonames toponyms, and Twitter posts. Unlike word-based methods, which require data cleaning and pre-processing, the proposed model works for any language without modification and achieves classification accuracy comparable to existing methods. Using a synthetic data set with introduced character-level errors, we show that it is more robust to noise than word-level classification algorithms. The results indicate that UTF-8 character-level convolutional neural networks are a promising technique for georeferencing noisy text, such as that found in colloquial social media posts and in texts scanned with optical character recognition. However, word-based methods currently require less computation time to train, so they remain preferable for classifying well-formatted and cleaned texts in a single language.
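The abstract does not spell out the input representation or the noise procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each text is turned into a fixed-length sequence of UTF-8 byte values before being fed to a character-level network, and that synthetic noise is introduced by random character substitution. The names `encode_utf8`, `add_char_noise`, `PAD_ID`, and `max_len` are hypothetical.

```python
import random

# Hypothetical padding symbol, chosen outside the 0-255 byte range.
PAD_ID = 256


def encode_utf8(text, max_len=16):
    """Map text to a fixed-length list of UTF-8 byte values, padded with PAD_ID.

    Truncation happens at the byte level, so a multi-byte character at the
    cut-off point may be split. No language-specific tokenization is needed.
    """
    ids = list(text.encode("utf-8"))[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def add_char_noise(text, rate, rng):
    """Replace each character with a random lowercase letter with probability `rate`.

    This is an assumed noise model for illustration only.
    """
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(letters) if rng.random() < rate else ch for ch in text
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for sample in ["Christchurch", "東京", "São Paulo"]:
        noisy = add_char_noise(sample, 0.2, rng)
        print(sample, encode_utf8(sample), noisy)
```

Because the same byte vocabulary covers every script, the encoding step is identical for Latin, CJK, or mixed text; only the number of bytes per character differs.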
Pages: 394-408
Number of pages: 15