A convolutional neural network approach for gender and language variety identification

Cited by: 7
Authors
Gomez-Adorno, Helena [1]
Fuentes-Alba, Roddy [2]
Markov, Ilia [3]
Sidorov, Grigori [2]
Gelbukh, Alexander [2]
Affiliations
[1] Univ Nacl Autonoma Mexico, Inst Invest Matemat Aplicadas & Sistemas IIMAS, Mexico City, DF, Mexico
[2] Inst Politecn Nacl, CIC, Mexico City, DF, Mexico
[3] INRIA, Le Chesnay, France
Keywords
Convolutional neural networks; deep learning; author profiling; gender identification; language variety identification; machine learning; character n-grams; Spanish
DOI
10.3233/JIFS-179032
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method for gender and language variety identification using a convolutional neural network (CNN). We compare the performance of this method with a traditional machine learning algorithm, support vector machines (SVM), trained on character n-grams (n = 3-8), lexical features (unigrams and bigrams of words), and their combinations. We use a single multi-labeled corpus, developed specifically for these tasks, composed of news articles in different varieties of Spanish. We present a convolutional neural network architecture, trained on word- and sentence-level embeddings, that can be successfully applied to gender and language variety identification on a relatively small corpus (fewer than 10,000 documents). Our experiments show that the deep learning approach outperforms the traditional machine learning approach on both tasks when named entities are present in the corpus. However, when all named entities are reduced to a single symbol "NE" to avoid topic-dependent features, the drop in accuracy is higher for the deep learning approach.
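The baseline features and the named-entity masking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify tokenization, casing, or how named entities are detected, so everything below is an assumption: the function names `char_ngrams`, `word_ngrams`, and `mask_named_entities` are hypothetical, whitespace tokenization with lowercasing is assumed, and the entity list is supplied by hand rather than by a recognizer.

```python
from collections import Counter


def char_ngrams(text, n_min=3, n_max=8):
    """Count character n-grams for every n in [n_min, n_max] (3-8 as in the abstract)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_values=(1, 2)):
    """Count word unigrams and bigrams.

    Assumption: lowercasing and whitespace tokenization; the abstract
    does not say how the authors tokenized the text.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def mask_named_entities(text, entities):
    """Replace each listed entity with the single symbol "NE".

    Assumption: the entity list is given; a real pipeline would obtain
    it from a named-entity recognizer. Longer entities are replaced
    first so that multi-word names are not partially masked.
    """
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, "NE")
    return text
```

Counts like these would typically be turned into sparse vectors and passed to an SVM. Masking before feature extraction removes topic-dependent cues such as names of people and places.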
Pages: 4845-4855
Number of pages: 11