A convolutional neural network approach for gender and language variety identification

Cited by: 7
Authors
Gomez-Adorno, Helena [1]
Fuentes-Alba, Roddy [2]
Markov, Ilia [3]
Sidorov, Grigori [2]
Gelbukh, Alexander [2]
Affiliations
[1] Univ Nacl Autonoma Mexico, Inst Invest Matemat Aplicadas & Sistemas IIMAS, Mexico City, DF, Mexico
[2] Inst Politecn Nacl, CIC, Mexico City, DF, Mexico
[3] INRIA, Le Chesnay, France
Keywords
Convolutional neural networks; deep learning; author profiling; gender identification; language variety identification; machine learning; character n-grams; Spanish;
DOI
10.3233/JIFS-179032
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method for gender and language variety identification using a convolutional neural network (CNN). We compare its performance with that of a traditional machine learning algorithm, support vector machines (SVM), trained on character n-grams (n = 3-8), lexical features (word unigrams and bigrams), and their combinations. We use a single multi-labeled corpus of news articles in different varieties of Spanish, developed specifically for these tasks. The CNN architecture, trained on word- and sentence-level embeddings, can be successfully applied to gender and language variety identification on a relatively small corpus (fewer than 10,000 documents). Our experiments show that the deep learning approach outperforms the traditional machine learning approach on both tasks when named entities are present in the corpus. However, when all named entities are reduced to a single symbol "NE" to avoid topic-dependent features, the drop in accuracy is larger for the deep learning approach.
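The baseline features named in the abstract (character n-grams with n = 3-8, word unigrams and bigrams, and named-entity masking with the symbol "NE") can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenization, the lowercasing, and the idea of passing named entities as a ready-made list are all assumptions made for illustration, and the record does not say how entities were detected.

```python
from collections import Counter


def char_ngrams(text, n_min=3, n_max=8):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_max=2):
    """Count word unigrams up to n_max-grams (assumed: lowercase, split on whitespace)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def mask_entities(text, entities, symbol="NE"):
    """Replace each listed entity string with one symbol (hypothetical helper).

    Plain substring replacement is a simplification: it would also match an
    entity that occurs inside a longer word.
    """
    # Replace longer entities first so shorter ones cannot split them.
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, symbol)
    return text
```

Counts like these would typically be turned into a document-term matrix before being passed to an SVM; that step is omitted here because the record does not specify the weighting scheme.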
Pages: 4845-4855
Number of pages: 11