Using Bag-of-words to Distinguish Similar Languages: How Efficient are They?

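Authors
Zampieri, Marcos [1]
Affiliations
[1] Univ Saarland, D-66123 Saarbrucken, Germany
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract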
This paper presents a number of experiments that apply machine learning algorithms and bag-of-words features to the task of automatic language identification. The paper focuses on the identification of language varieties, which is a known weakness of general-purpose language identification methods. This question has been addressed by a number of studies in recent years, most of them relying on character n-gram language models. In this paper, I experiment with simple bag-of-words features and compare the results with previously proposed n-gram-based approaches. Three algorithms were used to perform these classification experiments: Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) and the J48 classifier.
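To make the bag-of-words idea in the abstract concrete, the following is a minimal sketch of a word-count Multinomial Naive Bayes classifier written with the Python standard library only. It is not the paper's implementation: the whitespace tokenizer, the add-one smoothing, the function names (`tokenize`, `train_mnb`, `predict`), the `en-GB`/`en-US` labels and the four toy sentences are all assumptions made for illustration, and the abstract does not say which language varieties or corpora were used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on word counts with add-alpha smoothing."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter(labels)        # label -> number of training documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in doc_counts.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest log posterior.

    Words never seen in training are skipped rather than penalized.
    """
    tokens = tokenize(text)
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokens:
            if tok in params["log_like"]:
                score += params["log_like"][tok]
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two spelling varieties of English, not the paper's corpus.
train_docs = [
    "the colour of the theatre programme",
    "my favourite neighbour organised the centre",
    "the color of the theater program",
    "my favorite neighbor organized the center",
]
train_labels = ["en-GB", "en-GB", "en-US", "en-US"]

model = train_mnb(train_docs, train_labels)
print(predict(model, "a colour programme at the centre"))
print(predict(model, "the favorite color"))
```

Whole words act as the features here, so the classifier separates the two varieties only through variety-specific vocabulary and spelling. A character n-gram model of the kind the abstract compares against would instead count overlapping character sequences, which also captures sub-word cues.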
Pages: 37-41
Number of pages: 5