μBoost: An Effective Method for Solving Indic Multilingual Text Classification Problem

Cited by: 1
Authors
Pathak, Manish [1]
Jain, Aditya [1]
Affiliations
[1] MiQ Digital, Bengaluru, Karnataka, India
DOI
10.1109/BigMM55396.2022.00022
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text classification is an integral part of many Natural Language Processing tasks, such as sarcasm detection and sentiment analysis. Many e-commerce websites and social-media and entertainment platforms use such models to enhance the user experience, which drives traffic and thus revenue on their platforms. In this paper, we present our solution to the Multilingual Abusive Comment Identification Challenge on Moj, an Indian video-sharing social networking service, powered by ShareChat and IIIT-Delhi for the 2nd Workshop on Emerging Advances in Multimodal AI (EAM) at IEEE BigMM, 2021. The challenge dealt with detecting abusive comments, in 13 regional Indic languages, on videos on the Moj platform. Our solution uses the novel μBoost, an ensemble of CatBoost classifier models and the Multilingual Representations for Indian Languages (MURIL) model, to produce state-of-the-art (SOTA) performance on Indic text classification tasks. We achieved a mean F1-score of 89.286 on the test data, an improvement over the baseline MURIL model's F1-score of 87.48.
Pages: 96-100
Number of pages: 5