COMFO: Multilingual Corpus for Opinion Mining

Authors
Faty, Lamine [1]
Drame, Khadim [1]
Sarr, Edouard Ngor [1]
Ndiaye, Marie [1]
Diop, Ibrahima [1]
Dia, Yoro [2]
Sall, Ousmane [3]
Affiliations
[1] Univ Assane Seck Ziguinchor, Ziguinchor, Senegal
[2] Univ Iba Thiam, Ziguinchor, Senegal
[3] Univ Virtuelle Senegal, Ziguinchor, Senegal
Keywords
Opinion mining; Online comment; Corpus building; COMFO
DOI
10.1007/978-3-031-19907-3_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The use of Machine Learning (ML) algorithms in opinion mining, particularly supervised learning algorithms, requires an annotated corpus to train the classification model so that its predictions are close to reality. Unfortunately, there are still no resources for the automatic processing of textual data expressed in the Senegalese urban language. The objective of this paper is to build a multilingual corpus for opinion mining (COMFO). The process of building the COMFO corpus comprises three steps: presentation of the data source, data collection and preparation, and annotation by a lexical approach. The particularity of COMFO lies in the integration of foreign languages (French and English) and local languages, particularly urban Wolof, in order to reflect the collective opinion of Senegalese readers.
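As a rough illustration of what "annotation by a lexical approach" can look like, the sketch below labels a comment by summing polarity scores from a word list. It is not the authors' method: the `LEXICON` entries, the `annotate` function name, the tokenization, and the three-way label set are all assumptions made for this example, since the abstract does not describe the actual lexicon or labeling scheme.

```python
import re

# Hypothetical mini-lexicon (NOT taken from the paper): word -> polarity score.
# A real multilingual lexicon would be much larger and curated per language.
LEXICON = {
    "good": 1, "great": 1, "bad": -1, "terrible": -1,  # English (illustrative)
    "bravo": 1, "excellent": 1, "mauvais": -1, "nul": -1,  # French (illustrative)
    "baax": 1,  # urban Wolof (illustrative entry)
}


def annotate(comment: str) -> str:
    """Label a comment by summing the lexicon scores of its tokens."""
    # Lowercase and split on word characters; unknown words score 0.
    tokens = re.findall(r"\w+", comment.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Bravo, good job", "C'est nul", "Hello"]:
        print(text, "->", annotate(text))
```

A simple sum like this ignores negation and code-switching within a word, so it is only a starting point for thinking about how lexicon-based labels could be produced for mixed French, English, and Wolof comments.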
Pages: 14-19
Number of pages: 6