An effective cybernated word embedding system for analysis and language identification in code-mixed social media text

Cited by: 4

Authors:

Shekhar, Shashi [1]

Sharma, Dilip Kumar [1]

Beg, M. M. Sufyan [2]

Affiliations:

[1] GLA Univ, Dept Comp Engn & Applicat, Mathura 281406, India

[2] Aligarh Muslim Univ, Dept Comp Engn, Aligarh 202002, Uttar Pradesh, India

Source:

INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS | 2019, Vol. 23, No. 3

Keywords:

Language identification; transliteration; character embedding; word embedding; Natural Language Processing; cBoW; skip-gram;

DOI:

10.3233/KES-190409

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Users on social media now commonly write code-mixed text, i.e., text that mixes two or more languages. This paper describes the application of the code-mixed index to Indian social media texts and compares the complexity of identifying language at the word level using a bidirectional Long Short-Term Memory (BiLSTM) model. Social media platforms are now widely used by people to express their opinions and interests. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We recommend a deep learning framework based on the cBoW and skip-gram models that predicts the language of origin of each word in a sequence from the specific words that precede it. The context capture module of the system gives better accuracy with the word embedding model than with character embedding.
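As an illustration of the two embedding objectives named in the abstract, the following is a minimal sketch of how cBoW and skip-gram form their training examples from a token sequence. It is not the authors' implementation: the function names, the window size, and the Hindi-English sample sentence are assumptions made for illustration only.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """cBoW: predict a target word from the words surrounding it."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context = up to `window` tokens on each side of the target.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: predict each surrounding word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical romanized Hindi-English code-mixed sentence (illustrative only).
tokens = ["yaar", "this", "movie", "bahut", "acchi", "thi"]

for context, target in cbow_pairs(tokens, window=1):
    print("cBoW     ", context, "->", target)
for center, neighbor in skipgram_pairs(tokens, window=1):
    print("skip-gram", center, "->", neighbor)
```

In both cases a word's vector is learned from its neighbors, so in a code-mixed sentence the neighbors carry a signal about which language a word belongs to. A character embedding would instead be built from the letters inside each word.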

Pages: 167-179

Page count: 13
