An effective cybernated word embedding system for analysis and language identification in code-mixed social media text

Cited by: 4

Authors:

Shekhar, Shashi ^{[1]}

Sharma, Dilip Kumar ^{[1]}

Beg, M. M. Sufyan ^{[2]}

Affiliations:

[1] GLA Univ, Dept Comp Engn & Applicat, Mathura 281406, India

[2] Aligarh Muslim Univ, Dept Comp Engn, Aligarh 202002, Uttar Pradesh, India

Source:

INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS | 2019 / Vol. 23 / No. 03

Keywords:

Language identification; transliteration; character embedding; word embedding; Natural Language Processing; cBoW; skip-gram

DOI:

10.3233/KES-190409

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Users on social media now commonly write code-mixed text, i.e., text that mixes two or more languages. This paper describes the application of the code-mixed index to Indian social media texts and compares the complexity of identifying language at the word level using a Bi-directional Long Short-Term Memory model. Social media platforms are now widely used by people to express their opinions and interests. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We recommend a deep learning framework based on the cBoW and skip-gram models that predicts the language of origin of each word in a sequence from the specific words that precede it. The context capture module of the system yields better accuracy with the word embedding model than with character embedding.
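
The abstract refers to a code-mixed index without giving its definition. As a rough illustration only, the sketch below assumes the commonly cited utterance-level form, CMI = 100 * (1 - max_i(w_i) / (n - u)), where w_i is the number of tokens tagged with language i, n is the total token count, and u is the number of language-independent tokens. The function name, the tag labels ("hi", "en", "univ"), and the sample tags are hypothetical and are not taken from the paper, which may compute the index differently.

```python
from collections import Counter


def code_mixing_index(tags, neutral=frozenset({"univ"})):
    """Utterance-level code-mixing index under the assumed formula.

    tags    -- one language tag per token (hypothetical labels)
    neutral -- tags treated as language-independent (assumption)
    """
    # Count only tokens that carry a language tag; their sum is n - u.
    lang_counts = Counter(t for t in tags if t not in neutral)
    tagged = sum(lang_counts.values())
    if tagged == 0:
        # No language-tagged tokens: the index is defined as 0.
        return 0.0
    dominant = max(lang_counts.values())
    return 100.0 * (1.0 - dominant / tagged)


# Hypothetical word-level tags for a five-token Hindi-English utterance.
sample_tags = ["hi", "en", "en", "hi", "univ"]
sample_cmi = code_mixing_index(sample_tags)
```

A monolingual utterance scores 0 under this definition, and the score rises as the token counts of the languages become more balanced.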


Pages: 167-179

Number of pages: 13

50 items in total

[21] Experimenting Language Identification for Sentiment Analysis of English Punjabi Code Mixed Social Media Text
Bansal, Neetika
Goyal, Vishal
Rani, Simpel
INTERNATIONAL JOURNAL OF E-ADOPTION, 2020, 12 (01) : 52 - 62
[22] DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text
Chakravarthi, Bharathi Raja
Priyadharshini, Ruba
Muralidaran, Vigneshwaran
Jose, Navya
Suryawanshi, Shardul
Sherly, Elizabeth
McCrae, John P.
LANGUAGE RESOURCES AND EVALUATION, 2022, 56 (03) : 765 - 806
[23] Transformer Based Language Identification for Malayalam-English Code-Mixed Text
Thara, S.
Poornachandran, Prabaharan
IEEE Access, 2021, 9 : 118837 - 118850
[25] Word Level Language Identification of Code Mixing Text in Social Media using NLP
Shanmugalingam, Kasthuri
Sumathipala, Sagara
Premachandra, Chinthaka
2018 3RD INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY RESEARCH (ICITR), 2018
[26] A Comparison Study of Word Embedding for Detecting Named Entities of Code-Mixed Data in Indian Language
Sravani, Lolla
Reddy, Atla Sowmya
Thara, S.
2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018 : 2375 - 2381
[27] Language Identification and Analysis of Code-Switched Social Media Text
Mave, Deepthi
Maharjan, Suraj
Solorio, Thamar
COMPUTATIONAL APPROACHES TO LINGUISTIC CODE-SWITCHING, 2018 : 51 - 61
[28] Distributional Word Representations for Code-mixed Text in Moroccan Darija
Aghzal, Mohamed
Mourhir, Asmaa
AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 266 - 273
[30] CMHE-AN: Code mixed hybrid embedding based attention network for aggression identification in Hindi English code-mixed text
Mundra, Shikha
Mittal, Namita
MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (08) : 11337 - 11364
