An effective cybernated word embedding system for analysis and language identification in code-mixed social media text

Cited by: 4

Authors:

Shekhar, Shashi [1]

Sharma, Dilip Kumar [1]

Beg, M. M. Sufyan [2]

Affiliations:

[1] GLA University, Department of Computer Engineering and Applications, Mathura 281406, India

[2] Aligarh Muslim University, Department of Computer Engineering, Aligarh 202002, Uttar Pradesh, India

Source:

INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS | 2019 / Vol. 23 / No. 3

Keywords:

Language identification; transliteration; character embedding; word embedding; Natural Language Processing; cBoW; skip-gram

DOI:

10.3233/KES-190409

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The language used by social media users nowadays is code-mixed text, i.e., text that mixes two or more languages. People now widely use social media platforms to express their opinions and interests. This paper describes the application of the code-mixing index (CMI) to Indian social media texts and compares their complexity for word-level language identification using a Bi-directional Long Short-Term Memory (Bi-LSTM) model. The main contribution of this work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We propose a deep learning framework based on the continuous bag-of-words (cBoW) and skip-gram models that predicts the language of origin of each word in a sequence from the words that precede it. With the system's context-capture module, the word embedding model achieves better accuracy than character embedding.
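The abstract refers to applying the code-mixing index to measure how mixed each text is. As a minimal sketch, assuming the standard utterance-level formulation of Gambäck and Das (the record does not say which variant the authors used), the index can be computed from word-level language tags. The tag names `hi`, `en`, and `univ` (for language-independent tokens such as emoji, URLs, and punctuation) and the function name `code_mixing_index` are hypothetical.

```python
from collections import Counter

# Hypothetical tag for language-independent tokens (emoji, URLs, punctuation).
UNIVERSAL_TAG = "univ"


def code_mixing_index(tags: list[str]) -> float:
    """Utterance-level Code-Mixing Index (assumed Gambäck and Das formulation).

    CMI = 100 * (1 - max_i(w_i) / (n - u)) if n > u, else 0, where
      w_i = number of tokens tagged with language i,
      n   = total number of tokens,
      u   = number of language-independent tokens.
    """
    n = len(tags)
    # Count tokens per language, ignoring language-independent tokens.
    lang_counts = Counter(t for t in tags if t != UNIVERSAL_TAG)
    u = n - sum(lang_counts.values())
    if n == u:  # no language-tagged tokens at all (also covers empty input)
        return 0.0
    dominant = max(lang_counts.values())  # size of the majority language
    return 100.0 * (1.0 - dominant / (n - u))


# Illustrative Hindi-English utterance with hand-assigned tags.
tagged = [("mujhe", "hi"), ("ye", "hi"), ("movie", "en"), ("bahut", "hi"),
          ("pasand", "hi"), ("aayi", "hi"), (":)", "univ")]
print(round(code_mixing_index([tag for _, tag in tagged]), 2))  # 16.67
```

A monolingual utterance scores 0, and the score rises as the language-tagged tokens are split more evenly between languages (a two-token `hi`/`en` utterance scores 50), which is what makes the index usable for comparing the mixing complexity of texts from different platforms.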


Pages: 167-179

Page count: 13
