Influence of social conversational features on language identification in highly multilingual online conversations

Cited by: 7
Authors
Sarma, Neelakshi [1]
Singh, Sanasam Ranbir [1]
Goswami, Diganta [1]
Affiliations
[1] Indian Inst Technol, Comp Sci & Engn Dept, Gauhati, Assam, India
Keywords
Language identification; Multilingual; Social conversational features; Convolutional neural network; TEXT;
DOI
10.1016/j.ipm.2018.09.009
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the explosion of multilingual content on the Web, particularly on social media platforms, identifying the languages present in a text is becoming an important task for various applications. Automatic language identification (ALI) in social media text is already considered a non-trivial task due to the presence of slang words, misspellings, creative spellings and special elements such as hashtags and user mentions; ALI in a multilingual environment is even more challenging. In a highly multilingual society, code-mixing without affecting the underlying language sense has become a natural phenomenon. In such a dynamic environment, conversational text alone is often insufficient to identify the underlying languages present in the text. This paper proposes various methods of exploiting social conversational features to enhance ALI performance. Although social conversational features for ALI have been explored previously using methods like probabilistic language modeling, these models often fail to address issues such as code-mixing, phonetic typing and out-of-vocabulary words, which are prevalent in a highly multilingual environment. This paper differs in that it uses the social conversational features to propose text refinement strategies suitable for ALI in a highly multilingual environment. The contributions of this paper therefore include the following. First, this paper analyzes the characteristics of various social conversational features by exploiting language usage patterns. Second, various methods of text refinement suitable for language identification are proposed. Third, the effects of the proposed refinement methods are investigated using various sentence-level language identification frameworks.
Experimental observations over three conversational datasets collected from the Facebook, YouTube and Twitter social media platforms show that the proposed method of ALI using social conversational features outperforms its baseline counterparts.
Pages: 151-166
Number of pages: 16
Related papers
28 in total
  • [1] An unsupervised multilingual approach for online social media topic identification
    Lo, Siaw Ling
    Chiong, Raymond
    Cornforth, David
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 81 : 282 - 298
  • [2] Language Identification from an Indian Multilingual Document Using Profile Features
    Padma, M. C.
    Vijaya, P. A.
    Nagabhushan, P.
    [J]. 2009 INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING, PROCEEDINGS, 2009, : 332 - +
  • [3] Understanding Online Communicative Language Features In Social Networking Environment
    Stapa, Siti Hamin
    Shaari, Azianura Hani
    [J]. GEMA ONLINE JOURNAL OF LANGUAGE STUDIES, 2012, 12 (03) : 817 - 830
  • [4] Abusive Language Detection in Online Conversations by Combining Content- and Graph-Based Features
    Cecillon, Noe
    Labatut, Vincent
    Dufour, Richard
    Linares, Georges
    [J]. FRONTIERS IN BIG DATA, 2019, 2
  • [5] A Survey on Influence Spreader Identification in Online Social Network
    Kumaran, P.
    Chitrakala, S.
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2016,
  • [6] Toward Accurate Social Media Language Identification: Combining Language Features with a Graphical Approach
    Abainia, Kheireddine
    [J]. 2018 3RD INTERNATIONAL CONFERENCE ON PATTERN ANALYSIS AND INTELLIGENT SYSTEMS (PAIS), 2018, : 22 - 28
  • [7] Identification of prominent Leaders and analysis of their influence in Online Social Networks
    Yeruva, Sujatha
    Devi, T.
    [J]. 2015 IEEE SEVENTH NATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND INFORMATION SYSTEMS (NCCCIS), 2015, : 34 - 39
  • [8] EFFECTIVENESS OF OPINION INFLUENCE APPROACHES IN HIGHLY CLUSTERED ONLINE SOCIAL NETWORKS
    Faletra, Melissa
    Palmer, Nathan
    Marshall, Jeffrey S.
    [J]. ADVANCES IN COMPLEX SYSTEMS, 2014, 17 (02):
  • [9] Predicting the Popularity of Online Content by Modeling the Social Influence and Homophily Features
    Shang, Yingdan
    Zhou, Bin
    Zeng, Xiang
    Wang, Ye
    Yu, Han
    Zhang, Zhong
    [J]. FRONTIERS IN PHYSICS, 2022, 10
  • [10] Credit distribution for influence maximization in online social networks with node features
    Deng, Xiaoheng
    Pan, Yan
    Shen, Hailan
    Gui, Jingsong
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2016, 31 (02) : 979 - 990