Deep learning for detecting inappropriate content in text

Cited by: 40
Authors
Yenala, Harish [1]
Jhanwar, Ashish [1]
Chinnakotla, Manoj K. [1]
Goyal, Jay [1]
Affiliation
[1] Microsoft, Hyderabad, India
Keywords
Query classification; Deep learning; Query autosuggest; Web search; Conversations; CNN plus Bi-directional LSTM; Supervised learning
DOI
10.1007/s41060-017-0088-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Today, there are a large number of online discussion fora on the internet where users express, discuss and exchange their views and opinions on various topics. For example, news portals, blogs and social media channels such as YouTube typically allow users to express their views through comments. In such fora, it has often been observed that user conversations quickly derail and become inappropriate, for example through hurled abuse or rude and discourteous comments aimed at individuals or at certain groups/communities. Similarly, some virtual agents or bots have been found to respond to users with inappropriate messages. As a result, inappropriate messages and comments are becoming an online menace that slowly degrades the user experience. Hence, automatic detection and filtering of such inappropriate language has become an important problem for improving the quality of conversations with users as well as with virtual agents. In this paper, we propose a novel deep learning-based technique for automatically identifying such inappropriate language. We focus especially on two application scenarios: (a) query completion suggestions in search engines and (b) user conversations in messengers. Detecting inappropriate language is challenging because of various natural language phenomena such as spelling mistakes and variations, polysemy, contextual ambiguity and semantic variations. For identifying inappropriate query suggestions, we propose a novel deep learning architecture called "Convolutional Bi-Directional LSTM (C-BiLSTM)", which combines the strengths of both Convolutional Neural Networks (CNN) and Bi-directional LSTMs (BLSTM). For filtering inappropriate conversations, we use LSTM and Bi-directional LSTM (BLSTM) sequential models. The proposed models do not rely on hand-crafted features, are trained end-to-end as a single model, and effectively capture both local features and their global semantics.
Evaluating C-BiLSTM, LSTM and BLSTM models on real-world search queries and conversations reveals that they significantly outperform both pattern-based and other hand-crafted feature-based baselines.
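The C-BiLSTM idea above can be illustrated with a minimal, untrained forward pass. In this idea, a convolution layer extracts local n-gram features and a bidirectional LSTM then reads them as a sequence to capture global context. This is a sketch under stated assumptions, not the authors' implementation:

* The layer order is assumed: token embeddings, then 1-D convolution with ReLU, then forward and backward LSTMs, then a sigmoid output.
* The names `CBiLSTM`, `LSTM` and `conv1d` are hypothetical, as are the filter width, hidden sizes and random initialisation.
* Training is omitted entirely.

It uses only the Python standard library, so no deep learning framework is needed to run it.

```python
import math
import random

# Hypothetical sketch: layer order and sizes are assumptions; weights are random (untrained).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def conv1d(seq, filters, bias, width):
    """'Valid' 1-D convolution over time followed by ReLU.
    Each filter sees `width` consecutive embedding vectors flattened into one vector."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [x for vec in seq[t:t + width] for x in vec]
        out.append([max(0.0, s + b) for s, b in zip(matvec(filters, window), bias)])
    return out

class LSTM:
    """Single-layer LSTM; gate weights stacked as [input, forget, candidate, output]."""
    def __init__(self, in_dim, hidden, rng):
        self.h = hidden
        self.W = rand_matrix(4 * hidden, in_dim + hidden, rng)
        self.b = [0.0] * (4 * hidden)

    def run(self, seq):
        n = self.h
        h, c = [0.0] * n, [0.0] * n
        for x in seq:
            z = [s + b for s, b in zip(matvec(self.W, x + h), self.b)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h  # final hidden state summarises the sequence read in this direction

class CBiLSTM:
    def __init__(self, emb_dim, n_filters, width, hidden, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.filters = rand_matrix(n_filters, width * emb_dim, rng)
        self.conv_bias = [0.0] * n_filters
        self.fwd = LSTM(n_filters, hidden, rng)   # reads conv features left to right
        self.bwd = LSTM(n_filters, hidden, rng)   # reads conv features right to left
        self.out_w = rand_matrix(1, 2 * hidden, rng)[0]
        self.out_b = 0.0

    def predict(self, embeddings):
        """Return a score in (0, 1), read as P(inappropriate) once the model is trained."""
        feats = conv1d(embeddings, self.filters, self.conv_bias, self.width)
        h = self.fwd.run(feats) + self.bwd.run(feats[::-1])  # concatenate both directions
        return sigmoid(sum(w * x for w, x in zip(self.out_w, h)) + self.out_b)

if __name__ == "__main__":
    rng = random.Random(42)
    query = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]  # 5 tokens, 8-dim embeddings
    model = CBiLSTM(emb_dim=8, n_filters=4, width=3, hidden=6)
    print(model.predict(query))
```

In a real system the weights would be learned end-to-end by backpropagation on labelled queries. A framework such as PyTorch or TensorFlow would also replace the hand-written loops. The sketch only shows how the convolutional features become the input sequence that the BiLSTM reads in both directions.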
Pages: 273-286
Number of pages: 14