Deep learning for detecting inappropriate content in text

Cited by: 40

Authors:

Yenala, Harish [1]

Jhanwar, Ashish [1]

Chinnakotla, Manoj K. [1]

Goyal, Jay [1]

Affiliation:

[1] Microsoft, Hyderabad, India

Source:

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS | 2018, Vol. 6, No. 4

Keywords:

Query classification; Deep learning; Query autosuggest; Web search; Conversations; CNN plus Bi-directional LSTM; Supervised learning

DOI:

10.1007/s41060-017-0088-4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Today, there are a large number of online discussion fora on the internet where users express, discuss and exchange their views and opinions on various topics. For example, news portals, blogs, and social media channels such as YouTube typically allow users to express their views through comments. In such fora, user conversations have often been observed to derail quickly and become inappropriate, with users hurling abuse or passing rude and discourteous comments on individuals or certain groups/communities. Similarly, some virtual agents or bots have been found to respond to users with inappropriate messages. As a result, inappropriate messages and comments are turning into an online menace, slowly degrading the effectiveness of user experiences. Hence, automatic detection and filtering of such inappropriate language has become an important problem for improving the quality of conversations with users as well as with virtual agents. In this paper, we propose a novel deep learning-based technique for automatically identifying such inappropriate language. We focus on solving this problem in two application scenarios: (a) query completion suggestions in search engines and (b) user conversations in messengers. Detecting inappropriate language is challenging due to various natural language phenomena such as spelling mistakes and variations, polysemy, contextual ambiguity and semantic variations. For identifying inappropriate query suggestions, we propose a novel deep learning architecture called "Convolutional Bi-Directional LSTM (C-BiLSTM)", which combines the strengths of both Convolutional Neural Networks (CNN) and Bi-directional LSTMs (BLSTM). For filtering inappropriate conversations, we use LSTM and Bi-directional LSTM (BLSTM) sequential models. The proposed models do not rely on hand-crafted features, are trained end-to-end as a single model, and effectively capture both local features and their global semantics.
Evaluating the C-BiLSTM, LSTM and BLSTM models on real-world search queries and conversations shows that they significantly outperform both pattern-based baselines and baselines built on other hand-crafted features.
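
The abstract describes C-BiLSTM only at a high level: a convolution extracts local features, a bidirectional LSTM captures their sequence-level semantics, and the whole stack is trained end-to-end as one model. The following is a minimal, untrained forward-pass sketch in pure Python. It assumes the layer order embedding → 1-D convolution with ReLU → BiLSTM → sigmoid output. The class and function names (`CBiLSTMSketch`, `LSTMCell`, `conv1d_relu`), the dimensions, the whitespace tokenizer and the random initialisation are hypothetical illustrations. They are not the authors' configuration or code.

```python
import math
import random

# Hypothetical, untrained sketch of a C-BiLSTM forward pass:
# token embeddings -> 1-D convolution (+ReLU) -> bidirectional LSTM -> sigmoid.
# Layer sizes, tokenizer and random weights are illustrative assumptions.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def conv1d_relu(seq, filters, width):
    """Slide each filter over `width` consecutive vectors and apply ReLU."""
    dim = len(seq[0])
    if len(seq) < width:  # zero-pad inputs shorter than the kernel
        seq = seq + [[0.0] * dim] * (width - len(seq))
    out = []
    for t in range(len(seq) - width + 1):
        window = [v for vec in seq[t:t + width] for v in vec]  # flatten window
        out.append([max(0.0, s) for s in matvec(filters, window)])
    return out


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates, candidate (c)."""

    def __init__(self, rng, in_dim, hid_dim):
        self.hid = hid_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.W = {g: rand_matrix(rng, hid_dim, in_dim + hid_dim) for g in "ifoc"}
        self.b = {g: [0.0] * hid_dim for g in "ifoc"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation [x; h]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], xh), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def run(self, seq):
        """Return the final hidden state after reading `seq` in order."""
        h, c = [0.0] * self.hid, [0.0] * self.hid
        for x in seq:
            h, c = self.step(x, h, c)
        return h


class CBiLSTMSketch:
    def __init__(self, vocab, emb_dim=8, n_filters=6, width=3, hid_dim=5, seed=0):
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate(vocab)}
        self.unk = len(vocab)  # shared id for out-of-vocabulary tokens
        self.emb = rand_matrix(rng, len(vocab) + 1, emb_dim, scale=1.0)
        self.width = width
        self.filters = rand_matrix(rng, n_filters, width * emb_dim)
        self.fwd = LSTMCell(rng, n_filters, hid_dim)
        self.bwd = LSTMCell(rng, n_filters, hid_dim)
        self.out_w = rand_matrix(rng, 1, 2 * hid_dim)[0]

    def predict_proba(self, query):
        """Score in (0, 1) for 'inappropriate'; arbitrary because untrained."""
        tokens = query.lower().split() or ["<unk>"]
        seq = [self.emb[self.index.get(t, self.unk)] for t in tokens]
        feats = conv1d_relu(seq, self.filters, self.width)  # local n-gram features
        # Concatenate final states of the left-to-right and right-to-left passes.
        h = self.fwd.run(feats) + self.bwd.run(feats[::-1])
        return sigmoid(sum(w * x for w, x in zip(self.out_w, h)))
```

For example, `CBiLSTMSketch(["how", "to", "bake", "a", "cake"]).predict_proba("how to bake a cake")` returns some value in (0, 1). The value is meaningless until the weights are trained, for instance by backpropagation against labelled queries, which this sketch omits. Under the same assumptions, the LSTM and BLSTM conversation models would drop the convolution and classify from `fwd.run(...)` alone, or from the concatenated forward and backward states.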

Pages: 273-286

Number of pages: 14