A multi-modal approach to detect inappropriate cartoon video contents using deep learning networks

Cited by: 2
Authors
Chuttur, M. Y. [1 ]
Nazurally, A. [1 ]
Institution
[1] Univ Mauritius, Reduit 80837, Moka, Mauritius
Keywords
Children; YouTube; Deep learning; User comments; Closed captions; Disturbing contents; SHORT-TERM; CLASSIFICATION; AGGRESSION; VIOLENCE; MEDIA
DOI
10.1007/s11042-022-12709-2
Chinese Library Classification (CLC) number
TP [automation and computer technology]
Subject classification code
0812
Abstract
Children are exposed more than ever to all kinds of video content on the Internet. Consequently, several studies have proposed techniques to detect videos that can be harmful to children. However, we note that, so far, no attention has been given to the cartoon characters themselves and the underlying language used. To address this gap, we evaluate the effectiveness of using actual images of cartoon characters and the language used in cartoons to categorise videos as appropriate or inappropriate for children. We do so by developing a multi-modal classifier that combines the output of two deep learning networks: an LSTM for text analysis and a VGGNet for image analysis. More specifically, the LSTM network processes the user comments and closed captions associated with a video, while the VGGNet network recognises cartoon characters. The LSTM model was trained and tested on a dataset of about 290,000 labelled text records, while the VGGNet model was trained and tested on a manually annotated image dataset of 6,000 cartoon characters. Testing accuracies of 94% and 99% were obtained for the LSTM and VGGNet networks, respectively. The proposed approach was further evaluated on 50 actual children's videos from YouTube, yielding an accuracy of 72% with the LSTM alone, 78% with the VGGNet alone, and 76% with the combined output of the two networks. We conclude that closed captions, user comments and images of cartoon characters are all useful in detecting videos that are unsafe for children, and should be considered essential parameters when developing multimedia filtering tools.
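The abstract describes combining the outputs of the two networks into one decision. A minimal late-fusion sketch of that idea is shown below; the function names, the equal weighting, and the 0.5 decision threshold are illustrative assumptions, not the authors' published implementation, which may fuse the modalities differently.

```python
# Illustrative late-fusion sketch: combine the "inappropriate"
# probabilities produced by a text model (e.g. an LSTM over user
# comments and closed captions) and an image model (e.g. a
# VGG-style CNN over cartoon-character frames).

def fuse_predictions(p_text: float, p_image: float, w_text: float = 0.5) -> float:
    """Weighted average of the two models' probabilities that a
    video is inappropriate for children (w_text is assumed)."""
    return w_text * p_text + (1.0 - w_text) * p_image

def classify(p_text: float, p_image: float, threshold: float = 0.5) -> str:
    """Label a video from the fused probability."""
    score = fuse_predictions(p_text, p_image)
    return "inappropriate" if score >= threshold else "appropriate"
```

For example, a text score of 0.9 and an image score of 0.3 fuse to 0.6 under equal weighting, so `classify(0.9, 0.3)` returns `"inappropriate"`.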
Pages: 16881-16900
Page count: 20
Related papers
50 records in total
  • [1] A multi-modal approach to detect inappropriate cartoon video contents using deep learning networks
    M. Y. Chuttur
    A. Nazurally
    [J]. Multimedia Tools and Applications, 2022, 81 : 16881 - 16900
  • [2] A Multi-Modal Deep Learning Approach for Emotion Recognition
    Shahzad, H. M.
    Bhatti, Sohail Masood
    Jaffar, Arfan
    Rashid, Muhammad
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02): : 1561 - 1570
  • [3] An Ensemble Learning Approach for Multi-Modal Medical Image Fusion using Deep Convolutional Neural Networks
    Maseleno, Andino
    Kavitha, D.
    Ashok, Koudegai
    Ansari, Mohammed Saleh Al
    Satheesh, Nimmati
    Reddy, R. Vijaya Kumar
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (08) : 758 - 769
  • [4] CovidSafe: A Deep Learning Framework for Covid Detection Using Multi-modal Approach
    Srikanth, Panigrahi
    Behera, Chandan Kumar
    Routhu, Srinivasa Rao
    [J]. New Generation Computing, 2025, 43 (01)
  • [5] An efficient deep learning-based video captioning framework using multi-modal features
    Varma, Soumya
    James, Dinesh Peter
    [J]. EXPERT SYSTEMS, 2021,
  • [6] Predicting Alzheimer’s disease progression using multi-modal deep learning approach
    Garam Lee
    Kwangsik Nho
    Byungkon Kang
    Kyung-Ah Sohn
    Dokyoon Kim
    [J]. Scientific Reports, 9
  • [7] Multi-Modal LA in Personalized Education Using Deep Reinforcement Learning Based Approach
    Sharif, Muddsair
    Uckelmann, Dieter
    [J]. IEEE ACCESS, 2024, 12 : 54049 - 54065
  • [8] Predicting Alzheimer's disease progression using multi-modal deep learning approach
    Lee, Garam
    Nho, Kwangsik
    Kang, Byungkon
    Sohn, Kyung-Ah
    Kim, Dokyoon
    Weiner, Michael W.
    Aisen, Paul
    Petersen, Ronald
    Jack, Clifford R., Jr.
    Jagust, William
    Trojanowski, John Q.
    Toga, Arthur W.
    Beckett, Laurel
    Green, Robert C.
    Saykin, Andrew J.
    Morris, John
    Shaw, Leslie M.
    Khachaturian, Zaven
    Sorensen, Greg
    Carrillo, Maria
    Kuller, Lew
    Raichle, Marc
    Paul, Steven
    Davies, Peter
    Fillit, Howard
    Hefti, Franz
    Holtzman, Davie
    Mesulam, M. Marcel
    Potter, William
    Snyder, Peter
    Montine, Tom
    Thomas, Ronald G.
    Donohue, Michael
    Walter, Sarah
    Sather, Tamie
    Jiminez, Gus
    Balasubramanian, Archana B.
    Mason, Jennifer
    Sim, Iris
    Harvey, Danielle
    Bernstein, Matthew
    Fox, Nick
    Thompson, Paul
    Schuff, Norbert
    DeCarli, Charles
    Borowski, Bret
    Gunter, Jeff
    Senjem, Matt
    Vemuri, Prashanthi
    Jones, David
    [J]. SCIENTIFIC REPORTS, 2019, 9 (1)
  • [9] A multi-modal machine learning approach to detect extreme rainfall events in Sicily
    Eleonora Vitanza
    Giovanna Maria Dimitri
    Chiara Mocenni
    [J]. Scientific Reports, 13
  • [10] A multi-modal machine learning approach to detect extreme rainfall events in Sicily
    Vitanza, Eleonora
    Dimitri, Giovanna Maria
    Mocenni, Chiara
    [J]. SCIENTIFIC REPORTS, 2023, 13 (01)