Movie genre classification using binary relevance, label powerset, and machine learning classifiers

Cited by: 4
Authors
Kumar, Sanjay [1]
Kumar, Nikhil [1]
Dev, Aditya [1]
Naorem, Siraz [1]
Affiliations
[1] Delhi Technol Univ, Dept Comp Sci & Engn, New Delhi 110042, India
Keywords
Binary relevance; Label powerset; Machine learning classifiers; Movie genre classification; Multi-label text classification; Support vector classifier;
DOI
10.1007/s11042-022-13211-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multi-label text classification (MLTC) is a technique for assigning a text to more than one category, and it is used extensively in real-world problems. Such classification problems are challenging because they depend on many factors that vary from problem to problem. Movie genre classification is a popular multi-label text classification problem, since a movie may belong to several genres at the same time. It typically relies on features such as the movie plot, title, summary, and subtitles. In recent years, several neural-network-based approaches have been proposed for such problems, but they make the solution resource-intensive and time-consuming. In this paper, we propose a novel method of movie genre classification that combines problem transformation techniques, namely binary relevance (BR) and label powerset (LP), with text vectorizers and machine learning classifier models. Binary relevance converts a multi-label classification task into independent binary classification tasks, whereas label powerset transforms a multi-label problem into a multiclass problem, with one multiclass classifier trained on all unique label combinations found in the training data. We then apply two text vectorizers, Count Vectorizer (CV) and Term Frequency-Inverse Document Frequency (TF-IDF), to tokenize the textual data and build a word vocabulary, and employ four classifiers, namely Logistic Regression (LR), Multinomial Naive Bayes (MNB), K-Nearest Neighbor (KNN), and Support Vector Classifier (SVC), in combination with the different vectorizers and problem transformation methods. To test the effectiveness of these combinations, we use k-fold cross-validation. Combining the problem transformation approaches, text vectorizers, and classifier models yields 16 different combinations for classifying movies into appropriate genres. Finally, we evaluate the performance of each combination on publicly available IMDb datasets, targeting 27 major parent genres, using different performance measures. The best result is obtained with the combination of label powerset (LP) as the problem transformation approach, TF-IDF as the text vectorizer, and support vector classifier (SVC) as the machine learning classifier model, which achieves an accuracy of 0.95 and an F1-score of 0.86.
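The two problem transformations described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy genre sets and the helper names `binary_relevance_targets` and `label_powerset_targets` are assumptions made for illustration, and the vectorizer and classifier steps are deliberately left out. The last lines only check that 2 transformations, 2 vectorizers, and 4 classifiers give the 16 combinations the abstract mentions.

```python
from itertools import product

# Toy genre sets for four hypothetical movie plots (NOT taken from the IMDb data).
samples = [
    {"Action", "Comedy"},
    {"Drama"},
    {"Action", "Comedy"},
    {"Comedy", "Drama"},
]


def binary_relevance_targets(label_sets):
    """Binary relevance: build one independent 0/1 target vector per genre.

    Each vector would be the training target of its own binary classifier.
    """
    genres = sorted(set().union(*label_sets))
    return {g: [int(g in s) for s in label_sets] for g in genres}


def label_powerset_targets(label_sets):
    """Label powerset: map each unique genre combination to one class id.

    The resulting ids are the targets of a single multiclass classifier.
    """
    class_ids = {}
    targets = []
    for s in label_sets:
        key = tuple(sorted(s))
        class_ids.setdefault(key, len(class_ids))
        targets.append(class_ids[key])
    return targets, class_ids


br_targets = binary_relevance_targets(samples)
lp_targets, lp_classes = label_powerset_targets(samples)

# Enumerate the experimental grid named in the abstract.
combinations = list(
    product(["BR", "LP"], ["CV", "TF-IDF"], ["LR", "MNB", "KNN", "SVC"])
)

print(br_targets)
print(lp_targets, lp_classes)
print(len(combinations))
```

In this sketch, binary relevance yields three separate binary problems (one per genre), while label powerset yields a single three-class problem, because only three distinct genre combinations occur in the toy data.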
Pages: 945-968
Page count: 24