Movie tag prediction: An extreme multi-label multi-modal transformer-based solution with explanation

Cited by: 2
Authors
Guarascio, Massimo [1]
Minici, Marco [1, 2]
Pisani, Francesco Sergio [1]
De Francesco, Erika [3]
Lambardi, Pasquale [3]
Affiliations
[1] ICAR CNR, Via P Bucci 8-9c, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Pisa, Largo B Pontecorvo 3, I-56127 Pisa, Italy
[3] Relatech Spa, Via Anguissola 23, I-20146 Milan, Italy
Keywords
Multi-modal learning; Extreme multi-label classification; Deep learning; Explainable artificial intelligence; Movie genre classification
DOI
10.1007/s10844-023-00836-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Providing rich and accurate metadata for indexing media content is a crucial problem for all the companies offering streaming entertainment services. These metadata are commonly employed to enhance search engine results and feed recommendation algorithms to improve the matching with user interests. However, the problem of labeling multimedia content with informative tags is challenging as the labeling procedure, manually performed by domain experts, is time-consuming and prone to error. Recently, the adoption of AI-based methods has been demonstrated to be an effective approach for automating this complex process. However, developing an effective solution requires coping with different challenging issues, such as data noise and the scarcity of labeled examples during the training phase. In this work, we address these challenges by introducing a Transformer-based framework for multi-modal multi-label classification enriched with model prediction explanation capabilities. These explanations can help the domain expert to understand the system's predictions. Experimentation conducted on two real test cases demonstrates its effectiveness.
Pages: 1021-1043
Page count: 23