Multi-Modal Deep Analysis for Multimedia

Cited by: 24
Authors
Zhu, Wenwu [1]
Wang, Xin [1]
Li, Hongzhi [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Microsoft AI & Res, Redmond, WA 98052 USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Task analysis; Streaming media; Semantics; Videos; Visualization; Data integration; Cognition; Multi-modal analysis; data-driven correlational representation; knowledge-guided data fusion; HASH;
DOI
10.1109/TCSVT.2019.2940647
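Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract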
With the rapid development of the Internet and multimedia services in the past decade, a huge amount of user-generated and service provider-generated multimedia data has become available. These data are heterogeneous and multi-modal in nature, which poses great challenges for processing and analyzing them. Multi-modal data consist of a mixture of data from different modalities such as text, images, video, and audio. In this article, we present a deep and comprehensive overview of multi-modal analysis in multimedia. We introduce two scientific research problems, data-driven correlational representation and knowledge-guided fusion for multimedia analysis. To address the two problems, we investigate them from the following aspects: 1) multi-modal correlational representation: multi-modal fusion of data across different modalities, and 2) multi-modal data and knowledge fusion: multi-modal fusion of data with domain knowledge. More specifically, on data-driven correlational representation, we highlight three important categories of methods: multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. On knowledge-guided fusion, we discuss approaches for fusing knowledge with data and four exemplar applications that require various kinds of domain knowledge: multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining, and multi-modal recommendation. Finally, we present our insights and future research directions.
Pages: 3740-3764
Page count: 25