Abstractive Text-Image Summarization Using Multi-Modal Attentional Hierarchical RNN

Cited by: 0
Authors
Chen, Jingqiang [1]
Hai Zhuge [1, 2, 3, 4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China
[2] Aston Univ, Birmingham, W Midlands, England
[3] Guangzhou Univ, Guangzhou, Peoples R China
[4] Chinese Acad Sci, Univ Chinese Acad Sci, Key Lab Intelligent Informat Proc, ICT, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rapid growth of multi-modal documents on the Internet makes multi-modal summarization research necessary. Most previous research summarizes texts or images separately. Recent neural summarization research shows the strength of the Encoder-Decoder model in text summarization. This paper proposes an abstractive text-image summarization model that uses the attentional hierarchical Encoder-Decoder model to summarize a text document and its accompanying images simultaneously, and then to align the sentences and images in the summaries. A multi-modal attentional mechanism is proposed to attend to the original sentences, images, and captions when decoding. The DailyMail dataset is extended by collecting images and captions from the Web. Experiments show that our model outperforms neural abstractive and extractive text summarization methods that do not consider images. In addition, our model can generate informative summaries of images.
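The abstract does not specify how the multi-modal attention is computed, so the following is only a minimal sketch of the general idea: one decoder state attends jointly over sentence, image, and caption encodings. The function name `multimodal_attention`, the dot-product scoring, the single joint softmax, and the assumption that all encodings already share one vector space are illustrative choices, not the paper's method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multimodal_attention(state, sentences, images, captions):
    """Attend from one decoder state over three kinds of encodings.

    Assumption (not from the paper): every encoding has already been
    projected to the same dimension as `state`, and relevance is a
    plain dot product normalized by one softmax across all items.
    """
    items = (
        [("sentence", v) for v in sentences]
        + [("image", v) for v in images]
        + [("caption", v) for v in captions]
    )
    weights = softmax([dot(state, v) for _, v in items])

    # Context vector: attention-weighted sum of all encodings.
    context = [
        sum(w * v[i] for w, (_, v) in zip(weights, items))
        for i in range(len(state))
    ]

    # Total attention mass per modality, useful for inspection.
    mass = {"sentence": 0.0, "image": 0.0, "caption": 0.0}
    for w, (kind, _) in zip(weights, items):
        mass[kind] += w
    return context, weights, mass


# Toy example with hypothetical 2-dimensional encodings.
state = [1.0, 0.0]
sentences = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.5, 0.5]]
captions = [[0.0, 0.0]]
context, weights, mass = multimodal_attention(state, sentences, images, captions)
```

In this toy input the first sentence is most similar to the decoder state, so it receives the largest weight. The per-modality mass shows how attention is split between text, images, and captions at one decoding step.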
Pages: 4046-4056
Page count: 11
Related papers
50 in total
  • [1] Extractive Text-Image Summarization Using Multi-Modal RNN
    Chen, Jingqiang
    Hai Zhuge
    [J]. 2018 14TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2018: 245-248
  • [2] Extractive summarization of documents with images based on multi-modal RNN
    Chen, Jingqiang
    Hai Zhuge
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 99: 186-196
  • [3] Diving into a Sea of Opinions: Multi-modal Abstractive Summarization with Comment Sensitivity
    Kumar, Raghvendra
    Chakraborty, Ratul
    Tiwari, Abhisek
    Saha, Sriparna
    Saini, Naveen
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023: 1117-1126
  • [4] Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization
    Li, Manling
    Zhang, Lingyu
    Ji, Heng
    Radke, Richard J.
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 2190-2196
  • [5] Multi-layered attentional peephole convolutional LSTM for abstractive text summarization
    Rahman, Md Motiur
    Siddiqui, Fazlul Hasan
    [J]. ETRI JOURNAL, 2021, 43(02): 288-298
  • [6] Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising
    Yu, Tan
    Liu, Jie
    Jin, Zhipeng
    Yang, Yi
    Fei, Hongliang
    Li, Ping
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 4655-4660
  • [7] Efficient text-image semantic search: A multi-modal vision-language approach for fashion retrieval
    Moro, Gianluca
    Salvatori, Stefano
    Frisoni, Giacomo
    [J]. NEUROCOMPUTING, 2023, 538
  • [8] Golden Retriever: A Real-Time Multi-Modal Text-Image Retrieval System with the Ability to Focus
    Schneider, Florian
    Biemann, Chris
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022: 3245-3250
  • [9] Read, Watch, Listen, and Summarize: Multi-Modal Summarization for Asynchronous Text, Image, Audio and Video
    Li, Haoran
    Zhu, Junnan
    Ma, Cong
    Zhang, Jiajun
    Zong, Chengqing
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31(05): 996-1009
  • [10] Abstractive Text Summarization Using Hybrid Technique of Summarization
    Liaqat, Muhammad Irfan
    Hamid, Isma
    Nawaz, Qamar
    Shafique, Nida
    [J]. 2022 14TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN 2022), 2022: 141-144