Hierarchical Memory Decoder for Visual Narrating

Cited by: 10

Authors:

Wu, Aming [1,2]

Han, Yahong [1,2,3]

Zhao, Zhou [4]

Yang, Yi [5]

Affiliations:

[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China

[2] Tianjin Univ, Tianjin Key Lab Machine Learning, Tianjin 300350, Peoples R China

[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China

[4] Zhejiang Univ, Coll Comp Sci, Hangzhou 310007, Peoples R China

[5] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW 2007, Australia

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / No. 06

Keywords:

Decoding; Visualization; Videos; Task analysis; Computer architecture; Electronic mail; Semantics; Visual narrating; multi-modal fusion; hierarchical memory decoder; video captioning; visual storytelling; STREAM;

DOI:

10.1109/TCSVT.2020.3020877

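Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code: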

0808 ; 0809 ;

Abstract:

Visual narrating focuses on generating semantic descriptions that summarize the visual content of images or videos, e.g., visual captioning and visual storytelling. The main challenge lies in designing a decoder that generates accurate descriptions matching the visual content. Recent advances often employ a recurrent neural network (RNN), e.g., Long Short-Term Memory (LSTM), as the decoder. However, an RNN is prone to diluting long-term information, which weakens its ability to capture long-term dependencies. Recent work has demonstrated that the memory network (MemNet) has the advantage of storing long-term information. However, it has not been well exploited as a decoder for visual narrating, partly because multi-modal sequential decoding with MemNet is difficult. In this article, we devise a novel memory decoder for visual narrating. Concretely, to obtain a better multi-modal representation, we first design a new multi-modal fusion method to fully merge visual and lexical information. Then, based on the fusion result, we construct a MemNet-based decoder consisting of multiple memory layers. In each layer, we employ a memory set to store previous decoding information and use an attention mechanism to adaptively select the information related to the current output. We also employ another memory set to store the decoding output of each memory layer at the current time step and again use an attention mechanism to select the related information. This decoder thus alleviates the dilution of long-term information, while the hierarchical architecture leverages the latent information of each layer, which helps generate accurate descriptions. Experimental results on two visual narrating tasks, i.e., video captioning and visual storytelling, show that our decoder obtains superior results and outperforms conventional RNN-based decoders.


Pages: 2438-2449

Page count: 12

