Image and Video Captioning with Augmented Neural Architectures

Cited by: 17
Authors
Shetty, Rakshith [1, 2]
Tavakoli, Hamed R. [3, 4]
Laaksonen, Jorma [3 ]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Saarland Univ, Saarbrucken, Germany
[3] Aalto Univ, Sch Sci, Dept Comp Sci, Espoo, Finland
[4] Tampere Univ Technol, Tampere, Finland
Funding
Academy of Finland;
Keywords
deep learning; image captioning; multimodal learning; neural networks; pervasive computing; recurrent networks; ubiquitous computing; video captioning;
DOI
10.1109/MMUL.2018.112135923
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Neural-network-based image and video captioning can be substantially improved by utilizing architectures that make use of special features from the scene context, objects, and locations. A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.
Pages: 34-46
Number of pages: 13