A novel framework for automatic caption and audio generation

Cited by: 0
Authors
Kulkarni, Chaitanya [1]
Monika, P. [2]
Preeti, B. [3]
Shruthi, S. [1]
Affiliations
[1] Dayananda Sagar Coll Engn, Bangalore, India
[2] BMS Coll Engn, Bangalore, India
[3] KLE Technol Univ, Hubballi, India
Keywords
Artificial Intelligence; Natural Language Processing; Computer Vision; Machine Learning; Transfer Learning; LSTM; Image Caption Generation
DOI
10.1016/j.matpr.2022.05.380
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In recent times, with advances in Artificial Intelligence, multidomain problems spanning computer vision (CV) and natural language processing (NLP), such as image caption generation and audio generation, have attracted the attention of researchers across the globe because of their applications in medicine, business, and technology. Image caption generation entails automatically generating text that describes the contents of an image. In this paper we present a novel caption generation and audio generation framework. We use deep neural networks such as a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), together with transfer learning techniques, to perform this task. The model has two stages: (1) captions are generated for any given image; (2) the gTTS (Google Text-to-Speech) generator is then used to generate audio for the generated captions. This framework is extremely beneficial to visually impaired people since it allows them to comprehend visuals. The Flickr8K dataset was used to train and test the model. A total of 6000 photos were used to train the model, with an additional 1000 images used for validation and testing.
Copyright (c) 2022 Elsevier Ltd. All rights reserved. Selection and peer-review under responsibility of the scientific committee of the 2022 International Conference on Materials and Sustainable Manufacturing Technology.
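The two stages described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the record does not say how captions are decoded, so greedy word-by-word decoding is assumed here, and the trained CNN encoder and LSTM decoder are hidden behind a hypothetical `next_word` callable. The function names, the `<start>`/`<end>` tokens, and the output path are all invented for illustration. The only real library call is `gTTS(text=..., lang=...).save(path)` from the `gtts` package, which needs that package installed and network access.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sentinel tokens; the record does not name the ones the paper uses.
START, END = "<start>", "<end>"

# Stand-in type for the trained CNN + LSTM model: given image features and the
# words produced so far, it returns the next word.
NextWord = Callable[[Sequence[float], List[str]], str]


def greedy_caption(features: Sequence[float], next_word: NextWord,
                   max_len: int = 20) -> str:
    """Stage 1 (assumed greedy decoding): add one word at a time until END."""
    words: List[str] = [START]
    for _ in range(max_len):
        word = next_word(features, words)
        if word == END:
            break
        words.append(word)
    return " ".join(words[1:])  # drop the START token


def speak_with_gtts(text: str, out_path: str) -> str:
    """Stage 2: synthesise speech for the caption with gTTS."""
    from gtts import gTTS  # imported lazily so stage 1 runs without the package
    gTTS(text=text, lang="en").save(out_path)
    return out_path


def caption_and_speak(features: Sequence[float], next_word: NextWord,
                      synthesize: Callable[[str, str], str] = speak_with_gtts,
                      out_path: str = "caption.mp3") -> Tuple[str, str]:
    """Run both stages and return the caption and the audio file path."""
    caption = greedy_caption(features, next_word)
    return caption, synthesize(caption, out_path)


def toy_next_word(features: Sequence[float], words: List[str]) -> str:
    """Toy stand-in for the trained model: replays a fixed caption."""
    script = ["a", "dog", "runs", END]
    return script[len(words) - 1]


print(greedy_caption([0.0], toy_next_word))  # prints: a dog runs
```

Passing the speech step in as the `synthesize` argument keeps the captioning stage testable offline; calling `caption_and_speak(features, next_word)` with the default argument would write an MP3 through gTTS.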
Pages: 3248-3252
Page count: 5