Describing an image with a natural-language sentence, without human involvement, requires knowledge of both image processing and Natural Language Processing (NLP). Most existing works rely on unimodal representations of the visual and textual content within an Encoder-Decoder (EnDec) Deep Neural Network (DNN), where the input image is encoded by a Convolutional Neural Network (CNN) and the caption is generated by a Recurrent Neural Network (RNN). This paper examines a basic image captioning model to quantify the impact of a multimodal representation of the visual and textual cues. The multimodal representation is built by early fusion of encoded visual cues from different CNNs, together with combined textual features from different word embedding techniques. The resulting multimodal representations of the visual and textual cues are used to train a Long Short-Term Memory (LSTM)-based baseline caption generator, in order to quantify the impact of various levels of complementary feature mutations. The ablation study, which involves two different CNN feature extractors and two types of textual feature extractors, shows that exploiting the complementary information significantly outperforms the unimodal representations, with a tolerable timing overhead.
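As an illustration of the early-fusion idea described in the abstract, the sketch below joins feature vectors from several sources into one vector before they would reach a caption generator. It assumes that fusion is done by concatenating L2-normalized vectors; the abstract does not state the exact fusion operator, so this is one common choice rather than the authors' method. The helper names (`l2_normalize`, `early_fuse`) and the toy feature values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def early_fuse(*features: Sequence[float]) -> Vector:
    """Early fusion by concatenation (an assumed operator, not taken from the paper).

    Each source is normalized first so that no single extractor dominates
    the fused vector purely because of its scale.
    """
    fused: Vector = []
    for feat in features:
        fused.extend(l2_normalize(feat))
    return fused


# Hypothetical toy vectors standing in for the outputs of two CNN encoders.
cnn_a = [3.0, 4.0]
cnn_b = [0.0, 5.0, 0.0]
visual = early_fuse(cnn_a, cnn_b)

# Hypothetical toy embeddings of one word from two embedding techniques.
emb_a = [1.0, 0.0]
emb_b = [0.0, 2.0]
textual = early_fuse(emb_a, emb_b)

print(len(visual), len(textual))
```

In a full model, the fused visual vector would typically initialize or condition the LSTM, and the fused word vectors would form its input sequence. The fused dimensionality is the sum of the source dimensionalities, which is one source of the timing overhead the abstract mentions.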
Al-Qatf, Majjed
Affiliations:
School of Computer Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
Insight Centre for Data Analytics, University of Galway, Galway, Ireland
Hawbani, Ammar
Affiliations:
School of Computer Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
School of Computer Science, Shenyang Aerospace University, Shenyang, China
Wang, XingFu
Affiliation:
School of Computer Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
Abdusallam, Amr
Affiliation:
School of Electronic Engineering and Information Science, University of Science and Technology of China, Anhui, Hefei, China
Alsamhi, Saeed
Affiliations:
Insight Centre for Data Analytics, University of Galway, Galway, Ireland
Faculty of Engineering, IBB University, IBB, Yemen
Alhabib, Mohammed
Affiliation:
School of Computer Science and Engineering, Central South University, Changsha, China
Curry, Edward
Affiliation:
Insight Centre for Data Analytics, University of Galway, Galway, Ireland
Journal of Intelligent and Fuzzy Systems, 2024, 46(2): 3447-3459
Zhou, Peilun
Affiliation:
Univ Sci & Technol China, Sch Comp Sci, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
Xu, Tong
Affiliation:
Univ Sci & Technol China, Sch Comp Sci, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
Yin, Zhizhuo
Affiliation:
Univ Sci & Technol China, Sch Comp Sci, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
Liu, Dong
Affiliation:
Univ Sci & Technol China, Sch Comp Sci, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
Chen, Enhong
Affiliation:
Univ Sci & Technol China, Sch Comp Sci, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
Lv, Guangyi
Affiliation:
Univ Sci & Technol China, Sch Comp Sci, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
Li, Changliang
Affiliation:
Kingsoft AI Lab, Beijing 100085, Peoples R China
He, Chen
Affiliation:
Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Guangdong, Peoples R China
Hu, Haifeng
Affiliation:
Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Guangdong, Peoples R China