An ensemble model of CNN with Bi-LSTM for automatic singer identification

Cited by: 0

Authors:

Mukkamala S. N. V. Jitendra

Y. Radhika

Affiliations:

[1] GITAM School of Technology, Department of Computer Science and Engineering

[2] GITAM (Deemed-to-be University)

Source:

Multimedia Tools and Applications | 2023 / Vol. 82

Keywords:

Bidirectional long short-term memory; CNN; Gender identification; LSTM-RNN; Music information retrieval; Singer identification; Spectrogram

DOI:

Not available

Abstract:

Gender detection has become significant in content-based multimedia systems, and automated gender identification is in demand for processing massive volumes of data. Singer identification is a popular topic in music information recommender systems: it involves identifying the singer of a song from the singer's voice and other key features such as timbre and pitch. Models such as GMM, SVM, and MLP are widely used for classification and singer identification. However, most current models are limited in that vocals and instrumental music are separated manually, and only the vocals are used to build and train the model. Deep learning techniques are well suited to unstructured data such as music and have performed strongly in similar studies. In acoustic modeling, deep neural network (DNN) models such as convolutional neural networks (CNNs) have shown promise in classifying unstructured and poorly labeled data. The current study considers an ensemble model, combining a CNN with a bidirectional LSTM (Bi-LSTM), for singer identification from spectrogram images generated from the audio clip. The CNN identifies features and is reported to handle variable-length input data well, while the Bi-LSTM is intended to improve accuracy by retaining essential features over time and capturing temporal context. Experiments are performed on Indian songs and the MIR-1K data set, and the proposed model achieves a prediction accuracy of 97.4%, outperforming the existing models against which it is compared.
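
The abstract describes feeding spectrograms generated from an audio clip into the model. As a rough illustration of that first step only, the sketch below computes a magnitude spectrogram with a windowed, naive DFT using the Python standard library. It is not the authors' pipeline: the function names, the frame length of 64, the hop of 32, and the synthetic 1 kHz tone standing in for an audio clip are all assumptions chosen for the example.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        # Taper the frame to reduce spectral leakage at its edges.
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT, O(frame_len^2): fine for a sketch, use an FFT in practice.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Synthetic stand-in for an audio clip: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(tone)
# Strongest frequency bin in the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as a 2-D image. In a CNN with Bi-LSTM arrangement like the one the abstract outlines, a convolutional front end would typically consume such an image and pass per-frame features to the recurrent layers.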

Pages: 38853-38874

Page count: 21
