Accent Recognition Using a Spectrogram Image Feature-Based Convolutional Neural Network

Cited by: 6
Authors
Cetin, Onursal [1]
Affiliations
[1] Bandirma Onyedi Eylul Univ, Elect & Elect Engn Dept, TR-10200 Balikesir, Turkey
Keywords
Regional accent recognition; Spectrogram; Convolutional neural network; Transfer learning; I-vector; SOUND EVENT CLASSIFICATION; FREQUENCY-CHARACTERISTICS; CNN;
DOI
10.1007/s13369-022-07086-9
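Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];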
Discipline classification codes
07; 0710; 09
Abstract
Accent recognition is a research area whose importance has increased in recent years. Numerous studies across various languages have sought to improve the performance of accent recognition systems. However, recognizing the regional accents of a single language remains a challenging problem. In this study, regional accents of British English were recognized in both gender-independent and gender-dependent experiments using a convolutional neural network. Many different acoustic features have been used in previous studies, but there is still no generally accepted feature set, and selecting handcrafted features is a challenging task. Moreover, converting audio signals into images in the most appropriate way is critical for a convolutional neural network, a deep learning model commonly used in image applications. To take advantage of the ability of convolutional neural networks to characterize two-dimensional signals, spectrogram image features that visualize the frequency distribution of the speech signal were used. For this purpose, sound signals were first segmented to their state before normalization. Each segment was combined by taking the fast Fourier transform. The absolute value was taken, and the log function was then used to compress the dynamic range of these linear rate maps, resulting in log-power rate maps. After a grayscale image was formed by normalizing the obtained time-frequency matrix to the range [0, 1], the dynamic range was quantized to red, green, and blue color values to generate a monochrome image. Thus, the feature extraction process, which is time-consuming and challenging, was simplified using spectrogram images and a convolutional neural network. In addition, although training and test data should ideally follow the same distribution, heterogeneity in the data adversely affects the performance of machine learning algorithms.
To overcome this problem and improve the model's performance, transfer learning was utilized, a state-of-the-art technique that here transfers knowledge from an AlexNet model pre-trained on 1.3 million images from the ImageNet database. Several performance metrics, such as accuracy, specificity, sensitivity, precision, and F-score, were used to evaluate the proposed approach. Accuracies of 92.92% and 93.38% and F-scores of 92.67% and 93.19% were obtained for the gender-independent and gender-dependent experiments, respectively. Additionally, i-vector-based linear discriminant analysis and support vector machine methods were used in the study, so that the results of the proposed recognition method are presented comparatively.
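The spectrogram image steps named in the abstract (segmentation, fast Fourier transform, absolute value, log compression, normalization to [0, 1], and quantization into red, green, and blue values) could be sketched roughly as follows. This is a minimal illustration in plain Python, not the author's implementation: the frame length, hop size, Hann window, the `eps` offset, the three-band color mapping, and every function name are assumptions introduced here.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def pseudo_color(level):
    """Map a gray level in [0, 1] to (r, g, b).

    Assumed mapping: the range is split into three equal bands and each
    channel ramps linearly from 0 to 1 across one band (blue: low band,
    green: middle band, red: high band).
    """
    def ramp(start):
        return min(1.0, max(0.0, (level - start) * 3.0))
    return (ramp(2.0 / 3.0), ramp(1.0 / 3.0), ramp(0.0))


def spectrogram_image(signal, frame_len=64, hop=32, eps=1e-10):
    """Return an image as rows of (r, g, b) pixels: one row per frequency
    bin, one column per frame. All parameters are illustrative defaults."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_bins = frame_len // 2 + 1  # keep only non-negative frequencies
    # Hann window (an assumption; the abstract does not name a window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    log_spec = []  # one list of log-magnitudes per frame
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        magnitudes = [abs(c) for c in fft(frame)[:n_bins]]
        # Log compression of the dynamic range; eps avoids log(0).
        log_spec.append([math.log(m + eps) for m in magnitudes])
    # Normalize the whole time-frequency matrix to [0, 1] (grayscale image).
    lo = min(min(col) for col in log_spec)
    hi = max(max(col) for col in log_spec)
    span = (hi - lo) or 1.0
    gray = [[(v - lo) / span for v in col] for col in log_spec]
    # Quantize the gray levels into red, green, and blue values.
    return [[pseudo_color(gray[t][f]) for t in range(len(gray))]
            for f in range(n_bins)]
```

For a pure tone such as `[math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]`, the brightest pixels fall in the row of the frequency bin that matches the tone. An image of this kind would still need resizing to the input size of a pre-trained network before transfer learning.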
Pages: 1973 - 1990
Number of pages: 18
Related papers
50 in total
  • [1] Accent Recognition Using a Spectrogram Image Feature-Based Convolutional Neural Network
    Onursal Cetin
    Arabian Journal for Science and Engineering, 2023, 48 : 1973 - 1990
  • [2] Spatial feature-based convolutional neural network for PolSAR image classification
    Shang, Ronghua
    Wang, Jiaming
    Jiao, Licheng
    Yang, Xiaohui
    Li, Yangyang
    APPLIED SOFT COMPUTING, 2022, 123
  • [3] Spectrogram-Based Automatic Modulation Recognition Using Convolutional Neural Network
    Jeong, Sinjin
    Lee, Uhyeon
    Kim, Suk Chan
    2018 TENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN 2018), 2018 : 843 - 845
  • [4] Feature-based detection of breast cancer using convolutional neural network and feature engineering
    Hiba Allah Essa
    Ebrahim Ismaiel
    Mhd Firas Al Hinnawi
    Scientific Reports, 14 (1)
  • [5] Anatomical Feature-Based Lung Ultrasound Image Quality Assessment Using Deep Convolutional Neural Network
    Ravishankar, Surya M.
    Tsumura, Ryosuke
    Hardin, John W.
    Hoffmann, Beatrice
    Zhang, Ziming
    Zhang, Haichong K.
    INTERNATIONAL ULTRASONICS SYMPOSIUM (IEEE IUS 2021), 2021
  • [6] HRRP target recognition method based on bispectrum-spectrogram feature and deep convolutional neural network
    Lu W.
    Zhang Y.
    Xu C.
    Lin C.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2020, 42 (08) : 1703 - 1709
  • [7] Cough Recognition Based on Mel-Spectrogram and Convolutional Neural Network
    Zhou, Quan
    Shan, Jianhua
    Ding, Wenlong
    Wang, Chengyin
    Yuan, Shi
    Sun, Fuchun
    Li, Haiyuan
    Fang, Bin
    FRONTIERS IN ROBOTICS AND AI, 2021, 8
  • [8] A feature-based convolutional neural network for reconstruction of interventional MRI
    Zufiria, Blanca
    Qiu, Suhao
    Yan, Kang
    Zhao, Ruiyang
    Wang, Runke
    She, Huajun
    Zhang, Chengcheng
    Sun, Bomin
    Herman, Pawel
    Du, Yiping
    Feng, Yuan
    NMR IN BIOMEDICINE, 2022, 35 (04)
  • [9] Improved convolutional neural network and spectrogram image feature for traffic sound event classification
    Xu, Ke
    Yao, Jingyi
    Yao, Lingyun
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART D-JOURNAL OF AUTOMOBILE ENGINEERING, 2023, 238 (13) : 4230 - 4244
  • [10] A Spectrogram Image-Based Network Anomaly Detection System Using Deep Convolutional Neural Network
    Khan, Adnan Shahid
    Ahmad, Zeeshan
    Abdullah, Johari
    Ahmad, Farhan
    IEEE ACCESS, 2021, 9 : 87079 - 87093