Combining Data Augmentations for CNN-Based Voice Command Recognition

Cited by: 7
Authors
Azarang, Arian [1]
Hansen, John [1]
Kehtarnavaz, Nasser [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Richardson, TX 75080 USA
Keywords
Combining data augmentation methods for voice command recognition; CNN-based voice command recognition; voice command human interaction systems; convolutional neural networks
DOI
10.1109/hsi47298.2019.8942638
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper combines two data augmentation methods, speed perturbation and room impulse response reverberation, to improve the generalization capability of convolutional neural networks used for voice command recognition. Speed perturbation generates the voice command variations that arise when different speakers utter a command over shorter or longer durations. Room impulse response reverberation generates the variations caused by reflected sound paths. The combination of the two methods is evaluated on a public domain dataset of voice commands. Experimental results, measured by word error rate, indicate that combining the two augmentation methods improves voice command recognition relative to using either method individually.
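The two augmentations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the speed factor of 1.1, the reverberation time `rt60`, and the synthetic decaying-noise impulse response are all assumptions chosen for illustration, since the record gives no parameter values. A real pipeline would more likely use measured or simulated room impulse responses.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Resample so the clip plays `factor` times faster (factor > 1 shortens it)."""
    n_out = int(round(len(signal) / factor))
    old_t = np.arange(len(signal))
    new_t = np.linspace(0, len(signal) - 1, n_out)
    # Linear interpolation onto the new time grid (hypothetical choice of resampler).
    return np.interp(new_t, old_t, signal)


def synthetic_rir(sample_rate, rt60=0.3, seed=0):
    """Toy room impulse response: exponentially decaying noise (an assumption)."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # exp(-6.9) is roughly 1/1000, i.e. about 60 dB of amplitude decay at t = rt60.
    decay = np.exp(-6.9 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct sound path
    return rir / np.max(np.abs(rir))


def add_reverb(signal, rir):
    """Convolve with the impulse response, keep the original length, peak-normalize."""
    wet = np.convolve(signal, rir)[: len(signal)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet


def augment(signal, sample_rate, factor=1.1, rt60=0.3):
    """Combine both augmentations: speed perturbation first, then reverberation."""
    return add_reverb(speed_perturb(signal, factor), synthetic_rir(sample_rate, rt60))
```

For a one-second clip sampled at 16 kHz, `augment(clip, 16000)` returns a shorter, reverberant copy that could be added to the training set alongside the original. Applying each function alone gives the single-augmentation variants that the abstract compares against.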
Pages: 17-21
Page count: 5
Related papers
50 in total
  • [21] Place and Object Recognition by CNN-Based COSFIRE Filters
    Lopez-Antequera, Manuel
    Vallina, Maria Leyva
    Strisciuglio, Nicola
    Petkov, Nicolai
    IEEE ACCESS, 2019, 7 : 66157 - 66166
  • [22] CNN-Based Smart Sleep Posture Recognition System
    Tang, Keison
    Kumar, Arjun
    Nadeem, Muhammad
    Maaz, Issam
    IOT, 2021, 2 (01) : 119 - 139
  • [23] PRATIT: a CNN-based emotion recognition system using histogram equalization and data augmentation
    Dhara Mungra
    Anjali Agrawal
    Priyanka Sharma
    Sudeep Tanwar
    Mohammad S. Obaidat
    Multimedia Tools and Applications, 2020, 79 : 2285 - 2307
  • [24] Similarity Learning for CNN-Based ASL Alphabet Recognition
    Fierro Radilla, Atoany Nazareth
    Perez Daniel, Karina Ruby
    Benitez-Garcia, Gibran
    Najera Garcia, Pedro
    Valdez, Ramona Fuentes
    NEW TRENDS IN INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES, 2021, 337 : 633 - 645
  • [25] PRATIT: a CNN-based emotion recognition system using histogram equalization and data augmentation
    Mungra, Dhara
    Agrawal, Anjali
    Sharma, Priyanka
    Tanwar, Sudeep
    Obaidat, Mohammad S.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (3-4) : 2285 - 2307
  • [26] Beyond Human Recognition: A CNN-Based Framework for Handwritten Character Recognition
    Chen, Li
    Wang, Song
    Fan, Wei
    Sun, Jun
    Naoi, Satoshi
    PROCEEDINGS 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION ACPR 2015, 2015, : 695 - 699
  • [27] CNN-Based Multimodal Human Recognition in Surveillance Environments
    Koo, Ja Hyung
    Cho, Se Woon
    Baek, Na Rae
    Kim, Min Cheol
    Park, Kang Ryoung
    SENSORS, 2018, 18 (09)
  • [28] Loss Functions for CNN-based Biometric Vein Recognition
    Kuzu, Ridvan Salih
    Maiorana, Emanuele
    Campisi, Patrizio
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 750 - 754
  • [29] Data Augmentation in CNN-based Periocular Authentication
    Dellana, Ryan
    Roy, Kaushik
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND MANAGEMENT (ICICM 2016), 2016, : 141 - 145
  • [30] Combining Genetic Algorithms and FLDR for Real-Time Voice Command Recognition
    Romo, Julio Cesar Martinez
    Rosas, Francisco Javier Luna
    Mora-Gonzalez, Miguel
    PROCEEDINGS OF THE SPECIAL SESSION OF THE SEVENTH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE - MICAI 2008, 2008, : 163 - 169