Building a speech recognition system with privacy identification information based on Google Voice for social robots

Cited by: 0
Authors
Pei-Chun Lin
Benjamin Yankson
Vishal Chauhan
Manabu Tsukada
Affiliations
[1] Feng Chia University, Faculty of Information Engineering and Computer Science
[2] University at Albany, CEHC
[3] State University of New York, Graduate School of Information Science and Technology
[4] The University of Tokyo
Keywords
Google AIY Voice Kit; Speech recognition; Personal identification information; Artificial intelligence; Social robots; Robot computing; Smart speaker; Google Assistant
Abstract
Many smart speakers, and even social robots, are now on the market to make people's lives more convenient. People typically use smart speakers to check their daily schedules or control home appliances, and many social robots also include smart speakers. What these devices have in common is that they are controlled by voice. Wherever a smart speaker is installed and used, starting a conversation with voice equipment exposes the user to security and privacy risks. In this paper, we therefore build a speech recognition (SR) system that includes a privacy identification information (PII) component, which we call the SR-PII system. We used the Google Artificial-Intelligence-Yourself (AIY) Voice Kit released by Google to build a simple smart dialog speaker that includes our SR-PII system. In our experiments, we test SR accuracy and the reliability of the privacy settings in three environments (quiet, noisy, and with music playing). We also examine the cloud and speaker response times. The results show that the response time is approximately 3.74 s for the cloud and approximately 9.04 s for the speaker. We also report the response accuracy of the speaker, which successfully prevented the disclosure of personal information with the SR-PII system in all three environments. The speaker has a mean response time of approximately 8.86 s with 93% mean accuracy in a quiet room, approximately 9.18 s with 89% mean accuracy in a noisy environment, and approximately 9.62 s with 90% mean accuracy in an environment with music playing. We conclude that the SR-PII system can secure private information and that the most important factor affecting the speaker's response speed is the network connection status. We hope that our experiments provide some guidelines for building social robots and installing the SR-PII system to protect users' personal identification information.
Pages: 15060-15088
Page count: 28