TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library

Cited by: 2
Authors
Kinahan, Sean P. [1 ,2 ]
Liss, Julie M. [1 ]
Berisha, Visar [1 ,2 ]
Affiliations
[1] Arizona State Univ, Coll Hlth Solut, Tempe, AZ 85281 USA
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
Source
PLOS ONE | 2023 / Vol. 18 / Issue 02
DOI
10.1371/journal.pone.0281306
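Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code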
07 ; 0710 ; 09 ;
Abstract
The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through an integration with an existing PyTorch generative vocoder called DiffWave. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the TorchDIVA speech production. The results indicate improved speech quality metrics in the DiffWave-enhanced output as compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof-of-concept demonstrates the value TorchDIVA can bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch.
Pages: 15
Related papers
50 in total
  • [41] OATutor: An Open-source Adaptive Tutoring System and Curated Content Library for Learning Sciences Research
    Pardos, Zachary A.
    Tang, Matthew
    Anastasopoulos, Ioannis
    Sheel, Shreya K.
    Zhang, Ethan
    [J]. PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2023), 2023
  • [42] problexity-An open-source Python library for supervised learning problem complexity assessment
    Komorniczak, Joanna
    Ksieniewicz, Pawel
    [J]. NEUROCOMPUTING, 2023, 521 : 126 - 136
  • [43] CircuitNet: an open-source dataset for machine learning applications in electronic design automation (EDA)
    Chai, Zhuomin
    Zhao, Yuxiang
    Lin, Yibo
    Liu, Wei
    Wang, Runsheng
    Huang, Ru
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (12)
  • [44] CODA: an open-source platform for federated analysis and machine learning on distributed healthcare data
    Mullie, Louis
    Afilalo, Jonathan
    Archambault, Patrick
    Bouchakri, Rima
    Brown, Kip
    Buckeridge, David L.
    Cavayas, Yiorgos Alexandros
    Turgeon, Alexis F.
    Martineau, Denis
    Lamontagne, Francois
    Lebrasseur, Martine
    Lemieux, Renald
    Li, Jeffrey
    Sauthier, Michael
    St-Onge, Pascal
    Tang, An
    Witteman, William
    Chasse, Michael
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (03) : 651 - 665
  • [45] Open-source machine learning BANTER acoustic classification of beaked whale echolocation pulses
    Rankin, Shannon
    Sakai, Taiki
    Archer, Frederick I.
    Barlow, Jay
    Cholewiak, Danielle
    Deangelis, Annamaria I.
    Mccullough, Jennifer L. K.
    Oleson, Erin M.
    Simonis, Anne E.
    Soldevilla, Melissa S.
    Trickey, Jennifer S.
    [J]. ECOLOGICAL INFORMATICS, 2024, 80
  • [46] CircuitNet: an open-source dataset for machine learning applications in electronic design automation (EDA)
    Zhuomin CHAI
    Yuxiang ZHAO
    Yibo LIN
    Wei LIU
    Runsheng WANG
    Ru HUANG
    [J]. Science China(Information Sciences), 2022, 65 (12) : 313 - 314
  • [47] Open-source QSAR models for pKa prediction using multiple machine learning approaches
    Mansouri, Kamel
    Cariello, Neal F.
    Korotcov, Alexandru
    Tkachenko, Valery
    Grulke, Chris M.
    Sprankle, Catherine S.
    Allen, David
    Casey, Warren M.
    Kleinstreuer, Nicole C.
    Williams, Antony J.
    [J]. JOURNAL OF CHEMINFORMATICS, 2019, 11 (01)
  • [48] Open-source QSAR models for pKa prediction using multiple machine learning approaches
    Kamel Mansouri
    Neal F. Cariello
    Alexandru Korotcov
    Valery Tkachenko
    Chris M. Grulke
    Catherine S. Sprankle
    David Allen
    Warren M. Casey
    Nicole C. Kleinstreuer
    Antony J. Williams
    [J]. Journal of Cheminformatics, 11
  • [49] SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence
    Fang, Wei
    Chen, Yanqi
    Ding, Jianhao
    Yu, Zhaofei
    Masquelier, Timothee
    Chen, Ding
    Huang, Liwei
    Zhou, Huihui
    Li, Guoqi
    Tian, Yonghong
    [J]. SCIENCE ADVANCES, 2023, 9 (40)
  • [50] Open-Source Clinical Machine Learning Models: Critical Appraisal of Feasibility, Advantages, and Challenges
    Harish, Keerthi B.
    Price, W. Nicholson
    Aphinyanaphongs, Yindalon
    [J]. JMIR FORMATIVE RESEARCH, 2022, 6 (04)