Listening with generative models

Cited by: 0
Authors
Cusimano, Maddie [1]
Hewitt, Luke B. [1]
McDermott, Josh H. [1, 2, 3, 4]
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[2] MIT, McGovern Inst, Cambridge, MA USA
[3] MIT, Ctr Brains Minds & Machines, Cambridge, MA USA
[4] Harvard Univ, Speech & Hearing Biosci & Technol Program, Cambridge, MA USA
Keywords
Auditory scene analysis; Bayesian inference; Illusions; Grouping; Perceptual organization; Natural sounds; Probabilistic program; World model; Perception; COCKTAIL PARTY; GESTALT PSYCHOLOGY; NEWBORN-INFANTS; SOUND SOURCES; PERCEPTION; SPEECH; SEPARATION; ORGANIZATION; STATISTICS; STREAM;
DOI
10.1016/j.cognition.2024.105874
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Perception has long been envisioned to use an internal model of the world to explain the causes of sensory signals. However, such accounts have historically not been testable, typically requiring intractable search through the space of possible explanations. Using auditory scenes as a case study, we leveraged contemporary computational tools to infer explanations of sounds in a candidate internal generative model of the auditory world (ecologically inspired audio synthesizers). Model inferences accounted for many classic illusions. Unlike traditional accounts of auditory illusions, the model is applicable to any sound, and exhibited human-like perceptual organization for real-world sound mixtures. The combination of stimulus-computability and interpretable model structure enabled 'rich falsification', revealing additional assumptions about sound generation needed to account for perception. The results show how generative models can account for the perception of both classic illusions and everyday sensory signals, and illustrate the opportunities and challenges involved in incorporating them into theories of perception.
Pages: 64