Improving the workflow to crack Small, Unbalanced, Noisy, but Genuine (SUNG) datasets in bioacoustics: The case of bonobo calls

Cited by: 6

Authors:

Arnaud, Vincent ^{[1,2]}

Pellegrino, Francois ^{[2]}

Keenan, Sumir ^{[3]}

St-Gelais, Xavier ^{[1]}

Mathevon, Nicolas ^{[3]}

Levrero, Florence ^{[3]}

Coupe, Christophe ^{[2,4]}

Affiliations:

[1] Univ Quebec Chicoutimi, Dept Arts Lettres & Langage, Chicoutimi, PQ, Canada

[2] Univ Lyon, Lab Dynam Langage, UMR 5596, CNRS, Lyon, France

[3] Univ St Etienne, ENES, Bioacoust Res Lab, CNRS, UMR 5292, CRNL, Inserm, UMR S 1028, St Etienne, France

[4] Univ Hong Kong, Dept Linguist, Hong Kong, Peoples R China

Source:

PLOS COMPUTATIONAL BIOLOGY | 2023 / Volume 19 / Issue 04

Keywords:

CHIMPANZEES PAN-TROGLODYTES; VOCAL REPERTOIRE; ACOUSTIC FEATURES; CLASSIFICATION; SIGNATURE; IDENTITY; INDIVIDUALITY; VOCALIZATIONS; COMMUNICATION; INFORMATION;

DOI:

10.1371/journal.pcbi.1010325

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Despite the accumulation of data and studies, deciphering animal vocal communication remains challenging. In most cases, researchers must deal with the sparse recordings composing Small, Unbalanced, Noisy, but Genuine (SUNG) datasets. SUNG datasets are characterized by a limited number of recordings, most often noisy, and unbalanced in number between the individuals or categories of vocalizations. SUNG datasets therefore offer a valuable but inevitably distorted vision of communication systems. Adopting the best practices in their analysis is essential to effectively extract the available information and draw reliable conclusions. Here we show that the most recent advances in machine learning applied to a SUNG dataset succeed in unraveling the complex vocal repertoire of the bonobo, and we propose a workflow that can be effective with other animal species. We implement acoustic parameterization in three feature spaces and run a Supervised Uniform Manifold Approximation and Projection (S-UMAP) to evaluate how call types and individual signatures cluster in the bonobo acoustic space. We then implement three classification algorithms (Support Vector Machine, xgboost, neural networks) and their combination to explore the structure and variability of bonobo calls, as well as the robustness of the individual signature they encode. We underscore how classification performance is affected by the feature set and identify the most informative features. In addition, we highlight the need to address data leakage in the evaluation of classification performance to avoid misleading interpretations. Our results lead to identifying several practical approaches that are generalizable to any other animal communication system. 
To improve the reliability and replicability of vocal communication studies with SUNG datasets, we thus recommend: i) comparing several acoustic parameterizations; ii) visualizing the dataset with supervised UMAP to examine the species acoustic space; iii) adopting Support Vector Machines as the baseline classification approach; iv) explicitly evaluating data leakage and possibly implementing a mitigation strategy.

Author summary:

Deciphering animal vocal communication is a great challenge in most species. Audio recordings of vocal interactions help to understand what animals are saying to whom and when, but scientists are often faced with data collections characterized by a limited number of recordings, mostly noisy, and unbalanced in numbers between individuals or vocalization categories. Such datasets are far from perfect, but they are our best chance to understand communication in hard-to-record species. Opportunities may especially be limited to record endangered species such as our closest relatives, bonobos and chimpanzees. We propose an efficient workflow to analyze such imperfect datasets using recent methods developed in machine learning. We detail how this approach works and its performance in unraveling the complex vocal repertoire of the bonobo. Our results lead to the identification of several practical approaches that are generalizable to other animal communication systems. Finally, we make methodological recommendations to improve the reliability and reproducibility of vocal communication studies with these imperfect datasets that we call SUNG (Small, Unbalanced, Noisy, but Genuine datasets).
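Recommendation iv), evaluating data leakage, can be illustrated with a small sketch. It is not the authors' procedure. It assumes, as one plausible source of leakage that this record does not spell out, that calls taken from the same recording sequence are not independent, so a sequence should never contribute calls to both the training and the test set. The names `grouped_split`, `shared_groups`, and the toy `sequence_ids` are hypothetical.

```python
import random
from collections import defaultdict


def grouped_split(group_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no group lands in both train and test.

    group_ids[i] is the (hypothetical) recording-sequence label of call i.
    """
    groups = defaultdict(list)
    for index, group in enumerate(group_ids):
        groups[group].append(index)

    # Shuffle whole groups, not individual calls, with a fixed seed.
    shuffled = sorted(groups)
    random.Random(seed).shuffle(shuffled)

    target = test_fraction * len(group_ids)
    train, test = [], []
    for group in shuffled:
        # Fill the test set group by group until it reaches the target size.
        if len(test) < target:
            test.extend(groups[group])
        else:
            train.extend(groups[group])
    return train, test


def shared_groups(group_ids, train, test):
    """Return the groups present on both sides of a split (leakage check)."""
    return {group_ids[i] for i in train} & {group_ids[i] for i in test}


# Toy data (assumed): 12 calls drawn from 4 recording sequences.
sequence_ids = ["s1"] * 5 + ["s2"] * 3 + ["s3"] * 2 + ["s4"] * 2
train_idx, test_idx = grouped_split(sequence_ids, test_fraction=0.25)
leaked = shared_groups(sequence_ids, train_idx, test_idx)
print("groups on both sides of the split:", leaked)
```

Running the same `shared_groups` check on an ordinary per-call random split shows how often sequences straddle the two sets, which is one simple way to gauge whether reported classification scores may be inflated.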

Pages: 47