simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods

Times cited: 0
Authors
Kanduri, Chakravarthi [1, 2]
Scheffer, Lonneke [1]
Pavlovic, Milena [1, 2]
Rand, Knut Dagestad [1]
Chernigovskaya, Maria [1, 3, 4]
Pirvandy, Oz [5]
Yaari, Gur [5]
Greiff, Victor [3, 4]
Sandve, Geir K. [1, 2]
Affiliations
[1] Univ Oslo, Ctr Bioinformat, Dept Informat, Oslo, Norway
[2] Univ Oslo, UiORealArt Convergence Environm, NO-0373 Oslo, Norway
[3] Univ Oslo, Dept Immunol, NO-0373 Oslo, Norway
[4] Univ Oslo, Oslo Univ Hosp, NO-0373 Oslo, Norway
[5] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
Source
GIGASCIENCE | 2023 / Vol. 12
Keywords
simulation of AIRR data; shortcut learning; benchmarking of machine learning methods; adaptive immune receptor repertoires; AIRR; ML; T-CELL-RECEPTORS; MACHINE; GENERATION; DRIVEN; RESPONSES; FEATURES; SHAPES
DOI
Not available
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets produce naive repertoires that lack a key feature of antigen-experienced repertoires: many shared receptor sequences (selected for common antigens).

Results: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the option of making or not making prior assumptions about the similarity or commonality of the immune state-associated sequences that will be used as true signals. We demonstrate the realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets.

Conclusions: This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
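The shortcut described in the Results can be illustrated with a toy Python sketch. It rests on assumptions that do not come from the paper: receptor sequences are uniform random 12-residue strings (not the output of a realistic generative model), every positive repertoire receives all signal sequences, and `simulate`, `shared_sequences` and `precision` are hypothetical helpers, not part of the simAIRR package API. When background repertoires share essentially nothing, counting sequences that occur in many repertoires recovers the implanted signal without ever using labels. Adding label-independent "public" background sequences, loosely mimicking the kind of sharing simAIRR aims to reproduce, removes that shortcut.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_cdr3(rng, length=12):
    # Hypothetical stand-in for a realistic sequence generator (assumption)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def simulate(n_repertoires=40, size=500, n_signal=5, n_public=0,
             public_rate=0.5, seed=1):
    """Toy simulation (not simAIRR): returns (repertoires, labels, signal)."""
    rng = random.Random(seed)
    signal = {random_cdr3(rng) for _ in range(n_signal)}
    public = [random_cdr3(rng) for _ in range(n_public)]
    repertoires, labels = [], []
    for i in range(n_repertoires):
        label = i % 2  # half of the repertoires are "positive"
        rep = {random_cdr3(rng) for _ in range(size)}
        # Public background sequences appear regardless of the label
        rep.update(s for s in public if rng.random() < public_rate)
        if label:
            rep.update(signal)  # implant the signal into positive repertoires
        repertoires.append(rep)
        labels.append(label)
    return repertoires, labels, signal


def shared_sequences(repertoires, min_count):
    """Sequences present in at least min_count repertoires (labels unused)."""
    counts = Counter(seq for rep in repertoires for seq in rep)
    return {seq for seq, c in counts.items() if c >= min_count}


def precision(found, signal):
    """Fraction of recovered sequences that are true signal sequences."""
    return len(found & signal) / len(found) if found else 0.0


# Naive-like background: sharing alone exposes the signal
naive_reps, _, naive_signal = simulate(n_public=0)
print(precision(shared_sequences(naive_reps, 5), naive_signal))

# Background with public sequences: sharing no longer singles out the signal
shared_reps, _, shared_signal = simulate(n_public=50)
print(precision(shared_sequences(shared_reps, 5), shared_signal))
```

In this toy setting the first precision is typically 1.0 and the second much lower, because nearly all of the 50 public background sequences pass the same sharing threshold as the implanted signal.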
Pages: 16