Interrater sleep stage scoring reliability between manual scoring from two European sleep centers and automatic scoring performed by the artificial intelligence-based Stanford-STAGES algorithm

Cited: 27
Authors
Cesari, Matteo [1 ]
Stefani, Ambra [1 ]
Penzel, Thomas [2 ,3 ]
Ibrahim, Abubaker [1 ]
Hackner, Heinz [1 ]
Heidbreder, Anna [1 ]
Szentkiralyi, Andras [4 ]
Stubbe, Beate [5 ]
Voelzke, Henry [6 ]
Berger, Klaus [4 ]
Hoegl, Birgit [1 ]
Affiliations
[1] Med Univ Innsbruck, Dept Neurol, Anichstr 35, A-6020 Innsbruck, Austria
[2] Charite Univ Med Berlin, Interdisciplinary Sleep Med Ctr, Berlin, Germany
[3] Saratov NG Chernyshevskii State Univ, Saratov, Russia
[4] Univ Munster, Inst Epidemiol & Social Med, Munster, Germany
[5] Univ Med Greifswald, Dept Internal Med B, Greifswald, Germany
[6] Univ Med Greifswald, Inst Community Med, Greifswald, Germany
Source
JOURNAL OF CLINICAL SLEEP MEDICINE | 2021, Vol. 17, No. 6
Keywords
automatic scoring; deep neural networks; computerized analysis; interrater variability; study of health in Pomerania; slow wave activity; AGREEMENT; POLYSOMNOGRAMS; FRAGMENTATION; DIAGNOSIS; UTILITY;
DOI
10.5664/jcsm.9174
CLC Number
R74 [Neurology and Psychiatry]
Abstract
Study Objectives: The objective of this study was to evaluate interrater reliability between manual sleep stage scoring performed in 2 European sleep centers and automatic sleep stage scoring performed by the previously validated artificial intelligence-based Stanford-STAGES algorithm.

Methods: Full-night polysomnographies of 1,066 participants were included. Sleep stages were manually scored in the Berlin and Innsbruck sleep centers and automatically scored with the Stanford-STAGES algorithm. For each participant, we compared (1) Innsbruck to Berlin scorings (INN vs BER); (2) Innsbruck to automatic scorings (INN vs AUTO); (3) Berlin to automatic scorings (BER vs AUTO); (4) epochs on which the Innsbruck and Berlin scorers agreed to automatic scoring (CONS vs AUTO); and (5) both Innsbruck and Berlin manual scorings (MAN) to the automatic ones (MAN vs AUTO). Interrater reliability was evaluated with several measures, including overall and sleep stage-specific Cohen's kappa.

Results: Overall agreement across participants was substantial for INN vs BER (κ = 0.66 ± 0.13), INN vs AUTO (κ = 0.68 ± 0.14), CONS vs AUTO (κ = 0.73 ± 0.14), and MAN vs AUTO (κ = 0.61 ± 0.14), and moderate for BER vs AUTO (κ = 0.55 ± 0.15). Human scorers disagreed most on N1 sleep (κN1 = 0.40 ± 0.16 for INN vs BER). Automatic scoring had the lowest agreement with manual scorings for N1 and N3 sleep (κN1 = 0.25 ± 0.14 and κN3 = 0.42 ± 0.32 for MAN vs AUTO).

Conclusions: Interrater reliability for sleep stage scoring between human scorers was in line with previous findings, and the algorithm achieved overall substantial agreement with manual scoring. In this cohort, the Stanford-STAGES algorithm performed similarly to the original study, suggesting that it generalizes to new cohorts. Before its integration into clinical practice, future independent studies should further evaluate it in other cohorts.
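The overall and stage-specific kappa values reported above are standard chance-corrected agreement measures computed epoch by epoch; stage-specific kappa (e.g., κN1) is typically obtained by binarizing both hypnograms one stage at a time. The following is a minimal Python sketch of that computation, not the study's actual pipeline; the function names and the two rater arrays are hypothetical illustrations, not study data.

    import numpy as np

    def cohens_kappa(a, b):
        # Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
        # epoch-by-epoch agreement and p_e is the agreement expected by chance
        # from the two raters' marginal stage frequencies.
        a, b = np.asarray(a), np.asarray(b)
        labels = np.union1d(a, b)
        p_o = np.mean(a == b)
        p_e = sum(np.mean(a == lab) * np.mean(b == lab) for lab in labels)
        if p_e == 1.0:
            return float("nan")  # undefined when both raters use a single label
        return (p_o - p_e) / (1.0 - p_e)

    def stage_kappa(a, b, stage):
        # Stage-specific kappa via one-vs-rest binarization,
        # e.g. kappa_N1 is the kappa on "N1 vs not N1".
        a, b = np.asarray(a), np.asarray(b)
        return cohens_kappa(a == stage, b == stage)

    # Hypothetical 30-second epoch labels from two scorers (not study data):
    rater_1 = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM"]
    rater_2 = ["W", "N2", "N2", "N2", "N3", "N2", "REM", "REM"]

    print(f"overall kappa = {cohens_kappa(rater_1, rater_2):.2f}")
    for s in ["W", "N1", "N2", "N3", "REM"]:
        print(f"kappa_{s} = {stage_kappa(rater_1, rater_2, s):.2f}")

With these example arrays the observed agreement is 6/8 = 0.75 and the chance agreement 15/64 ≈ 0.23, giving an overall kappa of about 0.67, i.e., "substantial" on the conventional Landis-Koch scale used to interpret the study's results.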
Pages: 1237-1247
Page count: 11
Related Papers (8 records)
  • [1] Inter-rater sleep stage scoring reliability between two sleep centres and an automated artificial-intelligence algorithm
    Cesari, M.
    Stefani, A.
    Penzel, T.
    Ibrahim, A.
    Hackner, H.
    Heidbreder, A.
    Szentkiralyi, A.
    Stubbe, B.
    Voelzke, H.
    Berger, K.
    Hoegl, B.
    [J]. EUROPEAN JOURNAL OF NEUROLOGY, 2021, 28 : 193 - 193
  • [2] INTERRATER RELIABILITY FOR SLEEP STAGE SCORING FROM ELEVEN JAPANESE LABORATORIES
    Yagi, T.
    Chiba, S.
    Itoh, H.
    Ozone, M.
    [J]. SLEEP, 2017, 40 : A286 - A287
  • [3] Automatic scoring of sleep stages with artificial intelligence and its use for differentiation of disorders of hypersomnolence
    Cesari, M.
    Egger, K.
    Stefani, A.
    Bergmann, M.
    Ibrahim, A.
    Brandauer, E.
    Hoegl, B.
    Heidbreder, A.
    [J]. JOURNAL OF SLEEP RESEARCH, 2022, 31
  • [4] Reliability of Family Dogs' Sleep Structure Scoring Based on Manual and Automated Sleep Stage Identification
    Gergely, Anna
    Kiss, Orsolya
    Reicher, Vivien
    Iotchev, Ivaylo
    Kovacs, Eniko
    Gombos, Ferenc
    Benczur, Andras
    Galambos, Agoston
    Topal, Jozsef
    Kis, Anna
    [J]. ANIMALS, 2020, 10 (06)
  • [5] Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring
    Bakker, Jessie P.
    Ross, Marco
    Cerny, Andreas
    Vasko, Ray
    Shaw, Edmund
    Kuna, Samuel
    Magalang, Ulysses J.
    Punjabi, Naresh M.
    Anderer, Peter
    [J]. SLEEP, 2023, 46 (02)
  • [6] THE HYPNODENSITY GRAPH: A NEW REPRESENTATION OF SLEEP SCORING BASED ON MULTIPLE MANUAL EXPERT SCORINGS AND ESTIMATED BY ARTIFICIAL INTELLIGENCE
    Anderer, P.
    Ross, M.
    Cerny, A.
    Moreau, A.
    [J]. SLEEP MEDICINE, 2019, 64 : S15 - S16
  • [7] Concordance between manual pathologist scoring and an Artificial Intelligence Deep Learning-based algorithm for Ki-67 immunohistochemical scoring in breast cancer
    Badr, N. M.
    Ramsing, T. W.
    Overgaard, A.
    Thagaard, J.
    Omanovic, D.
    Miligy, I.
    Hunter, K.
    Kearns, D.
    Shaaban, A.
    [J]. EUROPEAN JOURNAL OF CANCER, 2022, 175 : S59 - S59
  • [8] VALIDATION STUDIES FOR SCORING POLYSOMNOGRAMS AND HOME SLEEP APNEA TESTS WITH ARTIFICIAL INTELLIGENCE: SLEEP STAGE PROBABILITIES (HYPNODENSITY) DERIVED FROM NEUROLOGICAL OR CARDIORESPIRATORY SIGNALS
    Anderer, Peter
    Ross, Marco
    Cerny, Andreas
    Fonseca, Pedro
    Shaw, Edmund
    Bakker, Jessie
    [J]. SLEEP, 2022, 45 : A319 - A319