Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Cited by: 6
Authors
Braca, Paolo [1]
Millefiori, Leonardo M. [1]
Aubry, Augusto [2]
Marano, Stefano [3]
De Maio, Antonio [2]
Willett, Peter [4]
Affiliations
[1] Centre for Maritime Research and Experimentation, Research Department, I-19126 La Spezia, SP, Italy
[2] University of Naples Federico II, DIETI, I-80125 Naples, NA, Italy
[3] University of Salerno, DIEM, I-84084 Fisciano, SA, Italy
[4] University of Connecticut, Department of Electrical and Computer Engineering, Storrs, CT 06269 USA
Keywords
Error probability; Training; Artificial intelligence; Convergence; Error analysis; Surveillance; Signal processing; Machine learning; deep learning; large deviations principle; exact asymptotics; statistical hypothesis testing; Fenchel-Legendre transform; extended target detection; radar; sonar detection; X-band maritime radar; EXTENDED TARGET TRACKING; DISTRIBUTED DETECTION; ARTIFICIAL-INTELLIGENCE; MARITIME SURVEILLANCE; MULTIPLE SENSORS; NEURAL-NETWORK; DEEP; CLASSIFICATION; ALGORITHMS; CONSENSUS
DOI
10.1109/OJSP.2022.3232284
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
We study the performance of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions under which an ML classifier exhibits error probabilities that vanish exponentially, say as exp(-nI), where n is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and I is the error rate, i.e., the exponent of the decay. These conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., the quantity that is thresholded to make the final binary decision) learned in the training phase. Consequently, the D3F and the related error rate I depend on the given training set. The conditions for exponential convergence can be verified and tested numerically using the available dataset or a synthetic dataset generated according to the underlying statistical model. Consistent with large deviations theory, we also establish the convergence of the normalized D3F statistic to a Gaussian distribution. Furthermore, thanks to a refined asymptotic derivation, we provide approximate error probability curves zeta(n) exp(-nI), where zeta(n) collects the most significant sub-exponential terms of the error probabilities. Leveraging these refined asymptotics, we compute an accurate analytical approximation of the classification performance in both the small-n and large-n regimes. The theoretical findings are corroborated by extensive numerical simulations and by real-world data acquired by an X-band maritime radar system for surveillance.
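The abstract states that the exponent I follows from the Fenchel-Legendre transform of the cumulant-generating function (CGF) of the D3F, and that the relevant conditions can be checked numerically on a dataset. The Python sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: the test statistic is taken to be the average of n i.i.d. per-observation scores, the CGF is replaced by its empirical estimate from score samples, and zeta(n) is taken to be the classical Bahadur-Rao prefactor 1/(theta* sigma* sqrt(2 pi n)) for non-lattice sums. The function names are hypothetical, and the paper's own D3F and zeta(n) may differ.

```python
import numpy as np


def tilted_moments(x, theta):
    """Mean and variance of samples x under exponential tilting by theta.

    These equal the first two derivatives of the empirical CGF
    Lambda(theta) = log mean(exp(theta * x)).
    """
    z = theta * x
    w = np.exp(z - z.max())  # numerically stabilized tilting weights
    w /= w.sum()
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    return mean, var


def empirical_cgf(x, theta):
    """Empirical cumulant-generating function log mean(exp(theta * x))."""
    z = theta * x
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))


def rate_and_refined_tail(x, gamma, n, tol=1e-12):
    """Estimate I(gamma) and a refined approximation of P(mean of n scores >= gamma).

    Hypothetical helper. Assumes the statistic is the average of n i.i.d.
    per-observation scores distributed like the samples in x.
    x     : score samples under the hypothesis of interest
    gamma : decision threshold, strictly between x.mean() and x.max()
    n     : number of observations averaged in the statistic
    """
    x = np.asarray(x, dtype=float)
    if not (x.mean() < gamma < x.max()):
        raise ValueError("gamma must lie between the sample mean and the sample maximum")
    # Solve Lambda'(theta) = gamma by bisection; Lambda' is nondecreasing in theta.
    lo, hi = 0.0, 1.0
    while tilted_moments(x, hi)[0] < gamma:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_moments(x, mid)[0] < gamma:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    # Fenchel-Legendre transform sup_t [t * gamma - Lambda(t)], attained at theta.
    rate = theta * gamma - empirical_cgf(x, theta)
    _, var = tilted_moments(x, theta)
    # Assumed Bahadur-Rao prefactor zeta(n) = 1 / (theta * sigma * sqrt(2 * pi * n)).
    zeta = 1.0 / (theta * np.sqrt(var) * np.sqrt(2.0 * np.pi * n))
    return rate, zeta * np.exp(-n * rate)
```

For a false-alarm probability, pass score samples drawn under the null hypothesis; for a miss probability with the decision region on the other side of the threshold, negate both the scores and gamma. As a sanity check, with standard Gaussian scores the exact tail is Q(gamma sqrt(n)), and the population version of this approximation reduces to the familiar Mills-ratio asymptotic exp(-n gamma^2/2)/(gamma sqrt(2 pi n)).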
Pages: 464-495
Page count: 32