Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Cited by: 6
Authors
Braca, Paolo [1 ]
Millefiori, Leonardo M. [1 ]
Aubry, Augusto [2 ]
Marano, Stefano [3 ]
De Maio, Antonio [2 ]
Willett, Peter [4 ]
Affiliations
[1] Ctr Maritime Res & Experimentat, Res Dept, I-19126 La Spezia, SP, Italy
[2] Univ Naples Federico II, DIETI, I-80125 Naples, NA, Italy
[3] Univ Salerno, DIEM, I-84084 Fisciano, SA, Italy
[4] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Keywords
Error probability; Training; Artificial intelligence; Convergence; Error analysis; Surveillance; Signal processing; Machine learning; deep learning; large deviations principle; exact asymptotics; statistical hypothesis testing; Fenchel-Legendre transform; extended target detection; radar; sonar detection; X-band maritime radar; EXTENDED TARGET TRACKING; DISTRIBUTED DETECTION; ARTIFICIAL-INTELLIGENCE; MARITIME SURVEILLANCE; MULTIPLE SENSORS; NEURAL-NETWORK; DEEP; CLASSIFICATION; ALGORITHMS; CONSENSUS;
DOI
10.1109/OJSP.2022.3232284
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline code
0808 ; 0809 ;
Abstract
We study the performance of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say exp(-n I), where n is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and I is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., the statistic that is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and the related error rate I depend on the given training set. The conditions for exponential convergence can be verified and tested numerically by exploiting the available dataset or a synthetic dataset generated according to the underlying statistical model. Consistent with large deviations theory, we also establish the convergence of the normalized D3F statistic to a Gaussian distribution. Furthermore, approximate error probability curves zeta_n exp(-n I) are provided thanks to a refined asymptotic derivation, where zeta_n represents the most representative sub-exponential terms of the error probabilities. Leveraging the refined asymptotics, we are able to compute an accurate analytical approximation of the classification performance in both the small-n and large-n regimes. Theoretical findings are corroborated by extensive numerical simulations and by the use of real-world data acquired by an X-band maritime radar system for surveillance.
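The error exponent described in the abstract can be illustrated with a toy computation. The sketch below is not from the paper: it assumes a standard Gaussian per-sample decision statistic (for which the cumulant-generating function is known in closed form) and evaluates the Fenchel-Legendre transform I(gamma) = sup_s [s*gamma - Lambda(s)] by grid search, then shows the resulting exp(-n I) decay of the error probability bound.

```python
import numpy as np

# Toy per-sample decision statistic under the null hypothesis: X ~ N(0, 1).
# Its cumulant-generating function is Lambda(s) = log E[exp(s X)] = s^2 / 2.
def cgf(s):
    return 0.5 * s ** 2

def fenchel_legendre(gamma, s_grid):
    # I(gamma) = sup_s [s * gamma - Lambda(s)], approximated on a grid.
    return np.max(s_grid * gamma - cgf(s_grid))

s_grid = np.linspace(-10.0, 10.0, 20001)   # grid step 0.001
gamma = 1.0                                 # decision threshold on the sample mean
I = fenchel_legendre(gamma, s_grid)         # analytically gamma**2 / 2 = 0.5

# Large-deviations prediction: the false-alarm probability with n samples
# decays on the order of exp(-n * I).
for n in (10, 50, 100):
    print(f"n = {n:3d}  exp(-n I) = {np.exp(-n * I):.3e}")
```

For this Gaussian toy case the supremum is attained at s = gamma, recovering the textbook exponent gamma^2/2; in the setting of the paper, the CGF of the D3F is not available in closed form and would instead be estimated empirically from the training set or from synthetic data, as the abstract notes.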
Pages: 464-495
Number of pages: 32