Zhong, Wei [1,2]
Li, Zhuoxi [2]
Guo, Wenwen [3]
Cui, Hengjian [3]
Affiliations:
[1] Xiamen Univ, MOE, WISE, Key Lab Econometr, Xiamen, Peoples R China
[2] Xiamen Univ, Dept Stat & Data Sci, SOE, Xiamen, Peoples R China
[3] Capital Normal Univ, Sch Math Sci, Beijing, Peoples R China
Groupwise variable screening;
High dimensionality;
Measures of dependence;
Test of independence;
SELECTION;
MODELS;
ASSOCIATION;
DENSITY;
DOI: 10.1080/01621459.2023.2284988
Chinese Library Classification (CLC):
O21 [Probability theory and mathematical statistics];
C8 [Statistics];
Subject classification codes:
020208;
070103;
0714;
Abstract:
We propose a new measure of dependence between a categorical random variable and a random vector with potentially high dimensions, named semi-distance correlation. It is an interesting extension of distance correlation to accommodate the information of the categorical random variable. It equals zero if and only if the categorical random variable and the other random vector are independent. Two important applications of semi-distance correlation are considered. First, we develop a semi-distance independence test between a categorical random variable and a random vector and derive its asymptotic distributions. When the dimension of the random vector tends to infinity, we derive the explicit asymptotic normal distribution of the test statistic under the null hypothesis, which allows us to compute p-values in an efficient and fast way for high dimensional data. Second, we propose to use the semi-distance correlation as a marginal utility between the response and a group of covariates to do groupwise variable screening for ultrahigh dimensional classification problems. The sure screening property has also been established. Monte Carlo simulations and a real data application are presented to demonstrate the excellent finite sample property of the proposed procedures. A new R package semidist is also developed to implement the proposed methods. Supplementary materials for this article are available online.
Affiliation:
Columbia Univ, Dept Stat, 1255 Amsterdam Ave, New York, NY 10027 USA
Davis, Richard A.
Affiliation:
Nanzan Univ, Dept Business Adm, Showa Ku, 18 Yamazato Cho, Nagoya, Aichi 4668673, Japan
Matsui, Muneya
Affiliation:
Univ Copenhagen, Dept Math, Univ Pk 5, DK-2100 Copenhagen, Denmark
Mikosch, Thomas
Wan, Phyllis
Affiliation:
Columbia Univ, Dept Stat, 1255 Amsterdam Ave, New York, NY 10027 USA