Testing Homogeneity Of A Large Data Set By Bootstrapping

Times cited: 0

Authors:

Morimune, K. [1]

Hoshino, Y. [1]

Affiliation:

[1] Kyoto Univ, Grad Sch Econ, Sakyo Ku, Kyoto 6068501, Japan

Source:

MODSIM 2005: INTERNATIONAL CONGRESS ON MODELLING AND SIMULATION: ADVANCES AND APPLICATIONS FOR MANAGEMENT AND DECISION MAKING | 2005

Keywords:

Wu-Hausman test; Micro data; Bootstrapping; Sub-sample

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

It is not rare to analyze large data sets these days. Such data are usually of the census type and are called micro data in econometrics. The basic method of analysis is to estimate a single regression equation with coefficients common to the whole data set. The same applies to other estimation methods such as discrete choice models, Tobit models, and so on. Heterogeneity in the data is usually accounted for by dummy variables, which represent socioeconomic differences among individuals in the sample. Even with the coefficients of the dummy variables included, only one equation is estimated for the whole large sample, and dividing the whole sample into sub-samples is usually not preferred. In this paper, data are said to be homogeneous if a single equation fitted to the whole data set explains the socioeconomic properties of the data well. If the whole population is divided into known sub-populations, an equation may instead be estimated for each sub-population, with the coefficients assumed to differ from one sub-population to another; such data are said to be heterogeneous in this paper. The analysis of variance is applied when the sub-populations are known and a sub-sample is collected from each of them. In this paper, a test is proposed to determine whether the data are homogeneous. The test uses the full sample of size $N$ and sub-samples of size $n$, which are chosen randomly because the sub-populations are unknown. Under the null hypothesis, a regression equation with coefficients common to the whole sample, $y_{ik} = x_{ik}'\beta_0 + u_{ik}$, is assumed. Under the alternative hypothesis, a regression equation with variable coefficients, $y_{ik} = x_{ik}'(\beta_0 + \frac{n}{N}\beta_k) + u_{ik}$, is assumed. This alternative states that the deviation from a common regression is small when the size $n$ of the randomly chosen sub-samples is small compared with $N$.
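To make the two hypotheses concrete, here is a minimal Python sketch of a data-generating process under each. It assumes, for illustration only, an intercept plus one normally distributed regressor and sub-populations of equal size $n$ that partition the full sample; the function names `effective_coefficients` and `simulate` and the deviation vectors `beta_k` are hypothetical and are not taken from the paper.

```python
import random


def effective_coefficients(n, N, beta0, beta_k):
    """Coefficients of each sub-population under the alternative hypothesis,
    beta0 + (n / N) * beta_k[k]. beta_k holds one (hypothetical) deviation
    vector per sub-population."""
    scale = n / N
    return [[b0 + scale * d for b0, d in zip(beta0, dev)] for dev in beta_k]


def simulate(n, beta0, beta_k, null=True, sigma=1.0, seed=0):
    """Draw one sub-population of size n per entry of beta_k, so that
    N = n * len(beta_k) (an assumed design). Under H0 every observation uses
    beta0; under H1 sub-population k uses its effective coefficients."""
    N = n * len(beta_k)
    rng = random.Random(seed)
    if null:
        coefs = [list(beta0) for _ in beta_k]
    else:
        coefs = effective_coefficients(n, N, beta0, beta_k)
    data = []  # rows of (sub-population index k, regressors x_ik, response y_ik)
    for k, coef in enumerate(coefs):
        for _ in range(n):
            x = [1.0, rng.gauss(0.0, 1.0)]  # intercept plus one regressor (assumption)
            y = sum(c * xi for c, xi in zip(coef, x)) + rng.gauss(0.0, sigma)
            data.append((k, x, y))
    return data


# The same deviation vector shifts the coefficients less as n / N shrinks.
for n, N in [(500, 1000), (100, 1000), (10, 1000)]:
    print(n / N, effective_coefficients(n, N, [1.0, 2.0], [[4.0, -4.0]]))
```

The factor $n/N$ is what encodes the intuition that a common regression is a reasonable approximation for a small sub-sample even when it is not exact for the whole data set.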
This reflects our intuition that fitting one regression equation with common coefficients to a large sample is too restrictive, and that specification errors may be impossible to avoid in such an estimation. However, specification errors may be negligible if a regression equation is fitted to a small sub-sample. For a given sub-sample of size $n$, the test uses the Wu-Hausman statistic $WH = (b_s - b_f)'[V(b_s) - V(b_f)]^{-1}(b_s - b_f)$, where $b_f$ and $b_s$ are the full-sample and sub-sample least squares estimators, respectively. Under the null hypothesis it is asymptotically distributed as $\chi^2(K)$, where $K$ is the number of coefficients. Sub-samples of size $n$ are drawn randomly from the full sample of size $N$ a total of $N_s$ times, and the test statistic is calculated for each of the $N_s$ draws. Since $n$ is arbitrary, various values of $n$ are used, ranging from 5% to more than one third of the full sample. An alternative WH statistic uses bootstrap estimators of the coefficients and of the variance-covariance matrices. The sub-sample test statistics can be correlated with each other, since the sub-samples are drawn randomly from the full sample and can overlap. Critical values of the test statistics are calculated by simulation. An example follows.
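The sub-sampling procedure described above can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `mat_inv`, `ols` and `wu_hausman_subsample` are hypothetical, sub-samples are drawn without replacement, and both covariance matrices use the full-sample residual variance $s_f^2$, so that $V(b_s) - V(b_f) = s_f^2[(X_s'X_s)^{-1} - (X'X)^{-1}]$ stays positive semi-definite (the sub-sample is part of the full sample). The bootstrap variant and the simulated critical values are not reproduced.

```python
import random


def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(size)]
         for i, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(size):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[size:] for row in m]


def ols(X, y):
    """Least squares: returns b, (X'X)^{-1} and the residual variance s^2."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    xtx_inv = mat_inv(xtx)
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    return b, xtx_inv, s2


def wu_hausman_subsample(X, y, n, n_s, seed=0):
    """Return n_s Wu-Hausman statistics, one per random sub-sample of size n."""
    b_f, inv_f, s2_f = ols(X, y)
    k = len(b_f)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_s):
        idx = rng.sample(range(len(y)), n)  # sub-sample without replacement
        b_s, inv_s, _ = ols([X[i] for i in idx], [y[i] for i in idx])
        # V(b_s) - V(b_f), both scaled by the full-sample s^2 (assumption)
        diff_v = [[s2_f * (inv_s[i][j] - inv_f[i][j]) for j in range(k)] for i in range(k)]
        w = mat_inv(diff_v)
        d = [bs - bf for bs, bf in zip(b_s, b_f)]
        stats.append(sum(d[i] * w[i][j] * d[j] for i in range(k) for j in range(k)))
    return stats


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(2000)]
    y = [1.0 + 2.0 * r[1] + rng.gauss(0.0, 1.0) for r in X]  # data generated under H0
    for frac in (0.05, 0.2, 0.35):  # sub-sample sizes from 5% to over one third of N
        stats = wu_hausman_subsample(X, y, n=int(frac * len(y)), n_s=200)
        print(frac, sum(s > 5.991 for s in stats) / len(stats))
```

Comparing each statistic with the 5% critical value of $\chi^2(2)$ (about 5.991) is only a rough check here: because overlapping sub-samples make the statistics correlated, the paper calculates critical values by simulation instead.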

Pages: 914-919

Number of pages: 6
