Stratified Sampling Design Based on Data Mining

Cited by: 6
Authors
Kim, Yeonkook J. [1]
Oh, Yoonhwan [1]
Park, Sunghoon [2]
Cho, Sungzoon [2]
Park, Hayoung [1]
Affiliations
[1] Seoul Natl Univ, Technol Management Econ & Policy Grad Program, 1 Gwanak Ro, Seoul 151742, South Korea
[2] Seoul Natl Univ, Dept Ind Engn, Seoul, South Korea
Keywords
Sampling Studies; Decision Trees; Data Mining
DOI
10.4258/hir.2013.19.3.186
Chinese Library Classification
R-058
Abstract
Objectives: To explore classification rules based on data mining methodologies which are to be used in defining strata in stratified sampling of healthcare providers with improved sampling efficiency. Methods: We performed k-means clustering to group providers with similar characteristics, and then constructed decision trees on cluster labels to generate stratification rules. We assessed the variance explained by the stratification proposed in this study and by conventional stratification to evaluate the performance of the sampling design. We constructed a study database from health insurance claims data and providers' profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and population data from Statistics Korea. From our database, we used the data for single specialty clinics or hospitals in two specialties, general surgery and ophthalmology, for the year 2011 in this study. Results: Data mining resulted in five strata in general surgery with two stratification variables, the number of inpatients per specialist and population density of provider location, and five strata in ophthalmology with two stratification variables, the number of inpatients per specialist and number of beds. The percentages of variance in annual changes in the productivity of specialists explained by the stratification in general surgery and ophthalmology were 22% and 8%, respectively, whereas conventional stratification by the type of provider location and number of beds explained 2% and 0.2% of variance, respectively. Conclusions: This study demonstrated that data mining methods can be used in designing efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea.
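The abstract evaluates a stratification by the share of outcome variance it explains. The sketch below shows one common way to compute such a figure: the between-strata sum of squares divided by the total sum of squares. Whether the authors used exactly this estimator is an assumption, since the abstract does not say. The function names, the rule thresholds, and the toy provider data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(values, strata):
    """Share of the total variance in `values` explained by `strata`.

    Computed as between-strata sum of squares / total sum of squares
    (an assumed estimator; the abstract does not specify one).
    """
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation to explain
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    between_ss = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


def assign_stratum(inpatients_per_specialist, beds):
    """Tree-style stratification rules with invented, illustrative thresholds."""
    if inpatients_per_specialist < 50:
        return "low_inpatient"
    if beds < 10:
        return "high_inpatient_few_beds"
    return "high_inpatient_many_beds"


# Made-up providers: (inpatients per specialist, beds, outcome value).
providers = [
    (10, 0, 1.0), (20, 5, 3.0),
    (80, 5, 5.0), (90, 8, 7.0),
    (120, 30, 9.0), (150, 40, 11.0),
]
strata = [assign_stratum(i, b) for i, b, _ in providers]
outcome = [y for _, _, y in providers]
share = variance_explained(outcome, strata)
print(f"variance explained by stratification: {share:.1%}")
```

A stratification that explains more of the outcome variance leaves less variance within strata, which is why the abstract treats this share as a measure of sampling efficiency. The same function can be called with conventional stratum labels to compare the two designs on identical data.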
Pages: 186-195
Page count: 10