A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases

Cited by: 5
Authors
Perez, Joaquin [1]
Iturbide, Emmanuel [1]
Olivares, Victor [1]
Hidalgo, Miguel [1]
Almanza, Nelva [1]
Martinez, Alicia [1]
Affiliation
[1] CENIDET, Dept Comp Sci, Cuernavaca, Morelos, Mexico
Keywords
Data Preparation Methodology; Mortality Databases; Epidemiology; Preprocessing Method
DOI
10.1007/978-3-319-16486-1_116
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The data preparation phase is known to be the most time-consuming phase of the data mining process: it takes between 50% and 70% of the total project time, and its results directly affect the quality of the project. Current data mining methodologies are general purpose; one of their limitations is that they do not provide guidance on which particular tasks to carry out in a particular domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: on the one hand, we observed that using the methodology reduced some of the time-consuming tasks; on the other hand, the data mining system found previously unknown and potentially useful patterns for the public health services in Mexico.
Pages: 1173-1182
Number of pages: 10
Related papers
50 in total
  • [21] Data mining in software metrics databases
    Dick, S
    Meeks, A
    Last, M
    Bunke, H
    Kandel, A
    FUZZY SETS AND SYSTEMS, 2004, 145 (01) : 81 - 110
  • [22] Using databases and data mining in vaccinology
    Davies, Matthew N.
    Guan, Pingping
    Blythe, Martin J.
    Salomon, Jesper
    Toseland, Christopher P.
    Hattotuwagama, Channa
    Walshe, Valerie
    Doytchinova, Irini A.
    Flower, Darren R.
    EXPERT OPINION ON DRUG DISCOVERY, 2007, 2 (01) : 19 - 35
  • [23] Data mining of GMTI radar databases
    Corbeil, Allan
    Van Patten, Greg
    Spoldi, Laura
    O'Hern, Brian
    Alford, Mark
    2006 IEEE RADAR CONFERENCE, VOLS 1 AND 2, 2006, : 154 - +
  • [24] Data mining in forensic image Databases
    Geradts, Z
    Bijhold, J
    INVESTIGATIVE IMAGE PROCESSING II, 2002, 4709 : 92 - 101
  • [25] RADAR DATA PREPARATION FOR DATA MINING
    Keller, David
    Ondryhal, Vojtech
    ICMT '07: INTERNATIONAL CONFERENCE ON MILITARY TECHNOLOGIES, 2007, : 622 - 628
  • [26] Data Mining Applied on Grain Data Mart
    Correa, F. E.
    Oliveira, M. D. B.
    Alves, L. R. A.
    Gama, J.
    Correa, P. L. P.
    EFITA/WCCA '11, 2011, : 518 - 527
  • [27] Data Mining and Machine Learning Techniques for Aerodynamic Databases: Introduction, Methodology and Potential Benefits
    Andres-Perez, Esther
    ENERGIES, 2020, 13 (21)
  • [28] A new data clustering approach for data mining in large databases
    Tsai, CF
    Wu, HC
    Tsai, CW
    I-SPAN'02: INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND NETWORKS, PROCEEDINGS, 2002, : 315 - 320
  • [29] Applied optimization and data mining
    Chaovalitwongse, W. Art
    Chou, Chun-An
    Liang, Zhe
    Wang, Shouyi
    ANNALS OF OPERATIONS RESEARCH, 2017, 249 (1-2) : 1 - 3
  • [30] Applied optimization and data mining
    W. Art Chaovalitwongse
    Chun-An Chou
    Zhe Liang
    Shouyi Wang
    Annals of Operations Research, 2017, 249 : 1 - 3