A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases

Cited by: 5
Authors
Perez, Joaquin [1]
Iturbide, Emmanuel [1]
Olivares, Victor [1]
Hidalgo, Miguel [1]
Almanza, Nelva [1]
Martinez, Alicia [1]
Affiliation
[1] CENIDET, Dept Comp Sci, Cuernavaca, Morelos, Mexico
Keywords
Data Preparation Methodology; Mortality Databases; Epidemiology; Preprocessing Method
DOI
10.1007/978-3-319-16486-1_116
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is known that data preparation is the most time-consuming phase of the data mining process: it takes between 50% and 70% of the total project time, and its results directly affect the quality of the project. Current data mining methodologies are general purpose; one of their limitations is that they do not provide guidance on which particular tasks to carry out in a particular domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: on the one hand, we observed that the use of the methodology reduced some of the time-consuming tasks and, on the other hand, the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico.
Pages: 1173 - 1182
Page count: 10
Related papers
50 in total
  • [1] A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases
    Joaquín Pérez
    Emmanuel Iturbide
    Víctor Olivares
    Miguel Hidalgo
    Alicia Martínez
    Nelva Almanza
    Journal of Medical Systems, 2015, 39
  • [2] A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases
    Perez, Joaquin
    Iturbide, Emmanuel
    Olivares, Victor
    Hidalgo, Miguel
    Martinez, Alicia
    Almanza, Nelva
    JOURNAL OF MEDICAL SYSTEMS, 2015, 39 (11)
  • [3] Data Mining Methodology in Perspective of Manufacturing Databases
    Shahbaz, Muhammad
    Shaheen, Muhammad
    Aslam, Muhammad
    Ahsan, Syed
    Farooq, Amjad
    Arshad, Junaid
    Masood, Syed Athar
    LIFE SCIENCE JOURNAL-ACTA ZHENGZHOU UNIVERSITY OVERSEAS EDITION, 2012, 9 (03) : 13 - 22
  • [4] A search space reduction methodology for data mining in large databases
    Kuri-Morales, Angel
    Rodriguez-Erazo, Fatima
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2009, 22 (01) : 57 - 65
  • [5] Data mining in astronomical databases
    Borne, KD
    MINING THE SKY, 2001 : 671 - 673
  • [6] Data mining in inductive databases
    Siebes, Arno
    KNOWLEDGE DISCOVERY IN INDUCTIVE DATABASES, 2006, 3933 : 1 - 23
  • [7] Hypertext databases and data mining
    Chakrabarti, S
    SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999: SIGMOD99: PROCEEDINGS OF THE 1999 ACM SIGMOD - INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 1999 : 508 - 508
  • [8] Mining databases and data streams
    Zaniolo, Carlo
    Thakkar, Hetal
    HOMELAND SECURITY TECHNOLOGY CHALLENGES: FROM SENSING AND ENCRYPTING TO MINING AND MODELING, 2008 : 103 - +
  • [9] Data preparation for data mining
    Zhang, SC
    Zhang, CQ
    Yang, Q
    APPLIED ARTIFICIAL INTELLIGENCE, 2003, 17 (5-6) : 375 - 381
  • [10] Integrating data mining with SQL databases: OLE DB for data mining
    Netz, A
    Chaudhuri, S
    Fayyad, U
    Bernhardt, J
    17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001 : 379 - 387