A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases

Cited by: 5
Authors
Perez, Joaquin [1 ]
Iturbide, Emmanuel [1 ]
Olivares, Victor [1 ]
Hidalgo, Miguel [1 ]
Almanza, Nelva [1 ]
Martinez, Alicia [1 ]
Affiliations
[1] CENIDET, Dept Comp Sci, Cuernavaca, Morelos, Mexico
Keywords
Data Preparation Methodology; Mortality Databases; Epidemiology; Preprocessing Method
DOI
10.1007/978-3-319-16486-1_116
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data preparation is known to be the most time-consuming phase of the data mining process: it takes between 50% and 70% of the total project time, and its results directly affect the quality of that process. Current data mining methodologies are general purpose; one of their limitations is that they do not provide guidance on which particular tasks to carry out in a particular domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: on the one hand, we observed that the use of the methodology reduced some of the time-consuming tasks and, on the other hand, the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico.
Pages: 1173 - 1182
Number of pages: 10
Related papers
50 in total
  • [41] The study on data mining method for distributed databases
    Dan, S. (sudan1108@163.com), 1600, Advanced Institute of Convergence Information Technology, Myoungbo Bldg 3F, Bumin-dong 1-ga, Seo-gu, Busan, 602-816, Korea, Republic of (04):
  • [42] Data mining and knowledge discovery in databases: An overview
    Zhu, W
    RESEARCH QUARTERLY FOR EXERCISE AND SPORT, 2002, 73 (01) : A37 - A37
  • [43] BioInformatics: Databases plus data mining (abstract)
    Siebes, A
    SOFSEM 2000: THEORY AND PRACTICE OF INFORMATICS, 2000, 1963 : 54 - 55
  • [44] Knowledge discovery and data mining in biological databases
    Brusic, V
    Zeleznikow, J
    KNOWLEDGE ENGINEERING REVIEW, 1999, 14 (03) : 257 - 277
  • [45] Event sequence data mining in temporal databases
    Anderson, I
    Miyamoto, K
    Yanaru, T
    ICONIP'98: THE FIFTH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING JOINTLY WITH JNNS'98: THE 1998 ANNUAL CONFERENCE OF THE JAPANESE NEURAL NETWORK SOCIETY - PROCEEDINGS, VOLS 1-3, 1998, : 1659 - 1661
  • [46] Data mining in distributed databases for interacting galaxies
    Borne, K
    ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XIV, PROCEEDINGS, 2005, 347 : 350 - 354
  • [47] Data mining method from text databases
    Kawano, M
    Watada, J
    Kawaura, T
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2005, 3683 : 1122 - 1128
  • [48] Extension of multiagent data mining for distributed databases
    Niimi, A
    Konishi, O
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2004, 3215 : 780 - 787
  • [49] Inductive databases and condensed representations for data mining
    Mannila, H
    LOGIC PROGRAMMING - PROCEEDINGS OF THE 1997 INTERNATIONAL SYMPOSIUM, 1997, : 21 - 30
  • [50] Data mining: Machine learning, statistics, and databases
    Mannila, H
    EIGHTH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE SYSTEMS, PROCEEDINGS, 1996, : 2 - 9