A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases

Cited by: 5
Authors
Perez, Joaquin [1]
Iturbide, Emmanuel [1]
Olivares, Victor [1]
Hidalgo, Miguel [1]
Almanza, Nelva [1]
Martinez, Alicia [1]
Affiliations
[1] CENIDET, Dept Comp Sci, Cuernavaca, Morelos, Mexico
Keywords
Data Preparation Methodology; Mortality Databases; Epidemiology; Preprocessing Method
DOI
10.1007/978-3-319-16486-1_116
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is well known that data preparation is the most time-consuming phase of the data mining process: it takes between 50% and 70% of the total project time, and its results directly affect the quality of the process. Current data mining methodologies are general purpose; one of their limitations is that they do not provide guidance on which particular tasks to carry out in a particular domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: on the one hand, we observed that using the methodology reduced some of the time-consuming tasks and, on the other hand, the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico.
Pages: 1173-1182
Number of pages: 10