Seminal quality prediction using data mining methods

Cited by: 26
Authors
Sahoo, Anoop J. [1]
Kumar, Yugal [2]
Affiliations
[1] Infosys Technol, Madras, Tamil Nadu, India
[2] Birla Inst Technol, Dept Informat Technol, Ranchi, Jharkhand, India
Keywords
Particle swarm optimization; multilayer perceptron; seminal; support vector machine; SEMEN QUALITY; FERTILITY; MEN; POPULATION; TRENDS; SVM;
DOI
10.3233/THC-140816
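Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract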
BACKGROUND: A new class of diseases, known as lifestyle diseases, has emerged in recent years. These diseases are mainly caused by changes in lifestyle, such as alcohol consumption, smoking and dietary habits. A review of lifestyle diseases indicates that fertility rates (sperm quantity) in men have decreased considerably over the last two decades. Lifestyle and environmental factors are mainly responsible for this change in semen quality.
OBJECTIVE: To identify, using data mining methods, the lifestyle and environmental features that affect seminal quality and fertility rate in men.
METHOD: Five artificial intelligence techniques, namely multilayer perceptron (MLP), decision tree (DT), Naive Bayes (Kernel), support vector machine plus particle swarm optimization (SVM + PSO) and support vector machine (SVM), were applied to a fertility dataset to evaluate seminal quality and to predict whether a person is normal or has an altered fertility rate. Eight feature selection techniques, namely support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), SVM + PSO, principal component analysis (PCA), chi-square test, correlation and t-test, were used to identify the features most relevant to seminal quality. The fertility dataset contains 100 instances with nine attributes and two classes.
RESULTS: SVM + PSO gives the highest accuracy and area under the curve (AUC) (94% and 0.932), compared with MLP (92% and 0.728), SVM (91% and 0.758), Naive Bayes (Kernel) (89% and 0.850) and decision tree (89% and 0.735), for some of the seminal parameters. The paper also addresses feature selection, i.e. how to select the features that matter most for predicting fertility rate, using the eight methods above to find a good feature set. The results show that the childhood diseases (0.079) and high fever (0.057) features have less impact on fertility rate, while age (0.8685), season (0.843), surgical intervention (0.7683), alcohol consumption (0.5992), smoking habit (0.575), number of hours spent sitting (0.4366) and accident (0.5973) have more impact. Feature selection also increases the accuracy of these techniques (MLP 92%, SVM 91%, SVM + PSO 94%, Naive Bayes (Kernel) 89% and decision tree 89%) compared with no feature selection (MLP 86%, SVM 86%, SVM + PSO 85%, Naive Bayes (Kernel) 83% and decision tree 84%), which shows the applicability of feature selection methods in prediction.
CONCLUSION: This paper highlights the application of artificial intelligence techniques in the medical domain. It can be concluded that data mining methods can predict whether a person has a disease from environmental and lifestyle parameters/features, rather than from various medical tests. Of the five data mining techniques used to predict fertility rate, SVM + PSO provides more accurate results than support vector machine and decision tree.
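The abstract does not state how PSO is coupled to the SVM. One common arrangement, assumed here purely for illustration, is to let a particle swarm search the SVM hyperparameters (for example log10 of C and gamma) for the setting with the lowest validation error. The sketch below is a basic global-best PSO in plain Python; `pso_minimize`, its default coefficients and `toy_validation_error` are hypothetical names and values, and the quadratic objective is only a stand-in for a real cross-validated SVM error, not the authors' method.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` inside the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the box, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for an SVM's validation error at (log10 C, log10 gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


# Search log10(C) in [-3, 3] and log10(gamma) in [-5, 1] (assumed ranges).
best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)]
)
```

In a real experiment the toy objective would be replaced by a function that trains an SVM with the candidate hyperparameters and returns its cross-validated error on the fertility dataset; with only 100 instances, that estimate would be noisy, so repeated or stratified folds would be a sensible choice.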
Pages: 531-545
Number of pages: 15