Selection of breast features for young women in northwestern China based on the random forest algorithm

Times cited: 12
Authors
Zhou, Jie [1]
Mao, Qian [1]
Zhang, Jun [2]
Lau, Newman M. L. [2]
Chen, Jianming [3]
Affiliations
[1] Xian Polytech Univ, Sch Apparel & Art Design, 19 Jinhua South Rd, Xian 710048, Shaanxi, Peoples R China
[2] Hong Kong Polytech Univ, Sch Design, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Dept Biomed Engn, Hong Kong, Peoples R China
Keywords
Breast shape classification; random forest algorithm; feature selection; breast shape recognition; K-MEANS; BRA; SHAPE; DIMENSIONS; SUPPORT
DOI
10.1177/00405175211040869
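Chinese Library Classification (CLC) numbers
TB3 [Engineering materials science]; TS1 [Textile industry; dyeing and finishing industry]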
Subject classification codes
0805; 080502; 0821
Abstract
Research on breast morphology measures many breast features, yet only a few parameters are used for classification. A central question is therefore how to extract the key variables from these multi-dimensional features in a rational way. This study aimed to reduce the complexity of dimensionality reduction. It also sought to improve the objectivity and interpretability of the selected breast features.

The random forest (RF) algorithm can quantify feature importance during training. This paper therefore adopted it to determine the optimal breast features for classification and recognition. First, anthropometric data were collected from 360 females aged 19 to 27 years from northwestern China. Measurements used non-contact three-dimensional body scanning and contact manual measurement. Next, k-means clustering was applied to categorize breast shapes, and the RF algorithm was used to quantify and rank the importance of 25 breast features. Finally, two methods verified the effectiveness of the RF algorithm for breast feature selection. t-distributed stochastic neighbor embedding (t-SNE) visualized the distribution of breast shape clusters in two dimensions, and four neural networks were used to recognize breast morphology.

The results demonstrate that using fewer breast features can effectively increase the accuracy of breast shape classification and recognition. The best classification and recognition performance is obtained with 13 breast features; in this case, the average Hamming loss of the four neural networks is the smallest (0.1136). Notably, bust circumference and the horizontal curve of the breasts across the bust points are the most important of the 25 breast features.
The breast curve features are more important than the breast cross-sectional features, while the breast positioning features have the lowest importance. The RF algorithm is also shown to be more effective than traditional dimensionality reduction methods such as principal component analysis, hierarchical clustering, and recursive feature elimination. The approach developed in this paper can be generalized to dimensionality reduction for other body morphology data.
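The abstract outlines a pipeline with four steps:

1. Cluster the measurements with k-means to obtain breast-shape labels.
2. Rank the 25 features by RF importance.
3. Keep the top-ranked subset (13 features in the best case).
4. Score recognition with Hamming loss.

The minimal sketch below illustrates three of those pieces in plain Python, under stated assumptions; it is not the authors' code.

- `kmeans`, `top_k_features`, and `hamming_loss` are hypothetical helper names.
- The toy points are synthetic.
- The random forest itself is not implemented. Importance scores are assumed to come from a trained forest, for example scikit-learn's `RandomForestClassifier.feature_importances_`.
- The Hamming loss assumes single-label class predictions, for which it reduces to the fraction of misclassified samples. The abstract does not say how the paper encoded its targets.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means (hypothetical helper, not the authors' code):
    returns one cluster label per point and the final centroids."""
    rng = random.Random(seed)
    # Initialise centroids as k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Names of the k features with the highest importance scores
    (scores assumed to come from a trained random forest)."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def hamming_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of mismatched labels; for single-label multiclass
    predictions this equals 1 - accuracy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Synthetic 2-D toy data, not measurements from the study.
    pts = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
    labels, _ = kmeans(pts, k=2)
    print("cluster labels:", labels)
    print("Hamming loss:", hamming_loss([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.25
```

In the paper's setting, `points` would be the 360 rows of 25 breast features. `k` would be the number of breast-shape clusters, which the abstract does not state. The list returned by `top_k_features(..., 13)` would define the reduced feature set passed to the four neural networks, whose Hamming losses are then averaged.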
Pages: 957–973
Number of pages: 17