A new data analysis method based on feature linear combination

Cited by: 3
Authors
Lin, Xiaohui [1]
Zhang, Yanhui [1]
Li, Chao [1]
Wang, Jue [1]
Luo, Ping [2]
Zhou, Huiwei [1]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Chinese Acad Sci, Dalian Inst Chem Phys, CAS Key Lab Separat Sci Analyt Chem, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature relationship; Classification; Metabolomics; SELECTION METHOD; GENE; ALGORITHM; MS;
DOI
10.1016/j.jbi.2019.103173
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships helps discriminate between conditions and study disease mechanisms. The popular data analysis method k top scoring pairs (k-TSP) explores feature relationships by focusing on how the relative level of two features differs between groups, and classifies samples on that basis. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship for each feature pair, scores feature pairs by the discriminative ability of the corresponding linear combinations, and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed the superiority of LC-k-TSP over k-TSP, which evaluates the relationship of every two features in the same way. The experiments also showed that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to hepatocellular carcinoma (HCC) metabolomics data to define simple classification rules for discriminating between liver diseases. It obtained accuracy rates of 89.76% and 89.13% in distinguishing small HCC from hepatic cirrhosis (CIR) and HCC from CIR, respectively, superior to the 87.99% and 80.35% obtained by k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
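The contrast between the two kinds of rule can be written out as follows. The k-TSP score is the standard one from the top-scoring-pairs literature. The LC-k-TSP form is only an assumed reading of "best linear relationship of each feature pair"; the symbols a_{ij}, b_{ij}, c_{ij} do not appear in the abstract.

```latex
% k-TSP: every pair (i, j) is scored with the same fixed comparison x_i < x_j
\Delta_{ij} = \bigl|\, P(x_i < x_j \mid y = 1) - P(x_i < x_j \mid y = 2) \,\bigr|

% LC-k-TSP (assumed form): each pair gets its own SVM-fitted linear rule
f_{ij}(x) = a_{ij}\, x_i + b_{ij}\, x_j + c_{ij},
\qquad \hat{y}_{ij}(x) = \operatorname{sign} f_{ij}(x)
```

Under this reading, the k-TSP comparison x_i < x_j is the special case a_{ij} = -1, b_{ij} = 1, c_{ij} = 0 applied to every pair.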
LC-k-TSP, which evaluates each feature pair by its own best linear relationship, outperforms k-TSP, which evaluates every pair by the same linear relationship.
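The pipeline described in the abstract (fit a linear rule per feature pair, score the pairs, keep k disjoint top pairs, combine them into an ensemble) might be sketched as below. This is a minimal illustration under several assumptions, not the authors' implementation: a small full-batch subgradient solver for the hinge loss stands in for a real SVM library, pairs are scored by training accuracy, and the ensemble is an unweighted majority vote. All function names and the toy data are hypothetical, and class labels are encoded as +1 and -1.

```python
from itertools import combinations


def fit_linear_svm_2d(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Stand-in for an SVM: full-batch subgradient descent on the
    L2-regularised hinge loss for 2-D points with labels +1 / -1."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1, g2, gb = lam * w1, lam * w2, 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) < 1:  # margin violated
                g1 -= y * x1 / n
                g2 -= y * x2 / n
                gb -= y / n
        w1, w2, b = w1 - lr * g1, w2 - lr * g2, b - lr * gb
    return w1, w2, b


def rule_output(model, x1, x2):
    """Apply one pair's linear rule; returns +1 or -1."""
    w1, w2, b = model
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1


def lc_k_tsp_fit(X, y, k):
    """Fit one rule per feature pair, score it (assumption: training
    accuracy), and greedily keep the k best pairs sharing no feature."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        pts = [(row[i], row[j]) for row in X]
        model = fit_linear_svm_2d(pts, y)
        hits = sum(rule_output(model, *p) == t for p, t in zip(pts, y))
        scored.append((hits / len(y), i, j, model))
    # Best score first; ties broken by feature indices for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    chosen, used = [], set()
    for _, i, j, model in scored:
        if i in used or j in used:
            continue  # enforce disjoint pairs
        chosen.append((i, j, model))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def lc_k_tsp_predict(chosen, row):
    """Unweighted majority vote over the selected pair rules
    (assumption); a tied vote falls to the -1 class."""
    votes = sum(rule_output(m, row[i], row[j]) for i, j, m in chosen)
    return 1 if votes > 0 else -1


# Hypothetical toy data: features 0 and 1 carry the class signal.
X = [
    [2.0, 0.0, 1.0, 1.0],
    [3.0, 1.0, 0.0, 1.0],
    [2.5, 0.5, 1.0, 0.0],
    [0.0, 2.0, 1.0, 1.0],
    [1.0, 3.0, 0.0, 1.0],
    [0.5, 2.5, 1.0, 0.0],
]
y = [1, 1, 1, -1, -1, -1]
rules = lc_k_tsp_fit(X, y, k=1)
print([lc_k_tsp_predict(rules, row) for row in X])
```

Choosing an odd k avoids tied votes. A cross-validated score would be a more cautious choice than training accuracy when many pairs are compared.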
Pages: 7