A new data analysis method based on feature linear combination

Cited by: 3
Authors
Lin, Xiaohui [1]
Zhang, Yanhui [1]
Li, Chao [1]
Wang, Jue [1]
Luo, Ping [2]
Zhou, Huiwei [1]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Chinese Acad Sci, Dalian Inst Chem Phys, CAS Key Lab Separat Sci Analyt Chem, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature relationship; Classification; Metabolomics; SELECTION METHOD; GENE; ALGORITHM; MS
DOI
10.1016/j.jbi.2019.103173
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships helps to discriminate between conditions and to study disease mechanisms. The popular data analysis method k top scoring pairs (k-TSP) explores feature relationships by focusing on how the relative levels of two features differ between groups, and classifies samples on that basis. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship for each feature pair, scores feature pairs by the discriminative ability of the corresponding linear combinations, and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed the superiority of LC-k-TSP over k-TSP, which evaluates the relationship of every two features in the same way. The experiments also showed that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to hepatocellular carcinoma (HCC) metabolomics data to define simple classification rules for discriminating between liver diseases. It obtained accuracy rates of 89.76% and 89.13% in distinguishing small HCC from hepatic cirrhosis (CIR) and HCC from CIR, respectively, superior to the 87.99% and 80.35% obtained by k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
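The k-TSP baseline mentioned in the abstract can be sketched as follows. The abstract does not state the scoring formula, so this sketch assumes the commonly used top-scoring-pair score, |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|, with greedy selection of disjoint pairs and a majority vote. The function names (`pair_stats`, `fit_ktsp`, `predict_ktsp`) and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pair_stats(X, y, i, j):
    """Return (score, class_if_less) for the feature pair (i, j).

    Assumed score: |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.
    class_if_less is the class in which x_i < x_j occurs more often.
    """
    p = []
    for label in (0, 1):
        rows = [x for x, c in zip(X, y) if c == label]
        p.append(sum(x[i] < x[j] for x in rows) / len(rows))
    return abs(p[0] - p[1]), (0 if p[0] > p[1] else 1)


def fit_ktsp(X, y, k):
    """Greedily keep the k best-scoring pairs that share no feature."""
    candidates = []
    for i, j in combinations(range(len(X[0])), 2):
        score, class_if_less = pair_stats(X, y, i, j)
        candidates.append((score, i, j, class_if_less))
    candidates.sort(key=lambda c: c[0], reverse=True)  # stable: ties keep pair order
    model, used = [], set()
    for _, i, j, class_if_less in candidates:
        if i in used or j in used:
            continue
        model.append((i, j, class_if_less))
        used.update((i, j))
        if len(model) == k:
            break
    return model


def predict_ktsp(model, x):
    """Majority vote over pairs; each pair votes by the ordering of its two features."""
    votes_for_1 = sum(c if x[i] < x[j] else 1 - c for i, j, c in model)
    return int(2 * votes_for_1 > len(model))  # a tie (even k) falls to class 0


# Hypothetical toy data: class 0 has feature 0 < feature 1, class 1 the reverse.
X = [[1, 5, 3, 3], [10, 20, 3, 4], [3, 4, 5, 2],
     [5, 1, 3, 3], [20, 10, 4, 3], [4, 3, 2, 5]]
y = [0, 0, 0, 1, 1, 1]

model = fit_ktsp(X, y, k=1)
print(model)  # [(0, 1, 0)]
print(predict_ktsp(model, [2, 9, 0, 0]), predict_ktsp(model, [9, 2, 0, 0]))  # 0 1
```

Every pair here is judged by the same comparison, x_i < x_j, which is the limitation LC-k-TSP is designed to address.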
LC-k-TSP, which evaluates each feature pair by its own best linear relationship, is superior to k-TSP, which evaluates every pair by the same linear relationship.
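The idea of giving each pair its own linear rule can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a pair is scored by the training accuracy of its rule (the abstract says only "discriminative abilities"), that pairs are chosen greedily, and that the ensemble is a majority vote. To stay dependency-free, a small full-batch subgradient-descent solver for the linear soft-margin SVM objective stands in for an SVM library. All function names, hyperparameters, and the toy data are hypothetical.

```python
from itertools import combinations


def fit_linear_svm_2d(points, targets, lam=0.01, lr=0.1, epochs=300):
    """Minimise lam/2*|w|^2 + mean hinge loss by full-batch subgradient descent.

    points are (a, c) tuples, targets are -1/+1. A crude stand-in for an SVM library.
    """
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1, g2, gb = lam * w1, lam * w2, 0.0
        for (a, c), t in zip(points, targets):
            if t * (w1 * a + w2 * c + b) < 1:  # inside the margin: hinge term active
                g1 -= t * a / n
                g2 -= t * c / n
                gb -= t / n
        w1, w2, b = w1 - lr * g1, w2 - lr * g2, b - lr * gb
    return w1, w2, b


def predict_pair(rule, x):
    """Apply one pair's linear rule: class 1 if w1*x_i + w2*x_j + b > 0."""
    i, j, w1, w2, b = rule
    return int(w1 * x[i] + w2 * x[j] + b > 0)


def fit_pair(X, y, i, j):
    """Learn the pair's own linear rule; return (training accuracy, rule)."""
    mi = sum(x[i] for x in X) / len(X)  # centre both features so the simple
    mj = sum(x[j] for x in X) / len(X)  # solver above is well behaved
    points = [(x[i] - mi, x[j] - mj) for x in X]
    w1, w2, b = fit_linear_svm_2d(points, [1 if c == 1 else -1 for c in y])
    rule = (i, j, w1, w2, b - w1 * mi - w2 * mj)  # bias folded back to raw units
    accuracy = sum(predict_pair(rule, x) == c for x, c in zip(X, y)) / len(X)
    return accuracy, rule


def fit_lc_ktsp(X, y, k):
    """Rank pairs by the accuracy of their own rule; keep k disjoint pairs."""
    ranked = sorted(
        (fit_pair(X, y, i, j) for i, j in combinations(range(len(X[0])), 2)),
        key=lambda item: item[0],
        reverse=True,
    )
    model, used = [], set()
    for _, rule in ranked:
        i, j = rule[0], rule[1]
        if i in used or j in used:
            continue
        model.append(rule)
        used.update((i, j))
        if len(model) == k:
            break
    return model


def predict_lc_ktsp(model, x):
    """Majority vote of the selected pair rules (a tie falls to class 0)."""
    votes_for_1 = sum(predict_pair(rule, x) for rule in model)
    return int(2 * votes_for_1 > len(model))


# Hypothetical toy data: class 1 satisfies x0 - 2*x1 > 0.5, class 0 does not.
X = [[4, 1, 1, 2], [6, 2, 2, 1], [3, 2, 2, 1], [5, 3, 1, 2]]
y = [1, 1, 0, 0]

model = fit_lc_ktsp(X, y, k=1)
print(model[0][:2])  # (0, 1)
print([predict_lc_ktsp(model, x) for x in ([8, 3, 0, 0], [4, 3, 0, 0])])  # [1, 0]
```

In this toy data feature 0 exceeds feature 1 in every sample, so a plain ordering comparison cannot tell the classes apart, while a rule close to x0 - 2*x1 > 0.5 separates them.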
Pages: 7