Entropy Based Clustering of Viral Sequences

Cited by: 1
Authors
Juyal, Akshay [1]
Hosseini, Roya [1]
Novikov, Daniel [1]
Grinshpon, Mark [2]
Zelikovsky, Alex [1]
Affiliations
[1] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
[2] Georgia State Univ, Dept Math & Stat, Atlanta, GA 30303 USA
Keywords
Categorical data; Clustering; Entropy; Monte Carlo algorithm; Viral genomic sequences; TRANSMISSIONS; VARIANTS; DESIGN; AIDS
DOI
10.1007/978-3-031-23198-8_33
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Clustering viral sequences allows us to characterize the composition and structure of intrahost and interhost viral populations, which play a crucial role in disease progression and epidemic spread. In this paper we propose and validate a new entropy-based method for clustering aligned viral sequences, treated as categorical data. The method finds a homogeneous clustering by minimizing information entropy rather than the distance between sequences in the same cluster. We have applied our entropy-based clustering method to SARS-CoV-2 viral sequencing data, and we report the information content extracted from the sequences by the clustering. Our method converges to similar minimum-entropy clusterings across different runs and limited permutations of the data. We also show that a parallelized version of our tool scales to very large SARS-CoV-2 datasets.
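To illustrate what "minimizing information entropy" of a clustering can mean for aligned sequences viewed as categorical data, here is a minimal sketch. The abstract does not state the exact objective, so the definition below (size-weighted sum of per-column Shannon entropies within each cluster) is an assumption, and the function names `column_entropy`, `cluster_entropy`, and `clustering_entropy` are hypothetical, not taken from the authors' tool.

```python
from collections import Counter
from math import log2


def column_entropy(symbols):
    """Shannon entropy (in bits) of the symbols seen in one alignment column."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def cluster_entropy(seqs):
    """Sum of per-column entropies over equal-length aligned sequences."""
    return sum(column_entropy(col) for col in zip(*seqs))


def clustering_entropy(seqs, labels):
    """Size-weighted total entropy of a clustering (assumed objective)."""
    clusters = {}
    for seq, label in zip(seqs, labels):
        clusters.setdefault(label, []).append(seq)
    return sum(len(members) * cluster_entropy(members)
               for members in clusters.values())


# Toy aligned sequences: two identical pairs.
seqs = ["ACGT", "ACGT", "TCGA", "TCGA"]
mixed = clustering_entropy(seqs, [0, 1, 0, 1])  # each cluster mixes both variants
clean = clustering_entropy(seqs, [0, 0, 1, 1])  # each cluster is homogeneous
```

On this toy input the homogeneous split scores 0 bits and the mixed split scores 8 bits, so an entropy-minimizing search would prefer the homogeneous one. No pairwise distance between sequences is computed at any point.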
Pages: 369-380
Number of pages: 12