Biological Data Mining for Genomic Clustering Using Unsupervised Neural Learning

Authors
Sen, Shreyas [1]
Narasimhan, Seetharam [1]
Konar, Amit [1]
Affiliation
[1] Jadavpur Univ, Elect & Telecommun Engn Dept, Kolkata 700032, W Bengal, India
Keywords
DNA-descriptors; Feature Descriptors; Principal Component Analysis (PCA); Self-Organizing Feature Map (SOFM)
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper presents a scheme for automatically identifying a species from its genome sequence. A set of 64 three-base keywords is first generated from the four bases A, T, C and G. These keywords are searched for in N randomly sampled genome subsequences, each of a fixed length (10,000 bases), and the frequency of each of the 4^3 = 64 keywords is counted to obtain a DNA-descriptor for each sample. Principal component analysis (PCA) is then applied to the DNA-descriptors of the N samples, yielding a unique feature descriptor that identifies the species from its genome sequence. Because the variance of the descriptors for a given genome sequence is negligible, the proposed scheme is well suited to automatic species identification. An alternative approach to automatic species classification and identification, based on a Self-Organizing Feature Map (SOFM), is also discussed. The map is trained using the DNA-descriptors of different species as training inputs, and maps of different dimensions are constructed and compared to find the best-performing configuration. The result is a novel method for identifying a species from its genome sequence using a two-dimensional map of neuronal clusters, in which each cluster represents a particular species. The map is shown to make recognition and classification of a species from its genomic data easier.
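The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, keyword matches are assumed to be counted over overlapping windows, PCA is computed via an SVD of the centered descriptors, and the map uses a Gaussian neighbourhood with linearly decaying learning rate and radius. None of these choices is specified in the abstract.

```python
from itertools import product
import numpy as np

BASES = "ATCG"
# All 4**3 = 64 three-base keywords, in a fixed order.
KEYWORDS = ["".join(p) for p in product(BASES, repeat=3)]
INDEX = {kw: i for i, kw in enumerate(KEYWORDS)}


def dna_descriptor(seq):
    """Count each 3-base keyword in seq (overlapping windows: an assumption)."""
    counts = np.zeros(len(KEYWORDS))
    for i in range(len(seq) - 2):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:  # windows containing other symbols (e.g. N) are skipped
            counts[j] += 1
    return counts


def sample_descriptors(genome, n_samples, length, rng):
    """Descriptors for n_samples randomly placed subsequences of a given length."""
    starts = rng.integers(0, len(genome) - length + 1, size=n_samples)
    return np.array([dna_descriptor(genome[s:s + length]) for s in starts])


def principal_components(X, k):
    """Project the rows of X onto their top-k principal axes (PCA via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def best_matching_unit(weights, x):
    """Grid position (row, col) of the neuron whose weights are closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(X, rows, cols, epochs, rng, lr0=0.5):
    """Train a rows x cols self-organizing map on the rows of X."""
    weights = rng.random((rows, cols, X.shape[1]))
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
        for x in X[rng.permutation(len(X))]:
            bmu = best_matching_unit(weights, x)
            d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            weights += lr * h[..., None] * (x - weights)
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic "genomes" with different base compositions (illustrative only).
    g1 = "".join(rng.choice(list(BASES), size=20000, p=[0.4, 0.4, 0.1, 0.1]))
    g2 = "".join(rng.choice(list(BASES), size=20000, p=[0.1, 0.1, 0.4, 0.4]))
    X = np.vstack([sample_descriptors(g, 5, 1000, rng) for g in (g1, g2)])
    X = X / X.sum(axis=1, keepdims=True)  # relative frequencies (an assumption)
    print(principal_components(X, 2))
    som = train_som(X, 4, 4, 20, rng)
    print([best_matching_unit(som, x) for x in X])
```

In this sketch the descriptors are normalized to relative frequencies before training so that they are on a scale comparable to the randomly initialized map weights. The abstract does not say whether raw or normalized counts were used.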
Pages: 11