Classification of SDSS photometric data using machine learning on a cloud

Cited by: 2
Authors
Acharya, Vishwanath [1]
Bora, Piyush Singh [1]
Navin, Karri [1]
Nazareth, Anisha [1]
Anusha, P. S. [1]
Rao, Shrisha [1]
Affiliation
[1] Int Inst Informat Technol Bangalore, 26-C Elect City, Bengaluru 560100, India
Source
CURRENT SCIENCE | 2018 / Volume 115 / Issue 02
Funding
National Science Foundation (USA);
Keywords
Astronomical data; classification; cloud computing; distributed algorithms; machine learning;
DOI
10.18520/cs/v115/i2/249-257
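Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes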
07 ; 0710 ; 09 ;
Abstract
Astronomical datasets are typically very large, and manually classifying the data in them is effectively impossible. We use machine learning algorithms to provide classifications (as stars, quasars and galaxies) for more than one billion objects given photometrically in the Third Data Release of the Sloan Digital Sky Survey (SDSS-III). We have used kNN, SVM and random forest algorithms in a distributed environment over the cloud to classify 1,183,850,913 unclassified photometric objects present in the SDSS-III catalog. This catalog contains photometric data for all objects viewed through a telescope and spectroscopic data for a small part of these. Although it is possible to classify all the objects using spectroscopic data, it is impractical to obtain such data for each one of them. Classifying such a big dataset on a single machine would be impractically slow, so we have used the Spark cluster computing framework to implement a distributed computing environment over the cloud. We found that writing results (dozens of gigabytes) to the cloud storage is very slow while using kNN. Though writing the results with SVM is faster as it is done in parallel, its accuracy is only around 87%, owing to the lack of a kernel SVM implementation in Spark. We then used the random forest algorithm to classify the entire set of 1,183,850,913 objects with an accuracy of 94% in about 17 hours of processing time. The result set is significant, as even collecting spectroscopic data for this many objects would take decades, and our classifications can help astronomers and astrophysicists carry out further studies.
Pages: 249-257
Page count: 9
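
The abstract names kNN as one of the classifiers applied chunk by chunk across a cluster. The following is only a minimal single-machine sketch of that idea, not the authors' Spark implementation: the two colour features, the training points, and the names `knn_predict`, `classify_partition` and `classify_in_chunks` are all hypothetical and chosen for illustration.

```python
from collections import Counter
from math import dist

# Synthetic, made-up training points: two colour features per object with a label.
# These values are illustrative only and are NOT taken from SDSS.
TRAIN = [
    ((1.2, 0.5), "star"), ((1.3, 0.6), "star"), ((1.1, 0.4), "star"),
    ((0.2, 0.1), "quasar"), ((0.3, 0.2), "quasar"), ((0.1, 0.2), "quasar"),
    ((1.8, 0.9), "galaxy"), ((1.9, 1.0), "galaxy"), ((1.7, 0.8), "galaxy"),
]


def knn_predict(point, train=TRAIN, k=3):
    """Label a point by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_partition(points, k=3):
    """Classify one chunk of objects; each chunk is independent of the others."""
    return [knn_predict(p, k=k) for p in points]


def classify_in_chunks(points, chunk_size=2, k=3):
    """Split the input into chunks and classify each one in turn.

    On a cluster the chunks could be handled by different workers;
    here they are processed sequentially to keep the sketch self-contained.
    """
    labels = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        labels.extend(classify_partition(chunk, k=k))
    return labels


if __name__ == "__main__":
    unlabeled = [(1.25, 0.55), (0.15, 0.15), (1.85, 0.95)]
    print(classify_in_chunks(unlabeled))
```

Because every chunk is classified independently, the same per-chunk function could in principle be handed to a framework that distributes chunks across machines; how the results are then written out is a separate cost, which the abstract reports as the bottleneck for kNN.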