Classification of SDSS photometric data using machine learning on a cloud

Cited by: 2
Authors
Acharya, Vishwanath [1]
Bora, Piyush Singh [1]
Navin, Karri [1]
Nazareth, Anisha [1]
Anusha, P. S. [1]
Rao, Shrisha [1]
Affiliation
[1] Int Inst Informat Technol Bangalore, 26-C Elect City, Bengaluru 560100, India
Source
CURRENT SCIENCE | 2018 / Vol. 115 / Issue 02
Funding
National Science Foundation (USA)
Keywords
Astronomical data; classification; cloud computing; distributed algorithms; machine learning
DOI
10.18520/cs/v115/i2/249-257
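Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract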
Astronomical datasets are typically very large, and classifying their contents manually is effectively impossible. We use machine learning algorithms to classify (as stars, quasars or galaxies) more than one billion objects given photometrically in the Third Data Release of the Sloan Digital Sky Survey (SDSS-III). We used kNN, SVM and random forest algorithms in a distributed environment over the cloud to classify 1,183,850,913 unclassified photometric objects in the SDSS-III catalog. This catalog contains photometric data for all objects viewed through the telescope and spectroscopic data for a small fraction of them. Although all the objects could in principle be classified from spectroscopic data, obtaining such data for every one of them is impractical. Classifying so large a dataset on a single machine would be impractically slow, so we used the Spark cluster computing framework to implement a distributed computing environment over the cloud. We found that writing the results (dozens of gigabytes) to cloud storage is very slow with kNN. Writing the results with SVM is faster because it is done in parallel, but its accuracy is only around 87%, owing to the lack of a kernel SVM implementation in Spark. We then used the random forest algorithm to classify the entire set of 1,183,850,913 objects with an accuracy of 94% in about 17 hours of processing time. The result set is significant because even collecting spectroscopic data for this many objects would take decades, and our classifications can help astronomers and astrophysicists carry out further studies.
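The abstract describes labelling unclassified objects by splitting the work across many workers, each classifying its own share independently. The sketch below illustrates that idea with a plain-Python kNN classifier applied partition by partition. It is not the authors' Spark code: the training points, the two-number feature vectors and the function names `knn_classify`, `classify_partition` and `split` are all hypothetical, and the feature values are invented rather than taken from SDSS.

```python
from collections import Counter
from math import dist

# Hypothetical labelled training set: (feature vector, class label).
# The two numbers stand in for photometric features; the values are invented.
TRAINING = [
    ((0.20, 0.10), "star"),
    ((0.30, 0.20), "star"),
    ((0.25, 0.15), "star"),
    ((1.10, 0.90), "galaxy"),
    ((1.20, 1.00), "galaxy"),
    ((1.00, 0.80), "galaxy"),
    ((0.10, 1.40), "quasar"),
    ((0.20, 1.50), "quasar"),
    ((0.15, 1.30), "quasar"),
]


def knn_classify(point, training=TRAINING, k=3):
    """Label one point by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_partition(partition, k=3):
    """Classify every point in one partition; partitions are independent."""
    return [knn_classify(point, k=k) for point in partition]


def split(points, n_parts):
    """Deal points round-robin into n_parts partitions."""
    return [points[i::n_parts] for i in range(n_parts)]


# Hypothetical unlabelled objects, processed one partition at a time.
unlabelled = [(0.22, 0.12), (1.15, 0.95), (0.12, 1.45), (0.28, 0.18)]
partitions = split(unlabelled, 2)
results = [classify_partition(part) for part in partitions]
```

Because each partition is classified without reference to the others, the list comprehension over `partitions` could in principle be handed to separate workers, which is the property a cluster framework such as Spark exploits.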
Pages: 249-257
Number of pages: 9