Benchmarking and scalability of machine-learning methods for photometric redshift estimation

Cited: 23
Authors
Henghes, Ben [1]
Pettitt, Connor [2]
Thiyagalingam, Jeyan [2]
Hey, Tony [2]
Lahav, Ofer [1]
Affiliations
[1] UCL, Dept Phys & Astron, Gower St, London WC1E 6BT, England
[2] Rutherford Appleton Lab, Sci Comp Dept, Sci & Technol Facil Council STFC, Harwell Campus, Didcot OX11 0QX, Oxon, England
Funding
National Science Foundation (US); Science and Technology Facilities Council (UK); Andrew W. Mellon Foundation (US); European Research Council
Keywords
methods: data analysis; galaxies: distances and redshifts; cosmology: observations; DIGITAL SKY SURVEY;
DOI
10.1093/mnras/stab1513
Chinese Library Classification
P1 [Astronomy]
Subject Classification Code
0704
Abstract
Obtaining accurate photometric redshift (photo-z) estimates is an important aspect of cosmology and remains a prerequisite for many analyses. In creating novel methods to produce photo-z estimates, there has been a shift towards using machine-learning techniques. However, there has not been as much focus on how well different machine-learning methods scale or perform with the ever-increasing amounts of data being produced. Here, we introduce a benchmark designed to analyse the performance and scalability of different supervised machine-learning methods for photo-z estimation. Making use of the Sloan Digital Sky Survey (SDSS DR12) data set, we analysed a variety of the most commonly used machine-learning algorithms. By scaling the number of galaxies used to train and test the algorithms up to one million, we obtained several metrics demonstrating the algorithms' performance and scalability for this task. Furthermore, by introducing a new optimization method, time-considered optimization, we were able to demonstrate how a small concession in error can allow for a great improvement in efficiency. Of the algorithms tested, we found that the Random Forest performed best, with a mean squared error MSE = 0.0042; however, as other algorithms such as Boosted Decision Trees and k-Nearest Neighbours performed very similarly, we used our benchmarks to demonstrate how different algorithms could be superior in different scenarios. We believe that benchmarks like this will become essential with upcoming surveys, such as the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), which will capture billions of galaxies requiring photometric redshifts.
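The abstract compares supervised regressors (Random Forest, Boosted Decision Trees, k-Nearest Neighbours) trained on SDSS DR12 photometry, recording both error and run time as the number of galaxies grows. Below is a minimal, hypothetical sketch of that kind of performance-and-scalability benchmark, assuming scikit-learn regressors and random stand-in arrays in place of the SDSS u, g, r, i, z magnitudes and spectroscopic redshifts; it is not the paper's actual pipeline, and the model settings are illustrative defaults.

# Minimal sketch of a photo-z benchmarking loop (illustrative, not the paper's pipeline).
# Assumes scikit-learn; the SDSS magnitudes and spectroscopic redshifts are replaced
# by random stand-in data purely to keep the example self-contained.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_galaxies = 10_000                          # scale this up to probe scalability
X = rng.normal(size=(n_galaxies, 5))         # stand-in for u, g, r, i, z magnitudes
z = rng.uniform(0.0, 1.0, size=n_galaxies)   # stand-in for spectroscopic redshifts

X_train, X_test, z_train, z_test = train_test_split(X, z, test_size=0.2, random_state=0)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Boosted Decision Trees": GradientBoostingRegressor(random_state=0),
    "k-Nearest Neighbours": KNeighborsRegressor(n_neighbors=10),
}

for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X_train, z_train)              # measure training time
    train_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    z_pred = model.predict(X_test)           # measure prediction time
    predict_time = time.perf_counter() - t0

    mse = mean_squared_error(z_test, z_pred)
    print(f"{name:24s} MSE={mse:.4f}  train={train_time:.2f}s  predict={predict_time:.2f}s")

Rerunning this loop while scaling n_galaxies (for example from 10^4 towards 10^6) and comparing the MSE against the timing columns gives the kind of error-versus-efficiency trade-off that the abstract's time-considered optimization is described as exploiting.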
Pages: 4847-4856
Number of pages: 10
Related Papers
50 items in total
  • [21] Machine-Learning Methods for Complex Flows
    Vinuesa, Ricardo
    Le Clainche, Soledad
    ENERGIES, 2022, 15 (04)
  • [22] Machine Learning Classification to Identify Catastrophic Outlier Photometric Redshift Estimates
    Singal, J.
    Silverman, G.
    Jones, E.
    Do, T.
    Boscoe, B.
    Wan, Y.
    ASTROPHYSICAL JOURNAL, 2022, 928 (01):
  • [23] Machine-learning Classifiers for Intermediate Redshift Emission-line Galaxies
    Zhang, Kai
    Schlegel, David J.
    Andrews, Brett H.
    Comparat, Johan
    Schafer, Christoph
    Vazquez Mata, Jose Antonio
    Kneib, Jean-Paul
    Yan, Renbin
    ASTROPHYSICAL JOURNAL, 2019, 883 (01):
  • [24] A Machine-Learning Approach for Earthquake Magnitude Estimation
    Mousavi, S. Mostafa
    Beroza, Gregory C.
    GEOPHYSICAL RESEARCH LETTERS, 2020, 47 (01)
  • [25] Learning Spectral Templates for Photometric Redshift Estimation from Broadband Photometry
    Crenshaw, John Franklin
    Connolly, Andrew J.
    ASTRONOMICAL JOURNAL, 2020, 160 (04):
  • [26] Methods for Automatic Machine-Learning Workflow Analysis
    Wendlinger, Lorenz
    Berndl, Emanuel
    Granitzer, Michael
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: APPLIED DATA SCIENCE TRACK, PT V, 2021, 12979 : 52 - 67
  • [27] ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
    Piccolo, Stephen R.
    Lee, Terry J.
    Suh, Erica
    Hill, Kimball
    GIGASCIENCE, 2020, 9 (04):
  • [28] Deep learning methods for obtaining photometric redshift estimations from images
    Henghes, Ben
    Thiyagalingam, Jeyan
    Pettitt, Connor
    Hey, Tony
    Lahav, Ofer
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2022, 512 (02) : 1696 - 1709
  • [29] Machine-Learning Methods on Noisy and Sparse Data
    Poulinakis, Konstantinos
    Drikakis, Dimitris
    Kokkinakis, Ioannis W.
    Spottswood, Stephen Michael
    MATHEMATICS, 2023, 11 (01)
  • [30] Machine-Learning Methods for Computational Science and Engineering
    Frank, Michael
    Drikakis, Dimitris
    Charissis, Vassilis
    COMPUTATION, 2020, 8 (01)