Mining knowledge in astrophysical massive data sets

Cited by: 1
Authors
Brescia, Massimo [1]
Longo, Giuseppe [2]
Pasian, Fabio [3]
Affiliations
[1] Osserv Astron Capodimonte, INAF, I-80131 Naples, Italy
[2] Univ Naples Federico 2, Dipartimento Fis, I-80125 Naples, Italy
[3] Osserv Astron Trieste, INAF, I-34143 Trieste, Italy
Keywords
Astrophysics; Astroinformatics; Data mining; Virtual observatory; Distributed computing; Knowledge discovery; Machine learning
DOI
10.1016/j.nima.2010.02.002
Chinese Library Classification (CLC) number
TH7 [Instruments and meters]
Discipline classification codes
0804; 080401; 081102
Abstract
Modern scientific data consist mainly of huge data sets gathered by a very large number of techniques and stored in highly diversified and often incompatible data repositories. More generally, in the e-science environment, integrating services across distributed, heterogeneous, dynamic "virtual organizations" formed by different resources within a single enterprise is considered a critical and urgent requirement. In the last decade, astronomy has become an immensely data-rich field owing to the evolution of detectors (plates to digital to mosaics), telescopes and space instruments. The Virtual Observatory approach consists of federating, under common standards, all astronomical archives available worldwide, as well as data analysis, data mining and data exploration applications. The main drive behind this effort is that, once the infrastructure is complete, it will allow a new type of multi-wavelength, multi-epoch science that can barely be imagined. Data mining, or knowledge discovery in databases, is the main methodology for extracting the scientific information contained in such Massive Data Sets (MDS), but it poses crucial problems, since it has to orchestrate the complex issues raised by transparent access to different computing environments, scalability of algorithms, reusability of resources, etc. In this paper we summarize the present status of MDS in the Virtual Observatory and what is currently being done and planned to bring in advanced data mining methodologies in the case of the DAME (DAta Mining and Exploration) project. © 2010 Elsevier B.V. All rights reserved.
Pages: 845-849
Page count: 5