Data mining: harvesting the depths of astronomy


Gone are the days when you had to observe a star or a galaxy through a telescope and record all your observations on paper.

Today you may never even see the telescope or satellite, and you may be on a different continent from the observing instrument. All you need is a personal computer and a good Internet connection. This is the whole concept of the Virtual Observatory.

You can read more about this in an article by Kirk D. Borne of the Department of Computational and Data Sciences, George Mason University, USA, in which the author describes several methods for data mining in astronomy.

The subject is discussed in the context of the largest data-producing astronomy project of the coming decade – the LSST (Large Synoptic Survey Telescope), a project designed to survey the sky over a period of 10 years using what will be the world’s largest digital camera, at 3200 megapixels.

This project alone will generate 30 TB of data per night of observation, producing a 10-year “movie” of the night sky. For example, around 10,000 supernovae have been discovered so far, but LSST will be able to record 1000 new supernovae each night!
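To get a feel for these numbers, here is a rough back-of-the-envelope sketch. The 30 TB per night, the 10-year duration and the supernova counts come from the text above; the number of observing nights per year is purely an assumption made for illustration, so treat the resulting total as an order-of-magnitude guess rather than a project figure.

```python
# Back-of-the-envelope estimate of the survey's raw data volume.
TB_PER_NIGHT = 30          # from the article
SURVEY_YEARS = 10          # from the article
NIGHTS_PER_YEAR = 300      # ASSUMPTION: not stated in the article


def total_survey_volume_tb(tb_per_night, nights_per_year, years):
    """Total volume in terabytes for a survey of the given length."""
    return tb_per_night * nights_per_year * years


total_tb = total_survey_volume_tb(TB_PER_NIGHT, NIGHTS_PER_YEAR, SURVEY_YEARS)
total_pb = total_tb / 1000  # decimal units: 1 PB = 1000 TB

# How many nights would LSST need to match every supernova known so far?
KNOWN_SUPERNOVAE = 10_000   # from the article
NEW_PER_NIGHT = 1_000       # from the article
nights_to_match = KNOWN_SUPERNOVAE / NEW_PER_NIGHT

print(f"Total volume: {total_tb} TB (about {total_pb} PB)")
print(f"Nights to match all known supernovae: {nights_to_match}")
```

Under that assumed observing schedule the survey lands in the tens of petabytes, and the supernova count collected so far would be matched in about ten nights.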

With 10,000-100,000 alerts each night, a project of this magnitude must handle the huge amount of data not only quickly but also reliably. It will have to assign new objects to known classes, discover new classes of objects, make rules for the different classes, and refine those rules using training samples.
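The tasks listed above can be illustrated with a very small sketch. It is not how LSST or the article's author does it: the nearest-centroid rule, the two made-up features, the toy training samples and the `max_distance` threshold are all assumptions chosen only to show the idea of learning class rules from training samples, assigning new objects to known classes, and flagging objects that fit no known class.

```python
import math


def centroid(points):
    """Mean position of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(training_samples):
    """Build one 'rule' (a centroid) per class from labelled training samples."""
    return {label: centroid(pts) for label, pts in training_samples.items()}


def classify(model, obj, max_distance):
    """Assign obj to the nearest known class, or 'unknown' if nothing is close.

    Objects labelled 'unknown' are candidates for a new class of object.
    """
    label, dist = min(
        ((lbl, math.dist(obj, c)) for lbl, c in model.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_distance else "unknown"


# Hypothetical two-feature training samples (values are invented).
training = {
    "star": [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15)],
    "galaxy": [(0.8, 0.9), (0.9, 0.8), (0.85, 0.85)],
}
model = train(training)

print(classify(model, (0.12, 0.18), max_distance=0.3))
print(classify(model, (0.5, 5.0), max_distance=0.3))
```

Refining the rules then amounts to calling `train` again once newly confirmed objects have been added to the training samples.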

Various algorithms have been developed to deal with huge amounts of data, including:

Bayesian Analysis – used, for example, to distinguish galaxies from stars among the many thousands of objects detected in large images;

Decision Trees – used to identify cosmic ray contamination in astronomical images taken with charge-coupled device (CCD) cameras;

Neural Networks – more recently applied to classifying different galaxy types within large databases of galaxy data;

Support Vector Machines (SVM) – used to estimate the photometric redshift of distant galaxies, or to forecast solar flares and solar wind-induced geostorms.
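As an illustration of the first item in the list, here is a minimal sketch of a Gaussian naive Bayes classifier, one simple form of Bayesian analysis, applied to star/galaxy separation. It is not the method used in the article or by any particular survey. The two features (apparent size and light concentration) and all the numbers are invented for the example.

```python
import math
from statistics import mean, pvariance


def fit(samples):
    """Estimate a prior plus per-feature mean and variance for each class."""
    total = sum(len(rows) for rows in samples.values())
    model = {}
    for label, rows in samples.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / total,
            "mean": [mean(col) for col in columns],
            # A small constant keeps the variance strictly positive.
            "var": [pvariance(col) + 1e-9 for col in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the normal probability density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, x):
    """Return the class with the highest (unnormalised) log posterior."""
    def log_posterior(params):
        likelihood = sum(
            log_gaussian(xi, mu, var)
            for xi, mu, var in zip(x, params["mean"], params["var"])
        )
        return math.log(params["prior"]) + likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical training data: (apparent size, concentration), invented values.
samples = {
    "star": [(1.0, 0.9), (1.1, 0.85), (0.9, 0.95), (1.05, 0.9)],
    "galaxy": [(2.5, 0.5), (3.0, 0.4), (2.8, 0.45), (3.5, 0.55)],
}
model = fit(samples)

print(predict(model, (1.0, 0.9)))
print(predict(model, (3.0, 0.5)))
```

The "naive" part is the assumption that the features are independent within each class, which keeps the model cheap enough to apply to many thousands of detected objects.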

Together, past and future astronomical projects will build a virtual sky that anybody can mine to discover new phenomena and objects. Existing surveys include the Palomar-Quest Synoptic Sky Survey (PQ), the Sloan Digital Sky Survey (SDSS), and the 2-Micron All Sky Survey (2MASS). In the near future they will be joined by LSST, the Palomar Transient Factory (PTF), the Supernova Acceleration Probe (SNAP), the Panoramic Survey Telescope And Rapid Response System (PanSTARRS), and the Dark Energy Survey (DES). These projects will deliver petabyte-scale catalogs.
