Feature Extraction

The first stage consists of transforming the raw data into a uniform data matrix, which is subsequently given as input to the learning algorithm.

The current implementation of actsnclass handles text-like data from the SuperNova Photometric Classification Challenge (SNPCC), which is described in Kessler et al., 2010.

Processing a single light curve

The raw data looks like this:

SURVEY: DES   
SNID:   848233   
IAUC:    UNKNOWN 
PHOTOMETRY_VERSION: DES 
SNTYPE:  22 
FILTERS: griz 
RA:      36.750000  deg 
DECL:    -4.500000  deg 
MAGTYPE: LOG10  
MAGREF:  AB  
FAKE:    2   (=> simulated LC with snlc_sim.exe) 
MWEBV:   0.0283    MW E(B-V) 
REDSHIFT_HELIO:   0.50369 +- 0.00500  (Helio, z_best) 
REDSHIFT_FINAL:   0.50369 +- 0.00500  (CMB) 
REDSHIFT_SPEC:    0.50369 +- 0.00500  
REDSHIFT_STATUS: OK 
 
HOST_GALAXY_GALID:   17173 
HOST_GALAXY_PHOTO-Z:   0.4873  +- 0.0318  



SIM_MODEL:  NONIA  10  (name index) 
SIM_NON1a:      30   (non1a index) 
SIM_COMMENT:  SN Type = II , MODEL = SDSS-017564  
SIM_LIBID:  2  
SIM_REDSHIFT:  0.5029  
SIM_HOSTLIB_TRUEZ:  0.5000  (actual Z of hostlib) 
SIM_HOSTLIB_GALID:  17173  
SIM_DLMU:      42.276020  mag   [ -5*log10(10pc/dL) ]  
SIM_RA:        36.750000 deg  
SIM_DECL:      -4.500000 deg  
SIM_MWEBV:   0.0256   (MilkyWay E(B-V)) 
SIM_PEAKMAG:   22.48  22.87  22.70  22.82  (griz obs)
SIM_EXPOSURE:     1.0    1.0    1.0    1.0  (griz obs)
SIM_PEAKMJD:   56251.609375  days 
SIM_SALT2x0:   1.229e-17   
SIM_MAGDIM:    0.000  
SIM_SEARCHEFF_MASK:  3  (bits 1,2=> found by software,humans) 
SIM_SEARCHEFF:  1.0000  (spectro-search efficiency (ignores pipelines)) 
SIM_TRESTMIN:   -38.24   days 
SIM_TRESTMAX:    64.80   days 
SIM_RISETIME_SHIFT:   0.0 days 
SIM_FALLTIME_SHIFT:   0.0 days 

SEARCH_PEAKMJD:   56250.734  


# ============================================ 
# TERSE LIGHT CURVE OUTPUT: 
#
NOBS: 108 
NVAR: 9 
VARLIST:  MJD  FLT FIELD   FLUXCAL   FLUXCALERR   SNR    MAG     MAGERR  SIM_MAG
OBS:  56194.145  g NULL   7.600e+00   4.680e+00   1.62   99.000    5.000   98.926
OBS:  56194.156  r NULL   3.875e+00   2.752e+00   1.41   99.000    5.000   98.953
OBS:  56194.172  i NULL   3.585e+00   4.628e+00   0.77   99.000    5.000   99.033
OBS:  56194.188  z NULL  -2.203e+00   4.463e+00  -0.49   99.000    5.000   98.983
OBS:  56207.188  g NULL  -7.008e+00   4.367e+00  -1.60   99.000    5.000   98.926
OBS:  56207.195  r NULL  -1.189e+00   3.459e+00  -0.34   99.000    5.000   98.953
OBS:  56207.203  i NULL   8.799e+00   6.249e+00   1.41   99.000    5.000   99.033

You can load this data using:

>>> from actsnclass.fit_lightcurves import LightCurve

>>> path_to_lc = 'data/SIMGEN_PUBLIC_DES/DES_SN848233.DAT'

>>> lc = LightCurve()                        # create light curve instance
>>> lc.load_snpcc_lc(path_to_lc)             # read data
>>> lc.photometry                            # check structure of photometry
          mjd band     flux  fluxerr   SNR
 0    56194.145    g   7.600    4.680   1.62
 1    56194.156    r   3.875    2.752   1.41
 ...        ...  ...      ...      ...   ...
 106  56348.008    z  70.690    6.706  10.54
 107  56348.996    g  26.000    5.581   4.66
 [108 rows x 5 columns]
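
To make the mapping between the raw file and the photometry table more concrete, the sketch below parses SNPCC-style text using only the Python standard library. It is an illustration, not the code actsnclass uses internally: the function name `parse_snpcc_text` is hypothetical, and the column mapping (MJD, FLT, FLUXCAL, FLUXCALERR, SNR to mjd, band, flux, fluxerr, SNR) is inferred from comparing the raw file with the `lc.photometry` output shown above.

```python
# Illustrative parser for SNPCC-style text; NOT the actsnclass implementation.

def parse_snpcc_text(text):
    """Split SNPCC-style text into header metadata and observation rows."""
    header = {}
    columns = []
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue                      # skip blank and comment lines
        key, _, value = line.partition(':')
        value = value.strip()
        if key == 'VARLIST':
            columns = value.split()       # column names for the OBS rows
        elif key == 'OBS':
            rows.append(dict(zip(columns, value.split())))
        else:
            header[key] = value           # header values kept as raw strings
    return header, rows


sample = """\
SNID:   848233
SNTYPE:  22
FILTERS: griz
NVAR: 9
VARLIST:  MJD  FLT FIELD   FLUXCAL   FLUXCALERR   SNR    MAG     MAGERR  SIM_MAG
OBS:  56194.145  g NULL   7.600e+00   4.680e+00   1.62   99.000    5.000   98.926
OBS:  56194.156  r NULL   3.875e+00   2.752e+00   1.41   99.000    5.000   98.953
"""

header, rows = parse_snpcc_text(sample)

# Keep only the columns that appear in lc.photometry, converted to numbers.
# The mapping below is an assumption based on the output shown above.
photometry = [
    {
        'mjd': float(r['MJD']),
        'band': r['FLT'],
        'flux': float(r['FLUXCAL']),
        'fluxerr': float(r['FLUXCALERR']),
        'SNR': float(r['SNR']),
    }
    for r in rows
]
print(header['SNID'], len(photometry), photometry[0])
```

Header values are left as strings here because several of them carry units or comments after the number (for example the redshift lines), so any numeric conversion would need per-key handling.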

Once the data is loaded, you can fit the light curve in one specific filter to the parametric function proposed by Bazin et al., 2009.

>>> rband_features = lc.fit_bazin('r')
>>> print(rband_features)
[159.25796385, -13.39398527,  55.16210333, 111.81204143, -20.13492354]
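
For orientation, the parametric function can be sketched as a five-parameter model: an amplitude, a constant baseline, a reference epoch, and fall and rise time scales. The code below follows the form published in Bazin et al., 2009. The argument names and their order are assumptions made for this illustration, and the sign convention used inside actsnclass may differ, so the values returned by `fit_bazin` should not be plugged into this function without checking the library source.

```python
import math


def bazin(t, a, b, t0, tfall, trise):
    """Bazin et al. (2009) light-curve model, in the form given in the paper.

    a: amplitude, b: constant baseline, t0: reference epoch,
    tfall / trise: decline and rise time scales (same units as t).
    """
    decline = math.exp(-(t - t0) / tfall)      # exponential decay after t0
    rise = 1.0 + math.exp(-(t - t0) / trise)   # sigmoid-like rise before t0
    return a * decline / rise + b


# Hypothetical parameter values, chosen only to show the shape of the curve.
params = dict(a=100.0, b=5.0, t0=50.0, tfall=30.0, trise=3.0)

for t in (30.0, 50.0, 60.0, 200.0):
    print(t, round(bazin(t, **params), 3))
```

At `t = t0` the model evaluates to `a / 2 + b`, and at late times it decays towards the baseline `b`.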

The designation of each parameter is stored in:

It is possible to perform the fit in all filters at once and visualize the result using:

>>> lc.fit_bazin_all()                            # perform Bazin fit in all filters
>>> lc.plot_bazin_fit(save=True, show=True,
...                   output_file='plots/SN' + str(lc.id) + '.png')   # save to file

Figure: Bazin fit to light curve.

Processing all light curves in the data set

There are two ways to perform the Bazin fits for the entire SNPCC data set. Using a Python interpreter:

>>> from actsnclass import fit_snpcc_bazin

>>> path_to_data_dir = 'data/SIMGEN_PUBLIC_DES/'            # raw data directory
>>> output_file = 'results/Bazin.dat'                              # output file
>>> fit_snpcc_bazin(path_to_data_dir=path_to_data_dir, features_file=output_file)

The above will produce a file called Bazin.dat in the results directory.

The same result can be achieved using the command line:

>> fit_dataset.py -dd <path_to_data_dir> -o <output_file>
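
Conceptually, both routes above amount to looping over the raw files in the data directory, extracting features from each light curve, and writing one row per object to the output file. The sketch below shows that pattern with the standard library only. It is not how `fit_snpcc_bazin` is implemented: `fit_directory` and the `fit_one` callable are hypothetical names, and the output layout (file name followed by space-separated values) is an assumption that does not describe the actual format of Bazin.dat.

```python
import glob
import os


def fit_directory(path_to_data_dir, features_file, fit_one):
    """Apply fit_one to every .DAT file and write one row per light curve.

    fit_one is a hypothetical callable: it takes a file path and returns
    a list of numbers (the features for that light curve).
    """
    pattern = os.path.join(path_to_data_dir, '*.DAT')
    n_written = 0
    with open(features_file, 'w') as out:
        for path in sorted(glob.glob(pattern)):
            features = fit_one(path)
            name = os.path.basename(path)
            # Assumed layout: file name, then the features, space-separated.
            out.write(name + ' ' + ' '.join(str(x) for x in features) + '\n')
            n_written += 1
    return n_written
```

In practice `fit_one` could wrap the single-light-curve steps shown earlier (create a `LightCurve`, load the file, run the fit), returning the fitted values for all filters.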