Feature Extraction

The first stage in consists in transforming the raw data into a uniform data matrix which will subsequently be given as input to the learning algorithm.

The original implementation of actsnclass can handle text-like data from the SuperNova Photometric Classification Challenge (SNPCC) which is described in Kessler et al., 2010.

This version is equiped to input RESSPECT simulatons made with the SNANA simulator.

Load 1 light curve:

For RESSPECT

In order to fit a single light curve from the RESSPECT simulations you need to have its identification number. This information is stored in the header SNANA files. One of the possible ways to retrieve it is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
>>> import io
>>> import pandas as pd
>>> import tarfile

>>> path_to_header = '~/RESSPECT_PERFECT_V2_TRAIN_HEADER.tar.gz'

# openning '.tar.gz' files requires some juggling ...
>>> tar = tarfile.open(path_to_header, 'r:gz')
>>> fname = tar.getmembers()[0]
>>> content = tar.extractfile(fname).read()
>>> header = pd.read_csv(io.BytesIO(content))
>>> tar.close()

# get keywords
>>> header.keys()
Index(['objid', 'redshift', 'type', 'code', 'sample'], dtype='object')

# check the first chunks of ids and types
>>> header[['objid', 'type']].iloc[:10]
   objid     type
0   3228  Ibc_V19
1   2241      IIn
2   6770       Ia
3    302      IIn
4   7948       Ia
5   4376   II_V19
6    337   II_V19
7   6017       Ia
8   1695       Ia
9   1660   II-NMF

>> snid = header['objid'].values[4]

Now that you have selected on object, you can fit its light curve using the LightCurve class :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
>>> from actsnclass.fit_lightcurves import LightCurve

>>> path_to_lightcurves = '~/RESSPECT_PERFECT_V2_TRAIN_LIGHTCURVES.tar.gz'

>>> lc = LightCurve()
>>> lc.load_resspect_lc(photo_file=path_to_lightcurves, snid=snid)

# check light curve format
>>> lc.photometry
          mjd band      flux   fluxerr        SNR
0     53058.0    u  0.138225  0.142327   0.971179
1     53058.0    g -0.064363  0.141841  -0.453768
...       ...  ...       ...       ...        ...
1054  53440.0    z  1.173433  0.145918   8.041707
1055  53440.0    Y  0.980438  0.145256   6.749742

[1056 rows x 5 columns]

For PLAsTiCC:

Similar to the case presented below, reading only 1 light curve from PLAsTiCC requires an object identifier. This can be done by:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
>>> from actsnclass.fit_lightcurves import LightCurve
>>> import pandas as pd

>>> path_to_metadata = '~/plasticc_train_metadata.csv.gz'
>>> path_to_lightcurves = '~/plasticc_train_lightcurves.csv.gz'

# read metadata for the entire sample
>>> metadata = pd.read_csv(path_to_metadata)

# check keys
metadata.keys()
Index(['object_id', 'ra', 'decl', 'ddf_bool', 'hostgal_specz',
       'hostgal_photoz', 'hostgal_photoz_err', 'distmod', 'mwebv', 'target',
       'true_target', 'true_submodel', 'true_z', 'true_distmod',
       'true_lensdmu', 'true_vpec', 'true_rv', 'true_av', 'true_peakmjd',
       'libid_cadence', 'tflux_u', 'tflux_g', 'tflux_r', 'tflux_i', 'tflux_z',
       'tflux_y'],
     dtype='object')

# choose 1 object
snid = metadata['object_id'].values[0]

# create light curve object and load data
lc = LightCurve()
lc.load_plasticc_lc(photo_file=path_to_lightcurves, snid=snid)

For SNPCC:

The raw data looks like this:

SURVEY: DES   
SNID:   848233   
IAUC:    UNKNOWN 
PHOTOMETRY_VERSION: DES 
SNTYPE:  22 
FILTERS: griz 
RA:      36.750000  deg 
DECL:    -4.500000  deg 
MAGTYPE: LOG10  
MAGREF:  AB  
FAKE:    2   (=> simulated LC with snlc_sim.exe) 
MWEBV:   0.0283    MW E(B-V) 
REDSHIFT_HELIO:   0.50369 +- 0.00500  (Helio, z_best) 
REDSHIFT_FINAL:   0.50369 +- 0.00500  (CMB) 
REDSHIFT_SPEC:    0.50369 +- 0.00500  
REDSHIFT_STATUS: OK 
 
HOST_GALAXY_GALID:   17173 
HOST_GALAXY_PHOTO-Z:   0.4873  +- 0.0318  



SIM_MODEL:  NONIA  10  (name index) 
SIM_NON1a:      30   (non1a index) 
SIM_COMMENT:  SN Type = II , MODEL = SDSS-017564  
SIM_LIBID:  2  
SIM_REDSHIFT:  0.5029  
SIM_HOSTLIB_TRUEZ:  0.5000  (actual Z of hostlib) 
SIM_HOSTLIB_GALID:  17173  
SIM_DLMU:      42.276020  mag   [ -5*log10(10pc/dL) ]  
SIM_RA:        36.750000 deg  
SIM_DECL:      -4.500000 deg  
SIM_MWEBV:   0.0256   (MilkyWay E(B-V)) 
SIM_PEAKMAG:   22.48  22.87  22.70  22.82  (griz obs)
SIM_EXPOSURE:     1.0    1.0    1.0    1.0  (griz obs)
SIM_PEAKMJD:   56251.609375  days 
SIM_SALT2x0:   1.229e-17   
SIM_MAGDIM:    0.000  
SIM_SEARCHEFF_MASK:  3  (bits 1,2=> found by software,humans) 
SIM_SEARCHEFF:  1.0000  (spectro-search efficiency (ignores pipelines)) 
SIM_TRESTMIN:   -38.24   days 
SIM_TRESTMAX:    64.80   days 
SIM_RISETIME_SHIFT:   0.0 days 
SIM_FALLTIME_SHIFT:   0.0 days 

SEARCH_PEAKMJD:   56250.734  


# ============================================ 
# TERSE LIGHT CURVE OUTPUT: 
#
NOBS: 108 
NVAR: 9 
VARLIST:  MJD  FLT FIELD   FLUXCAL   FLUXCALERR   SNR    MAG     MAGERR  SIM_MAG
OBS:  56194.145  g NULL   7.600e+00   4.680e+00   1.62   99.000    5.000   98.926
OBS:  56194.156  r NULL   3.875e+00   2.752e+00   1.41   99.000    5.000   98.953
OBS:  56194.172  i NULL   3.585e+00   4.628e+00   0.77   99.000    5.000   99.033
OBS:  56194.188  z NULL  -2.203e+00   4.463e+00  -0.49   99.000    5.000   98.983
OBS:  56207.188  g NULL  -7.008e+00   4.367e+00  -1.60   99.000    5.000   98.926
OBS:  56207.195  r NULL  -1.189e+00   3.459e+00  -0.34   99.000    5.000   98.953
OBS:  56207.203  i NULL   8.799e+00   6.249e+00   1.41   99.000    5.000   99.033

You can load this data using:

1
2
3
4
5
6
>>> from actsnclass.fit_lightcurves import LightCurve

>>> path_to_lc = 'data/SIMGEN_PUBLIC_DES/DES_SN848233.DAT'

>>> lc = LightCurve()                        # create light curve instance
>>> lc.load_snpcc_lc(path_to_lc)             # read data

Fit 1 light curve:

Once the data is properly loaded, the photometry can be recovered by:

1
2
3
4
5
6
7
>>> lc.photometry                            # check structure of photometry
          mjd band     flux  fluxerr   SNR
 0    56194.145    g   7.600    4.680   1.62
 1    56194.156    r   3.875    2.752   1.41
 ...        ...  ...      ...      ...   ...
 106  56348.008    z  70.690    6.706  10.54
 107  56348.996    g  26.000    5.581   4.66

You can now fit each individual filter to the parametric function proposed by Bazin et al., 2009 in one specific filter.

1
2
3
>>> rband_features = lc.fit_bazin('r')
>>> print(rband_features)
[159.25796385, -13.39398527,  55.16210333, 111.81204143, -20.13492354]

The designation for each parameter are stored in:

It is possible to perform the fit in all filters at once and visualize the result using:

1
2
3
>>> lc.fit_bazin_all()                            # perform Bazin fit in all filters
>>> lc.plot_bazin_fit(save=True, show=True,
                      output_file='plots/SN' + str(lc.id) + '.png')   # save to file
Bazing fit to light curve. This is an example from RESSPECT perfect simulations.

Example of light curve from RESSPECT perfect simulations.

Processing all light curves in the data set

There are 2 way to perform the Bazin fits for the entire SNPCC data set. Using a python interpreter,

1
2
3
4
5
>>> from actsnclass import fit_snpcc_bazin

>>> path_to_data_dir = 'data/SIMGEN_PUBLIC_DES/'            # raw data directory
>>> output_file = 'results/Bazin.dat'                              # output file
>>> fit_snpcc_bazin(path_to_data_dir=path_to_data_dir, features_file=output_file)

The above will produce a file called Bazin.dat in the results directory.

The same result can be achieved using the command line:

# for SNPCC
 >>> fit_dataset.py -s SNPCC -dd <path_to_data_dir> -o <output_file>

 # for RESSPECT or PLAsTiCC
 >>> fit_dataset.py -s <dataset_name> -p <path_to_photo_file>
          -hd <path_to_header_file> -o <output_file>