actsnclass.DataBase

class actsnclass.DataBase

DataBase object, upon which the active learning loop is performed.

Variables:

classprob (np.array) – Classification probability for all objects, [pIa, pnon-Ia].
data (pd.DataFrame) – Complete information read from features files.
features (pd.DataFrame) – Feature matrix to be used in classification (no metadata).
features_names (list) – Header for attribute features.
metadata (pd.DataFrame) – Features matrix which will not be used in classification.
metadata_names (list) – Header for metadata.
metrics_list_names (list) – Names of the metric elements (header for the metric values).
output_photo_Ia (pd.DataFrame) – Returns metadata for photometrically classified Ia.
photo_Ia_metadata (pd.DataFrame) – Metadata for photometrically classified object ids.
plasticc_mjd_lim (list) – [min, max] MJDs for PLAsTiCC data.
predicted_class (np.array) – Predicted classes - results from ML classifier.
queried_sample (list) – Complete information of queried objects.
queryable_ids (np.array) – Flag for objects available to be queried.
test_features (np.array) – Features matrix for the test sample.
test_metadata (pd.DataFrame) – Metadata for the test sample.
test_labels (np.array) – True classification for the test sample.
train_features (np.array) – Features matrix for the train sample.
train_metadata (pd.DataFrame) – Metadata for the training sample.
train_labels (np.array) – Classes for the training sample.

build_samples(initial_training: str or int, nclass: int): Separate train and test samples.

classify(method: str): Apply a machine learning classifier.

classify_bootstrap(method: str): Apply a machine learning classifier bootstrapping the classifier.

evaluate_classification(metric_label: str): Evaluate results from classification.

identify_keywords(): Break degeneracy between keywords with equal meaning.

load_bazin_features(path_to_bazin_file: str): Load Bazin features from file.

load_photometry_features(path_to_photometry_file: str): Load photometric light curves from file.

load_plasticc_mjd(path_to_data_dir: str): Get min and max MJDs for PLAsTiCC data.

load_features(path_to_file: str, method: str): Load features according to the chosen feature extraction method.

make_query(strategy: str, batch: int) → list: Identify new object to be added to the training sample.

save_metrics(loop: int, output_metrics_file: str): Save current metrics to file.

save_queried_sample(queried_sample_file: str, loop: int, full_sample: str): Save queried sample to file.

update_samples(query_indx: list): Add the queried obj(s) to training and remove them from test.

Examples

>>> from actsnclass import DataBase

Define the necessary paths

>>> path_to_bazin_file = 'results/Bazin.dat'
>>> metrics_file = 'results/metrics.dat'
>>> query_file = 'results/query_file.dat'

Initiate the DataBase object and load the data.

>>> data = DataBase()
>>> data.load_features(path_to_bazin_file, method='Bazin')

Separate training and test samples and classify

>>> data.build_samples(initial_training='original', nclass=2)
>>> data.classify(method='RandomForest')
>>> print(data.classprob)          # check predicted probabilities
[[0.461 0.539]
[0.346 0.654]
...
[0.398 0.602]
[0.396 0.604]]

Calculate classification metrics

>>> data.evaluate_classification(metric_label='snpcc')
>>> print(data.metrics_list_names)           # check metric header
['acc', 'eff', 'pur', 'fom']

>>> print(data.metrics_list_values)          # check metric values
[0.5975434599574068, 0.9024767801857585,
0.34684684684684686, 0.13572404702012383]
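
To make the metric header ['acc', 'eff', 'pur', 'fom'] concrete, here is a minimal pure-Python sketch of SNPCC-style metrics. It is not the actsnclass implementation. The function name snpcc_style_metrics, the label encoding (1 for Ia, 0 for non-Ia) and the false-positive penalty of 3 are assumptions made for illustration; the penalty value appears consistent with the eff, pur and fom values printed in the example.

```python
def snpcc_style_metrics(true_labels, predicted, positive=1, penalty=3.0):
    """Return (acc, eff, pur, fom) for a binary Ia / non-Ia classification.

    Hypothetical helper: `positive` marks the Ia class and `penalty`
    weights false positives in the figure of merit (both are assumptions).
    """
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    acc = (tp + tn) / len(pairs)                 # fraction classified correctly
    eff = tp / (tp + fn) if tp + fn else 0.0     # efficiency: recovered Ia / all Ia
    pur = tp / (tp + fp) if tp + fp else 0.0     # purity: true Ia / predicted Ia
    # Pseudo-purity penalises each false positive `penalty` times.
    pseudo_pur = tp / (tp + penalty * fp) if tp + fp else 0.0
    fom = eff * pseudo_pur                       # figure of merit
    return acc, eff, pur, fom


# Toy labels, invented for this sketch only.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 1, 1]
acc, eff, pur, fom = snpcc_style_metrics(true_labels, predicted)
print(acc, eff, pur, fom)
```

Because false positives are weighted more heavily than in the plain purity, fom drops quickly when contaminants enter the photometric Ia sample, even if efficiency stays high.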

Make query, choose object and update samples

>>> indx = data.make_query(strategy='UncSampling', batch=1)
>>> data.update_samples(indx)
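
The query and update steps can be pictured with the following pure-Python sketch. It shows uncertainty sampling in its generic form (pick the objects whose Ia probability is closest to 0.5) and then moves the chosen objects from the test list to the training list. The helpers most_uncertain and move_to_training and the object ids are hypothetical, and the assumption that query indices are positions in the test sample is ours; the internals of make_query and update_samples may differ.

```python
def most_uncertain(classprob, batch=1):
    """Return positions of the `batch` objects whose P(Ia) is closest to 0.5.

    Each row of `classprob` is assumed to be [pIa, pnon-Ia].
    """
    order = sorted(range(len(classprob)),
                   key=lambda i: abs(classprob[i][0] - 0.5))
    return order[:batch]


def move_to_training(query_indx, train_ids, test_ids):
    """Return new (train, test) lists with the queried objects moved to train."""
    chosen = set(query_indx)
    queried = [test_ids[i] for i in query_indx]
    new_train = train_ids + queried
    new_test = [obj for i, obj in enumerate(test_ids) if i not in chosen]
    return new_train, new_test


# Probabilities taken from the printed rows of the example; ids are invented.
classprob = [[0.461, 0.539], [0.346, 0.654], [0.398, 0.602], [0.396, 0.604]]
train_ids = ['sn_a']
test_ids = ['sn_b', 'sn_c', 'sn_d', 'sn_e']

indx = most_uncertain(classprob, batch=1)
train_ids, test_ids = move_to_training(indx, train_ids, test_ids)
print(indx, train_ids, test_ids)
```

Repeating classify, evaluate, query and update in this order is what turns the individual calls in this example into an active learning loop.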

Save results to file

>>> data.save_metrics(loop=0, output_metrics_file=metrics_file)
>>> data.save_queried_sample(loop=0, queried_sample_file=query_file,
...                          full_sample=False)

__init__(): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`()	Initialize self.
`build_orig_samples`([nclass, screen, …])	Construct train and test samples as given in the original data set.
`build_previous_runs`(path_to_train, …[, …])	Build train, test and queryable samples from previous runs.
`build_random_training`(initial_training[, …])	Construct initial random training and corresponding test sample.
`build_samples`([initial_training, nclass, …])	Separate train and test samples.
`classify`(method, **kwargs)	Apply a machine learning classifier.
`classify_bootstrap`(method, **kwargs)	Apply a machine learning classifier bootstrapping the classifier.
`evaluate_classification`([metric_label])	Evaluate results from classification.
`identify_keywords`()	Break degeneracy between keywords with equal meaning.
`load_bazin_features`(path_to_bazin_file[, …])	Load Bazin features from file.
`load_features`(path_to_file[, method, …])	Load features according to the chosen feature extraction method.
`load_photometry_features`(path_to_photometry_file)	Load photometry features from file.
`load_plasticc_mjd`(path_to_data_dir)	Return all MJDs from 1 file from PLAsTiCC simulations.
`make_query`([strategy, batch, screen, …])	Identify new object to be added to the training sample.
`output_photo_Ia`(threshold[, to_file, filename])	Returns the metadata for photometrically classified SN Ia.
`save_metrics`(loop, output_metrics_file, epoch)	Save current metrics to file.
`save_queried_sample`(queried_sample_file, loop)	Save queried sample to file.
`update_samples`(query_indx, loop[, epoch])	Add the queried obj(s) to training and remove them from test.