actsnclass.DataBase
class actsnclass.DataBase
DataBase object, upon which the active learning loop is performed.
Variables:
- classprob (np.array) – Classification probability for all objects, [pIa, pnon-Ia].
- data (pd.DataFrame) – Complete information read from features files.
- features (pd.DataFrame) – Feature matrix to be used in classification (no metadata).
- features_names (list) – Header for attribute features.
- metadata (pd.DataFrame) – Features matrix which will not be used in classification.
- metadata_names (list) – Header for metadata.
- metrics_list_names (list) – Values for metric elements.
- output_photo_Ia (pd.DataFrame) – Returns metadata for photometrically classified Ia.
- photo_Ia_metadata (pd.DataFrame) – Metadata for photometrically classified object ids.
- plasticc_mjd_lim (list) – [min, max] MJDs for PLAsTiCC data.
- predicted_class (np.array) – Predicted classes - results from ML classifier.
- queried_sample (list) – Complete information of queried objects.
- queryable_ids (np.array) – Flag for objects available to be queried.
- test_features (np.array) – Features matrix for the test sample.
- test_metadata (pd.DataFrame) – Metadata for the test sample.
- test_labels (np.array) – True classification for the test sample.
- train_features (np.array) – Features matrix for the train sample.
- train_metadata (pd.DataFrame) – Metadata for the training sample.
- train_labels (np.array) – Classes for the training sample.
- build_samples(initial_training: str or int, nclass: int) – Separate train and test samples.
- classify(method: str) – Apply a machine learning classifier.
- classify_bootstrap(method: str) – Apply a machine learning classifier, bootstrapping the classifier.
- evaluate_classification(metric_label: str) – Evaluate results from classification.
- identify_keywords() – Break degeneracy between keywords with equal meaning.
- load_bazin_features(path_to_bazin_file: str) – Load Bazin features from file.
- load_photometry_features(path_to_photometry_file: str) – Load photometric light curves from file.
- load_plasticc_mjd(path_to_data_dir: str) – Get min and max MJDs for PLAsTiCC data.
- load_features(path_to_file: str, method: str) – Load features according to the chosen feature extraction method.
- make_query(strategy: str, batch: int) → list – Identify new object to be added to the training sample.
- save_metrics(loop: int, output_metrics_file: str) – Save current metrics to file.
- save_queried_sample(queried_sample_file: str, loop: int, full_sample: str) – Save queried sample to file.
- update_samples(query_indx: list) – Add the queried object(s) to the training sample and remove them from the test sample.
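The bookkeeping that update_samples describes (queried objects join the training sample and leave the test sample) can be pictured with plain Python lists. This is only an illustrative sketch: the helper name move_to_training and the list-of-tuples samples are assumptions made for the example, not part of actsnclass, where the samples are stored as arrays and data frames.

```python
def move_to_training(query_indx, train, test):
    """Move the test entries at positions query_indx into the training list.

    Hypothetical helper: 'train' and 'test' are plain lists of
    (object_id, label) pairs standing in for the DataBase samples.
    """
    chosen = set(query_indx)
    # Queried objects are appended to training in the order they were requested.
    new_train = train + [test[i] for i in query_indx]
    # Everything that was not queried stays in the test sample, order preserved.
    new_test = [row for i, row in enumerate(test) if i not in chosen]
    return new_train, new_test


train = [(101, 1), (102, 0)]
test = [(201, 0), (202, 1), (203, 0)]
train, test = move_to_training([1], train, test)
print(train)
print(test)
```

Returning new lists rather than mutating the inputs keeps the positions in query_indx valid while the test sample is being filtered.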
Examples
>>> from actsnclass import DataBase
Define the necessary paths
>>> path_to_bazin_file = 'results/Bazin.dat'
>>> metrics_file = 'results/metrics.dat'
>>> query_file = 'results/query_file.dat'
Initiate the DataBase object and load the data
>>> data = DataBase()
>>> data.load_features(path_to_bazin_file, method='Bazin')
Separate training and test samples and classify
>>> data.build_samples(initial_training='original', nclass=2)
>>> data.classify(method='RandomForest')
>>> print(data.classprob)  # check predicted probabilities
[[0.461 0.539]
 [0.346 0.654]
 ...
 [0.398 0.602]
 [0.396 0.604]]
Calculate classification metrics
>>> data.evaluate_classification(metric_label='snpcc')
>>> print(data.metrics_list_names)  # check metric header
['acc', 'eff', 'pur', 'fom']
>>> print(data.metrics_list_values)  # check metric values
[0.5975434599574068, 0.9024767801857585, 0.34684684684684686, 0.13572404702012383]
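The four metric names in this example (acc, eff, pur, fom) can be related to each other with a minimal sketch. It assumes the 'snpcc' label follows the usual SNPCC definitions: accuracy, efficiency (fraction of true Ia recovered), purity (fraction of predicted Ia that are true Ia), and a figure of merit equal to efficiency times a pseudo-purity in which false positives are penalized by a factor of 3. The function name snpcc_like_metrics, the penalty argument, and the label convention (1 for Ia, 0 for non-Ia) are assumptions for illustration; this is not the package's own code.

```python
def snpcc_like_metrics(true_labels, predicted, penalty=3.0):
    """Return [acc, eff, pur, fom] for non-empty binary labels (1 = Ia, 0 = non-Ia).

    Hypothetical helper; the penalty of 3 on false positives is an assumption.
    """
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true Ia predicted Ia
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-Ia predicted Ia
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # true Ia missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-Ia predicted non-Ia

    acc = (tp + tn) / len(pairs)
    eff = tp / (tp + fn) if tp + fn else 0.0
    pur = tp / (tp + fp) if tp + fp else 0.0
    # Pseudo-purity: like purity, but each false positive counts 'penalty' times.
    pseudo_pur = tp / (tp + penalty * fp) if tp + fp else 0.0
    return [acc, eff, pur, eff * pseudo_pur]


true_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 1]
print(snpcc_like_metrics(true_labels, predicted))
```

Under these definitions fom can be written from eff and pur alone as eff / (1 + 3 (1/pur - 1)). Inserting the eff and pur values printed in the example gives approximately 0.135724, in agreement with the quoted fom.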
Make query, choose object and update samples
>>> indx = data.make_query(strategy='UncSampling', batch=1)
>>> data.update_samples(indx)
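As a rough illustration of what a query step can do, the sketch below assumes that strategy='UncSampling' denotes standard uncertainty sampling, i.e. choosing the objects whose predicted probability is closest to 0.5. The helper uncertainty_sampling and its arguments are hypothetical and work on plain lists; the actual make_query implementation may differ.

```python
def uncertainty_sampling(class_prob, queryable=None, batch=1):
    """Return positions of the `batch` most uncertain objects.

    Hypothetical helper: class_prob is a list of two-element probability rows,
    queryable an optional list of booleans flagging objects that may be queried.
    """
    if queryable is None:
        candidates = list(range(len(class_prob)))
    else:
        candidates = [i for i, ok in enumerate(queryable) if ok]
    # With two classes, |p - 0.5| is the same whichever column is used.
    ranked = sorted(candidates, key=lambda i: abs(class_prob[i][0] - 0.5))
    return ranked[:batch]


# Probabilities taken from the rows shown in the example output.
probs = [[0.461, 0.539], [0.346, 0.654], [0.398, 0.602], [0.396, 0.604]]
print(uncertainty_sampling(probs, batch=1))
print(uncertainty_sampling(probs, queryable=[False, True, True, True], batch=1))
```

The first call selects position 0, the row nearest to 0.5; once that object is flagged as not queryable, the second call falls back to position 2.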
Save results to file
>>> data.save_metrics(loop=0, output_metrics_file=metrics_file)
>>> data.save_queried_sample(loop=0, queried_sample_file=query_file,
...                          full_sample=False)
- __init__() – Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__() – Initialize self.
- build_orig_samples([nclass, screen, …]) – Construct train and test samples as given in the original data set.
- build_previous_runs(path_to_train, …[, …]) – Build train, test and queryable samples from previous runs.
- build_random_training(initial_training[, …]) – Construct initial random training and corresponding test sample.
- build_samples([initial_training, nclass, …]) – Separate train and test samples.
- classify(method, **kwargs) – Apply a machine learning classifier.
- classify_bootstrap(method, **kwargs) – Apply a machine learning classifier, bootstrapping the classifier.
- evaluate_classification([metric_label]) – Evaluate results from classification.
- identify_keywords() – Break degeneracy between keywords with equal meaning.
- load_bazin_features(path_to_bazin_file[, …]) – Load Bazin features from file.
- load_features(path_to_file[, method, …]) – Load features according to the chosen feature extraction method.
- load_photometry_features(path_to_photometry_file) – Load photometry features from file.
- load_plasticc_mjd(path_to_data_dir) – Return all MJDs from 1 file from PLAsTiCC simulations.
- make_query([strategy, batch, screen, …]) – Identify new object to be added to the training sample.
- output_photo_Ia(threshold[, to_file, filename]) – Returns the metadata for photometrically classified SN Ia.
- save_metrics(loop, output_metrics_file, epoch) – Save current metrics to file.
- save_queried_sample(queried_sample_file, loop) – Save queried sample to file.
- update_samples(query_indx, loop[, epoch]) – Add the queried object(s) to the training sample and remove them from the test sample.