actsnclass.DataBase
class actsnclass.DataBase
DataBase object, upon which the active learning loop is performed.
Variables:
- classprob (np.array()) – Classification probability for all objects, [pIa, pnon-Ia].
- data (pd.DataFrame) – Complete information read from features files.
- features (pd.DataFrame) – Features matrix to be used in classification (no metadata).
- features_names (list) – Header for attribute features.
- metadata (pd.DataFrame) – Features matrix which will not be used in classification.
- metadata_names (list) – Header for metadata.
- metrics_list_names (list) – Names of the metric elements, e.g. ['acc', 'eff', 'pur', 'fom'].
- metrics_list_values (list) – Values for metric elements.
- predicted_class (np.array()) – Predicted classes - results from ML classifier.
- queried_sample (np.array()) – Complete information of queried objects.
- queryable_ids (np.array()) – Flag for objects available to be queried.
- test_features (pd.DataFrame) – Features matrix for the test sample.
- test_metadata (pd.DataFrame) – Metadata for the test sample.
- test_labels (np.array()) – True classification for the test sample.
- train_features (pd.DataFrame) – Features matrix for the training sample.
- train_metadata (pd.DataFrame) – Metadata for the training sample.
- train_labels (np.array()) – Classes for the training sample.
- load_bazin_features(path_to_bazin_file: str) – Load Bazin features from file.
- load_features(path_to_file: str, method: str) – Load features according to the chosen feature extraction method.
- build_samples(initial_training: str or int, nclass: int) – Separate train and test samples.
- classify(method: str) – Apply a machine learning classifier.
- evaluate_classification(metric_label: str) – Evaluate results from classification.
- make_query(strategy: str, batch: int) → list – Identify new object to be added to the training sample.
- update_samples(query_indx: list) – Add the queried obj(s) to training and remove them from test.
- save_metrics(loop: int, output_metrics_file: str) – Save current metrics to file.
- save_queried_sample(queried_sample_file: str, loop: int, full_sample: str) – Save queried sample to file.
Examples
>>> from actsnclass import DataBase
Define the necessary paths
>>> path_to_bazin_file = 'results/Bazin.dat'
>>> metrics_file = 'results/metrics.dat'
>>> query_file = 'results/query_file.dat'
Initiate the DataBase object and load the data
>>> data = DataBase()
>>> data.load_features(path_to_bazin_file, method='Bazin')
Separate training and test samples and classify
>>> data.build_samples(initial_training='original', nclass=2)
>>> data.classify(method='RandomForest')
>>> print(data.classprob)    # check predicted probabilities
[[0.461 0.539]
 [0.346 0.654]
 ...
 [0.398 0.602]
 [0.396 0.604]]
Calculate classification metrics
>>> data.evaluate_classification(metric_label='snpcc')
>>> print(data.metrics_list_names)    # check metric header
['acc', 'eff', 'pur', 'fom']
>>> print(data.metrics_list_values)    # check metric values
[0.5975434599574068, 0.9024767801857585, 0.34684684684684686, 0.13572404702012383]
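For orientation, the four metric names in this example ('acc', 'eff', 'pur', 'fom') can be illustrated with a minimal pure-Python sketch. It assumes the usual SNPCC-style definitions (accuracy, efficiency, purity, and a figure of merit that penalises false positives by a factor of 3); the metric values printed in this example are numerically consistent with that penalty, but the sketch is not taken from the actsnclass source. The function name `snpcc_metrics`, the label strings and the `penalty` parameter are hypothetical.

```python
def snpcc_metrics(true_labels, predicted_labels, positive="Ia", penalty=3.0):
    """Return [acc, eff, pur, fom] for a binary Ia / non-Ia classification.

    Assumed definitions (not confirmed by the actsnclass documentation):
      acc = (TP + TN) / N
      eff = TP / (TP + FN)
      pur = TP / (TP + FP)
      fom = eff * TP / (TP + penalty * FP)
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    eff = tp / (tp + fn) if (tp + fn) else 0.0
    pur = tp / (tp + fp) if (tp + fp) else 0.0
    pseudo_purity = tp / (tp + penalty * fp) if (tp + penalty * fp) else 0.0
    fom = eff * pseudo_purity
    return [acc, eff, pur, fom]


# Toy labels, invented for illustration only.
true_labels = ["Ia", "Ia", "Ia", "non-Ia", "non-Ia", "non-Ia", "non-Ia", "non-Ia"]
predicted = ["Ia", "Ia", "non-Ia", "Ia", "non-Ia", "non-Ia", "non-Ia", "non-Ia"]
print(snpcc_metrics(true_labels, predicted))
```

The returned list follows the same order as the metric header, so it can be read side by side with metrics_list_names.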
Make query, choose object and update samples
>>> indx = data.make_query(strategy='UncSampling', batch=1)
>>> data.update_samples(indx)
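To make the query step more concrete, here is a minimal sketch of textbook binary uncertainty sampling followed by a train/test update. It assumes that 'UncSampling' selects the queryable objects whose first class probability is closest to 0.5, and that updating the samples means moving the queried rows from test to training; whether actsnclass implements either step exactly this way is not stated in this page. The helpers `uncertainty_query` and `move_to_training`, and the boolean form of the queryable flags, are hypothetical.

```python
def uncertainty_query(classprob, queryable, batch=1):
    """Return indices of the `batch` queryable objects closest to p = 0.5."""
    candidates = [i for i, ok in enumerate(queryable) if ok]
    # Smaller distance from 0.5 means the classifier is less certain.
    candidates.sort(key=lambda i: abs(classprob[i][0] - 0.5))
    return candidates[:batch]


def move_to_training(train, test, query_indx):
    """Return new (train, test) lists with the queried test rows moved to train."""
    chosen = set(query_indx)
    new_train = train + [test[i] for i in query_indx]
    new_test = [row for i, row in enumerate(test) if i not in chosen]
    return new_train, new_test


# Probabilities in the [pIa, pnon-Ia] layout, reusing the rows shown in the example.
classprob = [[0.461, 0.539], [0.346, 0.654], [0.398, 0.602], [0.396, 0.604]]
queryable = [True, True, True, True]

indx = uncertainty_query(classprob, queryable, batch=1)
train, test = move_to_training(["obj_a"], ["obj_b", "obj_c", "obj_d", "obj_e"], indx)
print(indx, train, test)
```

Both helpers return new lists rather than mutating their inputs, which keeps the sketch easy to test; the real update_samples acts on the DataBase attributes in place.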
Save results to file
>>> data.save_metrics(loop=0, output_metrics_file=metrics_file)
>>> data.save_queried_sample(loop=0, queried_sample_file=query_file,
...                          full_sample=False)
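The steps in this example form one iteration of the active learning loop. A minimal sketch of repeating them is given below; it uses only the method names and keyword arguments that appear in the example, and accepts any object exposing those methods. The driver `run_loops` and the `RecordingStub` stand-in are hypothetical and exist only so the sketch runs without the package or any data files; how save_metrics and save_queried_sample handle the same file across loops is not specified here.

```python
def run_loops(data, n_loops, metrics_file, query_file,
              classifier="RandomForest", strategy="UncSampling", batch=1):
    """Repeat classify -> evaluate -> query -> update -> save for n_loops."""
    for loop in range(n_loops):
        data.classify(method=classifier)
        data.evaluate_classification(metric_label="snpcc")
        indx = data.make_query(strategy=strategy, batch=batch)
        data.update_samples(indx)
        data.save_metrics(loop=loop, output_metrics_file=metrics_file)
        data.save_queried_sample(loop=loop, queried_sample_file=query_file,
                                 full_sample=False)


class RecordingStub:
    """Hypothetical stand-in for DataBase that only records method calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Called only for attributes that do not exist, i.e. the DataBase methods.
        def method(*args, **kwargs):
            self.calls.append(name)
            return [0] if name == "make_query" else None
        return method


stub = RecordingStub()
run_loops(stub, n_loops=2, metrics_file="results/metrics.dat",
          query_file="results/query_file.dat")
print(stub.calls)
```

With a real DataBase instance, load_features and build_samples would be called once before the loop, as in the example.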
- __init__() – Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__() – Initialize self.
- build_samples(initial_training[, nclass, …]) – Separate train and test samples.
- classify([method, screen, n_est, seed, …]) – Apply a machine learning classifier.
- evaluate_classification([metric_label, screen]) – Evaluate results from classification.
- load_bazin_features(path_to_bazin_file[, screen]) – Load Bazin features from file.
- load_features(path_to_file[, method, screen]) – Load features according to the chosen feature extraction method.
- make_query([strategy, batch, seed, screen]) – Identify new object to be added to the training sample.
- save_metrics(loop, output_metrics_file, epoch) – Save current metrics to file.
- save_queried_sample(queried_sample_file, loop) – Save queried sample to file.
- update_samples(query_indx, loop[, epoch, screen]) – Add the queried obj(s) to training and remove them from test.