actsnclass.DataBase

class actsnclass.DataBase

DataBase object, upon which the active learning loop is performed.

Variables:
  • classprob (np.array()) – Classification probability for all objects, [pIa, pnon-Ia].
  • data (pd.DataFrame) – Complete information read from features files.
  • features (pd.DataFrame()) – Feature matrix to be used in classification (no metadata).
  • features_names (list) – Header for attribute features.
  • metadata (pd.DataFrame) – Features matrix which will not be used in classification.
  • metadata_names (list) – Header for metadata.
  • metrics_list_names (list) – Names of the metric elements.
  • metrics_list_values (list) – Values for metric elements.
  • predicted_class (np.array()) – Predicted classes - results from ML classifier.
  • queried_sample (np.array()) – Complete information of queried objects.
  • queryable_ids (np.array()) – Flag for objects available to be queried.
  • test_features (pd.DataFrame) – Features matrix for the test sample.
  • test_metadata (pd.DataFrame()) – Metadata for the test sample.
  • test_labels (np.array()) – True classification for the test sample.
  • train_features (pd.DataFrame()) – Features matrix for the train sample.
  • train_metadata (pd.DataFrame()) – Metadata for the training sample.
  • train_labels (np.array()) – Classes for the training sample.

load_bazin_features(path_to_bazin_file: str)

Load Bazin features from file.

load_features(path_to_file: str, method: str)

Load features according to the chosen feature extraction method.

build_samples(initial_training: str or int, nclass: int)

Separate train and test samples.

classify(method: str)

Apply a machine learning classifier.

evaluate_classification(metric_label: str)

Evaluate results from classification.

make_query(strategy: str, batch: int) → list

Identify new object to be added to the training sample.

update_samples(query_indx: list)

Add the queried object(s) to the training sample and remove them from the test sample.

save_metrics(loop: int, output_metrics_file: str)

Save current metrics to file.

save_queried_sample(queried_sample_file: str, loop: int, full_sample: str)

Save queried sample to file.

Examples

>>> from actsnclass import DataBase

Define the necessary paths

>>> path_to_bazin_file = 'results/Bazin.dat'
>>> metrics_file = 'results/metrics.dat'
>>> query_file = 'results/query_file.dat'

Initiate the DataBase object and load the data.

>>> data = DataBase()
>>> data.load_features(path_to_bazin_file, method='Bazin')

Separate training and test samples and classify

>>> data.build_samples(initial_training='original', nclass=2)
>>> data.classify(method='RandomForest')
>>> print(data.classprob)          # check predicted probabilities
[[0.461 0.539]
 [0.346 0.654]
...
 [0.398 0.602]
 [0.396 0.604]]

Calculate classification metrics

>>> data.evaluate_classification(metric_label='snpcc')
>>> print(data.metrics_list_names)           # check metric header
['acc', 'eff', 'pur', 'fom']
>>> print(data.metrics_list_values)          # check metric values
[0.5975434599574068, 0.9024767801857585,
0.34684684684684686, 0.13572404702012383]
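
The four metrics named in the header ('acc', 'eff', 'pur', 'fom') can be illustrated with a small stand-alone sketch. This is not the package's implementation. It assumes the usual SNPCC-style definitions: efficiency is the fraction of true Ia recovered, purity is the fraction of Ia predictions that are correct, and the figure of merit is efficiency times a pseudo-purity in which false positives are weighted by a penalty W = 3. The metric values printed above are numerically consistent with that relation. The function name `snpcc_metrics` and the label convention (1 = Ia, 0 = non-Ia) are hypothetical.

```python
def snpcc_metrics(true_labels, predicted_labels, penalty=3.0):
    """Return [acc, eff, pur, fom] for binary labels (assumed: 1 = Ia, 0 = non-Ia)."""
    pairs = list(zip(true_labels, predicted_labels))

    # Confusion-matrix counts, treating Ia as the positive class
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    acc = (tp + tn) / len(pairs)
    eff = tp / (tp + fn) if tp + fn else 0.0
    pur = tp / (tp + fp) if tp + fp else 0.0

    # Pseudo-purity: each false positive counts `penalty` times (assumed W = 3)
    pseudo_pur = tp / (tp + penalty * fp) if tp + fp else 0.0
    fom = eff * pseudo_pur
    return [acc, eff, pur, fom]


# Toy labels, invented for illustration only
true = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 1, 0, 0, 1, 1]
acc, eff, pur, fom = snpcc_metrics(true, pred)
print(acc, eff, pur, fom)
```

Because the penalty only enters the pseudo-purity, a classifier with high efficiency but many false positives, as in the example output above, still ends up with a low figure of merit.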

Make query, choose object and update samples

>>> indx = data.make_query(strategy='UncSampling', batch=1)
>>> data.update_samples(indx)
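
To make the query-and-update step concrete, here is a minimal NumPy sketch, assuming that 'UncSampling' denotes standard uncertainty sampling (pick the objects whose most likely class has the lowest probability). It does not reproduce the package's internals. The helper names `uncertainty_query` and `move_to_training` are hypothetical, and the probabilities are the four rows shown in the `classprob` output above.

```python
import numpy as np


def uncertainty_query(class_prob, batch=1):
    """Return indices of the `batch` rows with the least confident prediction."""
    class_prob = np.asarray(class_prob, dtype=float)
    # Confidence = probability of the most likely class; lowest is most uncertain
    confidence = class_prob.max(axis=1)
    order = np.argsort(confidence, kind="stable")
    return order[:batch].tolist()


def move_to_training(train, test, query_indx):
    """Append the queried rows of `test` to `train` and drop them from `test`."""
    queried = test[query_indx]
    new_train = np.vstack([train, queried])
    new_test = np.delete(test, query_indx, axis=0)
    return new_train, new_test


probs = np.array([[0.461, 0.539],
                  [0.346, 0.654],
                  [0.398, 0.602],
                  [0.396, 0.604]])
indx = uncertainty_query(probs, batch=1)

# Toy feature matrices, invented for illustration only
train_features = np.zeros((2, 3))
test_features = np.arange(12.0).reshape(4, 3)
train_features, test_features = move_to_training(train_features, test_features, indx)
print(indx, train_features.shape, test_features.shape)
```

The first row is selected because its probabilities (0.461, 0.539) are the closest to an even split. After the update, the training matrix gains one row and the test matrix loses the same row.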

Save results to file

>>> data.save_metrics(loop=0, output_metrics_file=metrics_file)
>>> data.save_queried_sample(loop=0, queried_sample_file=query_file,
...                          full_sample=False)

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

  • __init__() – Initialize self.
  • build_samples(initial_training[, nclass, …]) – Separate train and test samples.
  • classify([method, screen, n_est, seed, …]) – Apply a machine learning classifier.
  • evaluate_classification([metric_label, screen]) – Evaluate results from classification.
  • load_bazin_features(path_to_bazin_file[, screen]) – Load Bazin features from file.
  • load_features(path_to_file[, method, screen]) – Load features according to the chosen feature extraction method.
  • make_query([strategy, batch, seed, screen]) – Identify new object to be added to the training sample.
  • save_metrics(loop, output_metrics_file, epoch) – Save current metrics to file.
  • save_queried_sample(queried_sample_file, loop) – Save queried sample to file.
  • update_samples(query_indx, loop[, epoch, screen]) – Add the queried object(s) to the training sample and remove them from the test sample.