actsnclass.DataBase

class actsnclass.DataBase

DataBase object, upon which the active learning loop is performed.

Variables:
  • classprob (np.array()) – Classification probability for all objects, ordered as [p(Ia), p(non-Ia)].

  • data (pd.DataFrame) – Complete information read from features files.

  • features (pd.DataFrame()) – Feature matrix to be used in classification (no metadata).

  • features_names (list) – Header for attribute features.

  • metadata (pd.DataFrame) – Features matrix which will not be used in classification.

  • metadata_names (list) – Header for metadata.

  • metrics_list_names (list) – Names of the metric elements (header for the metric values).

  • predicted_class (np.array()) – Predicted classes - results from ML classifier.

  • queried_sample (np.array()) – Complete information of queried objects.

  • queryable_ids (np.array()) – Flag for objects available to be queried.

  • test_features (pd.DataFrame) – Features matrix for the test sample.

  • test_metadata (pd.DataFrame()) – Metadata for the test sample.

  • test_labels (np.array()) – True classification for the test sample.

  • train_features (pd.DataFrame()) – Features matrix for the train sample.

  • train_metadata (pd.DataFrame()) – Metadata for the training sample.

  • train_labels (np.array()) – Classes for the training sample.

load_bazin_features(path_to_bazin_file: str)

Load Bazin features from file.

load_features(path_to_file: str, method: str)

Load features according to the chosen feature extraction method.

build_samples(initial_training: str or int, nclass: int)

Separate train and test samples.

classify(method: str)

Apply a machine learning classifier.

evaluate_classification(metric_label: str)

Evaluate results from classification.

make_query(strategy: str, batch: int) -> list

Identify new object to be added to the training sample.

update_samples(query_indx: list)

Add the queried object(s) to the training sample and remove them from the test sample.

save_metrics(loop: int, output_metrics_file: str)

Save current metrics to file.

save_queried_sample(queried_sample_file: str, loop: int, full_sample: bool)

Save queried sample to file.

Examples

>>> from actsnclass import DataBase

Define the necessary paths

>>> path_to_bazin_file = 'results/Bazin.dat'
>>> metrics_file = 'results/metrics.dat'
>>> query_file = 'results/query_file.dat'

Initiate the DataBase object and load the data

>>> data = DataBase()
>>> data.load_features(path_to_bazin_file, method='Bazin')

Separate training and test samples and classify

>>> data.build_samples(initial_training='original', nclass=2)
>>> data.classify(method='RandomForest')
>>> print(data.classprob)          # check predicted probabilities
[[0.461 0.539]
 [0.346 0.654]
 ...
 [0.398 0.602]
 [0.396 0.604]]

Calculate classification metrics

>>> data.evaluate_classification(metric_label='snpcc')
>>> print(data.metrics_list_names)           # check metric header
['acc', 'eff', 'pur', 'fom']
>>> print(data.metrics_list_values)          # check metric values
[0.5975434599574068, 0.9024767801857585,
0.34684684684684686, 0.13572404702012383]
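
To make the four metric names concrete, here is a minimal stand-alone sketch of how accuracy, efficiency, purity and figure of merit could be computed from boolean labels. It does not use actsnclass: the function name `snpcc_metrics`, its arguments, and the formulas in its docstring are assumptions for illustration (the figure of merit is taken to be efficiency times a pseudo-purity with a false-positive penalty of 3). The values printed in the example above are numerically consistent with that penalty, but the library source remains the reference.

```python
def snpcc_metrics(true_is_ia, pred_is_ia, penalty=3.0):
    """Return [acc, eff, pur, fom] for a binary Ia / non-Ia classification.

    Assumed definitions (illustrative, not copied from the library):
      acc = (TP + TN) / total
      eff = TP / (TP + FN)
      pur = TP / (TP + FP)
      fom = eff * TP / (TP + penalty * FP)
    """
    pairs = list(zip(true_is_ia, pred_is_ia))
    tp = sum(1 for t, p in pairs if t and p)          # Ia classified as Ia
    fp = sum(1 for t, p in pairs if not t and p)      # non-Ia classified as Ia
    fn = sum(1 for t, p in pairs if t and not p)      # Ia classified as non-Ia
    tn = sum(1 for t, p in pairs if not t and not p)  # non-Ia classified as non-Ia

    acc = (tp + tn) / len(pairs)
    eff = tp / (tp + fn) if tp + fn else 0.0
    pur = tp / (tp + fp) if tp + fp else 0.0
    fom = eff * tp / (tp + penalty * fp) if tp + fp else 0.0
    return [acc, eff, pur, fom]


# Toy labels: 3 true Ia and 5 true non-Ia objects.
true_is_ia = [True, True, True, False, False, False, False, False]
pred_is_ia = [True, True, False, True, False, False, False, False]
acc, eff, pur, fom = snpcc_metrics(true_is_ia, pred_is_ia)
```

Because false positives are weighted by the penalty, the figure of merit drops faster than purity as contamination grows, which is why it can be much smaller than either efficiency or purity alone.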

Make query, choose object and update samples

>>> indx = data.make_query(strategy='UncSampling', batch=1)
>>> data.update_samples(indx)
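
The query and update steps above can be pictured with the following stand-alone sketch. It is not the library's implementation: `uncertainty_query` and `move_to_training` are hypothetical helpers, and the sketch assumes that uncertainty sampling picks the objects whose Ia probability lies closest to 0.5 and that queried objects are identified by their position in the test sample.

```python
def uncertainty_query(classprob, batch=1):
    """Return positions of the `batch` objects with p(Ia) closest to 0.5."""
    # Distance of each object's Ia probability from the decision boundary.
    distance = [abs(p_ia - 0.5) for p_ia, _ in classprob]
    order = sorted(range(len(classprob)), key=lambda i: distance[i])
    return order[:batch]


def move_to_training(train_ids, test_ids, query_indx):
    """Move the queried objects (positions in test_ids) from test to train."""
    queried = [test_ids[i] for i in query_indx]
    chosen = set(query_indx)
    new_train = train_ids + queried
    new_test = [obj for i, obj in enumerate(test_ids) if i not in chosen]
    return new_train, new_test


# Toy probabilities in the [p(Ia), p(non-Ia)] layout, with made-up object ids.
classprob = [[0.461, 0.539], [0.346, 0.654], [0.398, 0.602], [0.396, 0.604]]
train_ids = [7, 8]
test_ids = [101, 102, 103, 104]

indx = uncertainty_query(classprob, batch=1)
train_ids, test_ids = move_to_training(train_ids, test_ids, indx)
```

In this toy case the first object is the most uncertain, so it is the one moved into the training sample before the next classification round.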

Save results to file

>>> data.save_metrics(loop=0, output_metrics_file=metrics_file)
>>> data.save_queried_sample(loop=0, queried_sample_file=query_file,
...                          full_sample=False)

__init__()

Methods

__init__()

build_samples(initial_training[, nclass, ...])

Separate train and test samples.

classify([method, screen, n_est, seed, ...])

Apply a machine learning classifier.

evaluate_classification([metric_label, screen])

Evaluate results from classification.

load_bazin_features(path_to_bazin_file[, screen])

Load Bazin features from file.

load_features(path_to_file[, method, screen])

Load features according to the chosen feature extraction method.

make_query([strategy, batch, seed, screen])

Identify new object to be added to the training sample.

save_metrics(loop, output_metrics_file, epoch)

Save current metrics to file.

save_queried_sample(queried_sample_file, loop)

Save queried sample to file.

update_samples(query_indx, loop[, screen])

Add the queried object(s) to the training sample and remove them from the test sample.