gala.classify: Classifier tools

gala.classify.concatenate_data_elements(alldata)

Return one big learning set from a list of learning sets.

A learning set is a list/tuple of length 4 containing features, labels, weights, and node merge history.
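As a rough illustration of what "one big learning set" means, the sketch below stacks each of the four components across several learning sets with NumPy. The helper name `concat_learning_sets`, the toy array shapes, and the choice to concatenate along the first axis are assumptions made for this example; they are not taken from the gala source.

```python
import numpy as np


def concat_learning_sets(alldata):
    """Hypothetical helper: stack each component of several learning sets.

    Each element of `alldata` is assumed to be a 4-tuple of arrays
    (features, labels, weights, history) with matching row counts.
    """
    # zip(*alldata) groups all feature arrays together, all label arrays
    # together, and so on; each group is then concatenated row-wise.
    return [np.concatenate(parts, axis=0) for parts in zip(*alldata)]


# Two toy learning sets with 2 and 4 samples respectively (made-up data).
set_a = (np.zeros((2, 3)), np.ones((2, 1)), np.ones((2, 1)),
         np.zeros((2, 2), dtype=int))
set_b = (np.ones((4, 3)), -np.ones((4, 1)), np.ones((4, 1)),
         np.ones((4, 2), dtype=int))

features, labels, weights, history = concat_learning_sets([set_a, set_b])
print(features.shape, labels.shape)  # (6, 3) (6, 1)
```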

gala.classify.default_classifier_extension(cl, use_joblib=True)

Return the default classifier file extension for the given classifier.

Parameters:

cl : sklearn estimator or VigraRandomForest object

A classifier to be saved.

use_joblib : bool, optional

Whether or not joblib will be used to save the classifier.

Returns:

ext : string

The file extension.

Examples

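>>> from sklearn.ensemble import RandomForestClassifier
>>> from gala.classify import default_classifier_extension
>>> cl = RandomForestClassifier()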
>>> default_classifier_extension(cl)
'.classifier.joblib'
>>> default_classifier_extension(cl, False)
'.classifier'

gala.classify.get_classifier(name='random forest', *args, **kwargs)

Return a classifier given a name.

Parameters:

name : string

The name of the classifier, e.g. 'random forest' or 'naive bayes'.

*args, **kwargs :

Additional arguments to pass to the constructor of the classifier.

Returns:

cl : classifier

A classifier object implementing the scikit-learn interface.

Raises:

NotImplementedError

If the classifier name is not recognized.

Examples

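>>> from sklearn.ensemble import RandomForestClassifier
>>> from gala.classify import get_classifier
>>> cl = get_classifier('random forest', n_estimators=47)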
>>> isinstance(cl, RandomForestClassifier)
True
>>> cl.n_estimators
47
>>> from numpy.testing import assert_raises
>>> assert_raises(NotImplementedError, get_classifier, 'perfect class')

gala.classify.load_classifier(fn)

Load a classifier previously saved to disk, given a filename.

Supported classifier types are:

- scikit-learn classifiers saved using either pickle or joblib persistence
- vigra random forest classifiers saved in HDF5 format

Parameters:

fn : string

Filename in which the classifier is stored.

Returns:

cl : classifier object

One of the supported classifier types. Each supports at least the standard scikit-learn interface of fit() and predict_proba().

gala.classify.sample_training_data(features, labels, num_samples=None)

Get a random sample from a classification training dataset.

Parameters:

features : np.ndarray [M x N]

The M (number of samples) by N (number of features) feature matrix.

labels : np.ndarray [M] or [M x 1]

The training label for each feature vector.

num_samples : int, optional

The size of the training sample to draw. If None, or if num_samples >= M, the full dataset is returned.

Returns:

feat : np.ndarray [num_samples x N]

The sampled feature vectors.

lab : np.ndarray [num_samples] or [num_samples x 1]

The sampled training labels.
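The documented behavior (draw `num_samples` rows, or return everything when `num_samples` is None or at least M) can be sketched with NumPy as follows. The helper name `sample_rows`, the `seed` parameter, and the choice to sample without replacement are assumptions made for this illustration; the page does not say how gala draws its sample.

```python
import numpy as np


def sample_rows(features, labels, num_samples=None, seed=None):
    """Hypothetical helper: draw matching random rows from features and labels."""
    m = len(features)
    # Mirror the documented shortcut: nothing to sample, return the full set.
    if num_samples is None or num_samples >= m:
        return features, labels
    rng = np.random.default_rng(seed)
    # Assumption: sample row indices without replacement.
    idxs = rng.choice(m, size=num_samples, replace=False)
    # The same indices are used for both arrays so rows stay aligned.
    return features[idxs], labels[idxs]


# Toy data: row i of `features` is [2*i, 2*i + 1] and its label is i.
features = np.arange(20).reshape(10, 2)
labels = np.arange(10)

feat, lab = sample_rows(features, labels, num_samples=4, seed=0)
print(feat.shape, lab.shape)  # (4, 2) (4,)
```

Indexing both arrays with the same index vector is what keeps each sampled feature vector paired with its own label.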

gala.classify.save_classifier(cl, fn, use_joblib=True, **kwargs)

Save a classifier to disk.

Parameters:

cl : classifier object

A picklable object or a classify.VigraRandomForest object.

fn : string

Writable path or filename.

use_joblib : bool, optional

Whether to prefer joblib persistence to pickle.

kwargs : keyword arguments

Keyword arguments to be passed on to either pck.dump or joblib.dump.

Returns:

None

Notes

For joblib persistence, compress=3 is the default.
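To make the choice between joblib and pickle persistence concrete, here is a minimal round-trip sketch. The helpers `save_model` and `load_model` are hypothetical stand-ins written for this example, not gala's functions; the fallback to pickle when joblib is not installed and the "try pickle first, then joblib" loading order are likewise assumptions. A plain dict stands in for a classifier so the example has no scikit-learn dependency.

```python
import os
import pickle
import tempfile

try:
    import joblib  # optional third-party dependency
except ImportError:
    joblib = None


def save_model(obj, fn, use_joblib=True, **kwargs):
    """Hypothetical helper: persist `obj` with joblib if possible, else pickle."""
    if use_joblib and joblib is not None:
        # Mirror the default noted above unless the caller overrides it.
        kwargs.setdefault('compress', 3)
        joblib.dump(obj, fn, **kwargs)
    else:
        with open(fn, 'wb') as f:
            pickle.dump(obj, f, **kwargs)


def load_model(fn):
    """Hypothetical helper: try plain pickle first, then fall back to joblib."""
    try:
        with open(fn, 'rb') as f:
            return pickle.load(f)
    except pickle.UnpicklingError:
        # A compressed joblib file is not a valid raw pickle stream.
        if joblib is None:
            raise
        return joblib.load(fn)


# Round trip with a stand-in object (any picklable estimator would do).
model = {'n_estimators': 47}
path = os.path.join(tempfile.mkdtemp(), 'model.classifier')
save_model(model, path)
restored = load_model(path)
print(restored == model)
```

Only load files from sources you trust: both pickle and joblib can execute arbitrary code during loading.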