gala.agglo: RAG Agglomeration¶

class gala.agglo.Rag(watershed=array([], dtype=int64), probabilities=array([], dtype=float64), merge_priority_function=<function boundary_mean>, gt_vol=None, feature_manager=<gala.features.base.Null object>, mask=None, show_progress=False, connectivity=1, channel_is_oriented=None, orientation_map=array([], dtype=float64), normalize_probabilities=False, exclusions=array([], dtype=float64), isfrozennode=None, isfrozenedge=None)[source]¶

Region adjacency graph for segmentation of nD volumes.

Parameters:

watershed : array of int, shape (M, N, ..., P)

The labeled regions of the image. Note: this is called watershed for historical reasons, but could refer to a superpixel map of any origin.

probabilities : array of float, shape (M, N, ..., P[, Q])

The probability of each pixel of belonging to a particular class. Typically, this has the same shape as watershed and represents the probability that the pixel is part of a region boundary, but it can also have an additional dimension for probabilities of belonging to other classes, such as mitochondria (in biological images) or specific textures (in natural images).

merge_priority_function : callable function, optional

This function must take exactly three arguments as input (a Rag object and two node IDs) and return a single float.

feature_manager : features.base.Null object, optional

A feature manager object that controls feature computation and feature caching.

mask : array of bool, shape (M, N, ..., P)

A mask of the same shape as watershed, True in the positions to be processed when making a RAG, False in the positions to ignore.

show_progress : bool, optional

Whether to display an ASCII progress bar during long- -running graph operations.

connectivity : int in {1, ..., watershed.ndim}

When determining adjacency, allow neighbors along connectivity dimensions.

channel_is_oriented : array-like of bool, shape (Q,), optional

For multi-channel images, some channels, for example some edge detectors, have a specific orientation. In conjunction with the orientation_map argument, specify which channels have an orientation associated with them.

orientation_map : array-like of float, shape (Q,)

Specify the orientation of the corresponding channel. (2D images only)

normalize_probabilities : bool, optional

Divide the input probabilities by their maximum to ensure a range in [0, 1].

exclusions : array-like of int, shape (M, N, ..., P), optional

Volume of same shape as watershed. Mark points in the volume with the same label (>0) to prevent them from being merged during agglomeration. For example, if exclusions[45, 92] == exclusions[51, 105] == 1, then segments watershed[45, 92] and watershed[51, 105] will never be merged, regardless of the merge priority function.

isfrozennode : function, optional

Function taking in a Rag object and a node id and returning a bool. If the function returns True, the node will not be merged, regardless of the merge priority function.

isfrozenedge : function, optional

As isfrozennode, but the function should take the graph and two nodes, to specify an edge that cannot be merged.

Attributes

Methods

agglomerate(threshold=0.5, save_history=False)[source]¶

Merge nodes hierarchically until given edge confidence threshold.

This is the main workhorse of the agglo module!

Parameters:

threshold : float, optional

The edge priority at which to stop merging.

save_history : bool, optional

Whether to save and return a history of all the merges made.

Returns:

history : list of tuple of int, optional

The ordered history of node pairs merged.

scores : list of float, optional

The list of merge scores corresponding to the history.

evaluation : list of tuple, optional

The split VI after each merge. This is only meaningful if a ground truth volume was provided at build time.

Notes

This function returns None when save_history is False.

agglomerate_count(stepsize=100, save_history=False)[source]¶

Agglomerate until ‘stepsize’ merges have been made.

This function is like agglomerate, but rather than to a certain threshold, a certain number of merges are made, regardless of threshold.

Parameters:

stepsize : int, optional

The number of merges to make.

save_history : bool, optional

Whether to save and return a history of all the merges made.

Returns:

history : list of tuple of int, optional

The ordered history of node pairs merged.

scores : list of float, optional

The list of merge scores corresponding to the history.

evaluation : list of tuple, optional

The split VI after each merge. This is only meaningful if a ground truth volume was provided at build time.

See also

agglomerate

Notes

This function returns None when save_history is False.

agglomerate_ladder(min_size=1000, strictness=2)[source]¶

Merge sequentially all nodes smaller than min_size.

Parameters:

min_size : int, optional

The smallest allowable segment after ladder completion.

strictness : {1, 2, 3}, optional

strictness == 1: all nodes smaller than min_size are merged according to the merge priority function. strictness == 2: in addition to 1, small nodes can only be merged to big nodes. strictness == 3: in addition to 2, nodes sharing less than one pixel of boundary are not agglomerated.

Returns:

None

Notes

Nodes that are on the volume boundary are not agglomerated.

at_volume_boundary(n)[source]¶: Return True if node n touches the volume boundary.

build_boundary_map(ebunch=None)[source]¶

Return a map of the current merge priority.

Parameters:

ebunch : iterable of (int, int), optional

The list of edges for which to build a map. Use all edges if not provided.

Returns:

bm : array of float

The image of the edge weights.

build_graph_from_watershed(idxs=None)[source]¶

Build the graph object from the region labels.

The region labels should have been set ahead of time using set_watershed().

Parameters:

idxs : array-like of int, optional

Linear indices into raveled volume array. If provided, the graph is built only for these indices.

build_merge_queue()[source]¶

Build a queue of node pairs to be merged in a specific priority.

Parameters:

None

Returns:

mq : MergeQueue object

A MergeQueue is a Python deque with a specific element structure: a list of length 4 containing:

the merge priority (any ordered type)

a ‘valid’ flag

and the two nodes in arbitrary order

The valid flag allows one to “remove” elements from the queue in O(1) time by setting the flag to False. Then, one checks the flag when popping elements and ignores those marked as invalid.

One other specific feature is that there are back-links from edges to their corresponding queue items so that when nodes are merged, affected edges can be invalidated and reinserted in the queue with a new priority.

build_volume(nbunch=None)[source]¶

Return the segmentation induced by the graph.

Parameters:

nbunch : iterable of int (node id), optional

A list of nodes for which to build the volume. All nodes are used if this is not provided.

Returns:

seg : array of int

The segmentation implied by the graph.

Notes

This function is very similar to get_segmentation, but it builds the segmentation from the bottom up, rather than using the currently-stored segmentation.

cluster_by_labels(labels, nodes=None)[source]¶: Merge all superpixels with the same label (1 label per 1 sp)

compute_W(merge_priority_function, sigma=5100.0, nodes=None)[source]¶: Computes the weight matrix for clustering

compute_feature_caches()[source]¶

Use the feature manager to compute node and edge feature caches.

Parameters:	None
Returns:	None

compute_orphans()[source]¶

Find all the segments that do not touch the volume boundary.

Parameters:

None

Returns:

orphans : list of int (node id)

A list of node ids.

Notes

This function differs from orphans in that it does not use the graph, but rather computes orphans directly from the segmentation.

copy()[source]¶: Return a copy of the object and attributes.

get_edge_coordinates(n1, n2, arbitrary=False)[source]¶: Find where in the segmentation the edge (n1, n2) is most visible.

get_segmentation(threshold=None)[source]¶

Return the unpadded segmentation represented by the graph.

Remember that the segmentation volume is padded with an “artificial” segment that envelops the volume. This function simply removes the wrapping and returns a segmented volume.

Parameters:

threshold : float, optional

Get the segmentation at the given threshold. If no threshold is given, return the segmentation at the current level of agglomeration.

Returns:

seg : array of int

The segmentation of the volume presently represented by the graph.

is_traversed_by_node(n)[source]¶

Determine whether a body traverses the volume.

This is defined as touching the volume boundary at two distinct locations.

Parameters:

n : int (node id)

The node being inspected.

Returns:

tr : bool

Whether the segment “traverses” the volume being segmented.

learn_agglomerate(gts, feature_map, min_num_samples=1, learn_flat=True, learning_mode='strict', labeling_mode='assignment', priority_mode='active', memory=True, unique=True, random_state=None, max_num_epochs=10, min_num_epochs=2, max_num_samples=inf, classifier='random forest', active_function=<function classifier_probability>, mpf=<function boundary_mean>)[source]¶

Agglomerate while comparing to ground truth & classifying merges.

Parameters:

gts : array of int or list thereof

The ground truth volume(s) corresponding to the current probability map.

feature_map : function (Rag, node, node) -> array of float

The map from node pairs to a feature vector. This must consist either of uncached features or of the cache used when building the graph.

min_num_samples : int, optional

Continue training until this many training examples have been collected.

learn_flat : bool, optional

Do a flat learning on the static graph with no agglomeration.

learning_mode : {‘strict’, ‘loose’}, optional

In ‘strict’ mode, if a “don’t merge” edge is encountered, it is added to the training set but the merge is not executed. In ‘loose’ mode, the merge is allowed to proceed.

labeling_mode : {‘assignment’, ‘vi-sign’, ‘rand-sign’}, optional

How to decide whether two nodes should be merged based on the ground truth segmentations. 'assignment' means the nodes are assigned to the ground truth node with which they share the highest overlap. 'vi-sign' means the the VI change of the switch is used (negative is better). 'rand-sign' means the change in Rand index is used (positive is better).

priority_mode : string, optional

One of:

'active': Train a priority function with the data

from previous epochs to obtain the next.

'random': Merge edges at random. 'mixed': Alternate between epochs of 'active'

and 'random'.

'mean': Use the mean boundary value. (In this case,

training is limited to 1 or 2 epochs.)

'custom': Use the function provided by mpf.

memory : bool, optional

Keep the training data from all epochs (rather than just the most recent one).

unique : bool, optional

Remove duplicate feature vectors.

random_state : int, optional

If provided, this parameter is passed to get_classifier to set the random state and allow consistent results across tests.

max_num_epochs : int, optional

Do not train for longer than this (this argument may override the min_num_samples argument).

min_num_epochs : int, optional

Train for no fewer than this number of epochs.

max_num_samples : int, optional

Train for no more than this number of samples.

classifier : string, optional

Any valid classifier descriptor. See gala.classify.get_classifier()

active_function : function (feat. map, classifier) -> function, optional

Use this to create the next priority function after an epoch.

mpf : function (Rag, node, node) -> float

A merge priority function to use when priority_mode is 'custom'.

Returns:

data : list of array

Four arrays containing:

the feature vectors, shape (n_samples, n_features).

the labels, shape (n_samples, 3). A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign.

the VI and RI change of each merge, (n_edges, 2).

the list of merged edges (n_edges, 2).

alldata : list of list of array

A list of lists like data above: one list for each epoch.

See also

Rag

Notes

The gala algorithm [1] uses the default parameters. For the LASH algorithm [2], use:

learning_mode: 'loose'

labeling_mode: 'rand-sign'

memory: False

References

[R1]	Nunez-Iglesias et al, Machine learning of hierarchical clustering to segment 2D and 3D images, PLOS ONE, 2013.

[R2]	Jain et al, Learning to agglomerate superpixel hierarchies, NIPS, 2011.

learn_edge(edge, ctables, assignments, feature_map)[source]¶

Determine whether an edge should be merged based on ground truth.

Parameters:

edge : (int, int) tuple

An edge in the graph.

ctables : list of array

A list of contingency tables determining overlap between the current segmentation and the ground truth.

assignments : list of array

Similar to the contingency tables, but each row is thresholded so each segment corresponds to exactly one ground truth segment.

feature_map : function (Rag, node, node) -> array of float

The map from node pairs to a feature vector.

Returns:

features : 1D array of float

The feature vector for that edge.

labels : 1D array of float, length 3

The labels determining whether the edge should be merged. A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign.

weights : 1D array of float, length 2

The VI and RI change of the merge.

nodes : tuple of int

The given edge.

learn_epoch(ctables, feature_map, learning_mode='permissive', labeling_mode='assignment')[source]¶

Learn the agglomeration process using various strategies.

Parameters:

ctables : array of float or list thereof

One or more contingency tables between own segments and gold standard segmentations

feature_map : function (Rag, node, node) -> array of float

The map from node pairs to a feature vector. This must consist either of uncached features or of the cache used when building the graph.

learning_mode : {‘strict’, ‘permissive’}, optional

If 'strict', don’t proceed with a merge when it goes against the ground truth. For historical reasons, ‘loose’ is allowed as a synonym for ‘strict’.

labeling_mode : {‘assignment’, ‘vi-sign’, ‘rand-sign’}, optional

Which label to use for learning_mode. Note that all labels are saved in the end.

Returns:

data : list of array

Four arrays containing:

the feature vectors, shape (n_samples, n_features).

the labels, shape (n_samples, 3). A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign.

the VI and RI change of each merge, (n_edges, 2).

the list of merged edges (n_edges, 2).

learn_flat(gts, feature_map)[source]¶

Learn all edges on the graph, but don’t agglomerate.

Parameters:

gts : array of int or list thereof

The ground truth volume(s) corresponding to the current probability map.

feature_map : function (Rag, node, node) -> array of float

The map from node pairs to a feature vector. This must consist either of uncached features or of the cache used when building the graph.

Returns:

data : list of array

Four arrays containing:

the feature vectors, shape (n_samples, n_features).

the labels, shape (n_samples, 3). A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign.

the VI and RI change of each merge, (n_edges, 2).

the list of merged edges (n_edges, 2).

See also

learn_agglomerate

merge_edge_properties(src, dst)[source]¶

Merge the properties of edge src into edge dst.

Parameters:

src, dst : (int, int)

Edges being merged.

Returns:

None

merge_nodes(n1, n2, merge_priority=0.0)[source]¶

Merge two nodes, while updating the necessary edges.

Parameters:

n1, n2 : int

Nodes determining the edge for which to update the UCM.

merge_priority : float, optional

The merge priority of the merge.

Returns:

node_id : int

The id of the node resulting from the merge.

Notes

Additionally, the RIG (region intersection graph), the contingency matrix to the ground truth (if provided) is updated.

merge_subgraph(subgraph=None, source=None)[source]¶

Merge a (typically) connected set of nodes together.

Parameters:

subgraph : agglo.Rag, networkx.Graph, or list of int (node id)

A subgraph to merge.

source : int (node id), optional

Merge the subgraph to this node.

Returns:

None

ncut(num_clusters=10, kmeans_iters=5, sigma=5100.0, nodes=None, **kwargs)[source]¶

Run normalized cuts on the current set of superpixels. Keyword arguments:

num_clusters – number of clusters to compute kmeans_iters – # iterations to run kmeans when clustering sigma – sigma value when setting up weight matrix

Return value: None

non_traversing_bodies()[source]¶: List bodies that are not orphans and do not traverse the volume.

orphans()[source]¶

List all the nodes that do not touch the volume boundary.

Parameters:

None

Returns:

orphans : list of int (node id)

A list of node ids.

Notes

“Orphans” are not biologically plausible in EM data, so we can flag them with this function for further scrutiny.

raveler_body_annotations(traverse=False)[source]¶: Return JSON-compatible dict formatted for Raveler annotations.

real_edges(*args, **kwargs)[source]¶

Return edges internal to the volume.

The RAG actually includes edges to a “virtual” region that envelops the entire volume. This function returns the list of edges that are internal to the volume.

Parameters:

*args, **kwargs : arbitrary types

Arguments and keyword arguments are passed through to the edges() function of the networkx.Graph class.

Returns:

edge_list : list of tuples

A list of pairs of node IDs, which are typically integers.

See also

real_edges_iter, networkx.Graph.edges

real_edges_iter(*args, **kwargs)[source]¶

Return iterator of edges internal to the volume.

The RAG actually includes edges to a “virtual” region that envelops the entire volume. This function returns the list of edges that are internal to the volume.

Parameters:

*args, **kwargs : arbitrary types

Arguments and keyword arguments are passed through to the edges() function of the networkx.Graph class.

Returns:

edges_iter : iterator of tuples

An iterator over pairs of node IDs, which are typically integers.

rebuild_merge_queue()[source]¶: Build a merge queue from scratch and assign to self.merge_queue.

See also

build_merge_queue

remove_inclusions()[source]¶

Merge any segments fully contained within other segments.

In 3D EM images, inclusions are not biologically plausible, so this function can be used to remove them.

Parameters:	None
Returns:	None

remove_obvious_inclusions()[source]¶: Merge any nodes with only one edge to their neighbors.

rename_node(old, new)[source]¶

Rename node old to new, updating edges and weights.

Parameters:

old : int

The node being renamed.

new : int

The new node id.

replay_merge_history(merge_seq, labels=None, num_errors=1)[source]¶

Agglomerate according to a merge sequence, optionally labeled.

Parameters:

merge_seq : iterable of pair of int

The sequence of node IDs to be merged.

labels : iterable of int in {-1, 0, 1}, optional

A sequence matching merge_seq specifying whether a merge should take place or not. -1 or 0 mean “should merge”, 1 otherwise.

Returns:

n : int

Number of elements consumed from merge_seq

e : (int, int)

Last merge pair observed.

Notes

The merge sequence and labels must be generators if you don’t want to manually keep track of how much has been consumed. The merging continues until num_errors false merges have been encountered, or until the sequence is fully consumed.

set_exclusions(excl)[source]¶

Set an exclusion volume, forbidding certain merges.

Parameters:

excl : array of int

Exclusions work as follows: the volume excl is the same shape as the initial segmentation (see set_watershed), and consists of mostly 0s. Any voxels with the same non-zero label will not be allowed to merge during agglomeration (provided they were not merged in the initial segmentation).

This allows manual separation a priori of difficult-to- -segment regions.

Returns:

None

set_feature_manager(feature_manager)[source]¶

Set the feature manager and ensure feature caches are computed.

Parameters:

feature_manager : features.base.Null object

The feature manager to be used by this RAG.

Returns:

None

set_ground_truth(gt=None)[source]¶

Set the ground truth volume.

This is useful for tracking segmentation accuracy over time.

Parameters:

gt : array of int

A ground truth segmentation of the same volume passed to set_watershed.

Returns:

None

set_orientations(orientation_map, channel_is_oriented)[source]¶

Set the orientation map of the probability image.

Parameters:

orientation_map : array of float

A map of angles of the same shape as the superpixel map.

channel_is_oriented : 1D array-like of bool

A vector having length the number of channels in the probability map.

Returns:

None

set_probabilities(probs=array([], dtype=float64), normalize=False)[source]¶

Set the probabilities attributes of the RAG.

For various reasons, including removing the need for bounds checking when looking for neighboring pixels, the volume of pixel-level probabilities is padded on all faces. In addition, this function adds an attribute probabilities_r, a raveled view of the padded probabilities array for quick access to individual voxels using linear indices.

Parameters:

probs : array

The input probabilities array.

normalize : bool, optional

If True, the values in the array are scaled to be in [0, 1].

Returns:

None

set_watershed(ws=array([], dtype=int64), connectivity=1)[source]¶

Set the initial segmentation volume (watershed).

The initial segmentation is called watershed for historical reasons only.

Parameters:

ws : array of int

The initial segmentation.

connectivity : int in {1, ..., ws.ndim}, optional

The pixel neighborhood.

Returns:

None

split_node(u, n=2, **kwargs)[source]¶

Use normalized cuts [1] to split a node/segment.

Parameters:

u : int (node id)

Which node to split.

n : int, optional

How many segments to split it into.

Returns:

None

References

[R3]	Shi, J., and Malik, J. (2000). Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence.

traversing_bodies()[source]¶: List all bodies that traverse the volume.

update_merge_queue(u, v)[source]¶

Update the merge queue item for edge (u, v). Add new by default.

Parameters:

u, v : int (node id)

Edge being updated.

Returns:

None

write_plaza_json(fout, synapsejson=None, offsetz=0)[source]¶: Write graph to Steve Plaza’s JSON spec.

gala.agglo.approximate_boundary_mean(g, n1, n2)[source]¶

Return the boundary mean as computed by a MomentsFeatureManager.

The feature manager is assumed to have been set up for g at construction.

gala.agglo.best_possible_segmentation(ws, gt)[source]¶: Build the best possible segmentation given a superpixel map.

gala.agglo.compute_local_rand_change(s1, s2, n)[source]¶: Compute change in rand if we merge disjoint sizes s1,s2 in volume n.

gala.agglo.compute_local_vi_change(s1, s2, n)[source]¶: Compute change in VI if we merge disjoint sizes s1,s2 in a volume n.

gala.agglo.compute_true_delta_rand(ctable, n1, n2, n)[source]¶

Compute change in RI obtained by merging rows n1 and n2.

This function assumes ctable is normalized to sum to 1.

gala.agglo.conditional_countdown(seq, start=1, pred=<type 'bool'>)[source]¶

Count down from ‘start’ each time pred(elem) is true for elem in seq.

Used to know how many elements of a sequence remain that satisfy a predicate.

Parameters:

seq : iterable

Any sequence.

start : int, optional

The starting element.

pred : function, type(next(seq)) -> bool

A predicate acting on the elements of seq.

Examples

>>> seq = range(10)
>>> cc = conditional_countdown(seq, start=5, pred=lambda x: x % 2 == 1)
>>> next(cc)
5
>>> next(cc)
4
>>> next(cc)
4
>>> next(cc)
3

gala.agglo.get_edge_coordinates(g, n1, n2, arbitrary=False)[source]¶: Find where in the segmentation the edge (n1, n2) is most visible.