gala.agglo: RAG Agglomeration¶
-
class
gala.agglo.
Rag
(watershed=array([], dtype=int64), probabilities=array([], dtype=float64), merge_priority_function=<function boundary_mean>, gt_vol=None, feature_manager=<gala.features.base.Null object>, mask=None, show_progress=False, connectivity=1, channel_is_oriented=None, orientation_map=array([], dtype=float64), normalize_probabilities=False, exclusions=array([], dtype=float64), isfrozennode=None, isfrozenedge=None)[source]¶ Region adjacency graph for segmentation of nD volumes.
Parameters: watershed : array of int, shape (M, N, ..., P)
The labeled regions of the image. Note: this is called watershed for historical reasons, but could refer to a superpixel map of any origin.
probabilities : array of float, shape (M, N, ..., P[, Q])
The probability of each pixel of belonging to a particular class. Typically, this has the same shape as watershed and represents the probability that the pixel is part of a region boundary, but it can also have an additional dimension for probabilities of belonging to other classes, such as mitochondria (in biological images) or specific textures (in natural images).
merge_priority_function : callable function, optional
This function must take exactly three arguments as input (a Rag object and two node IDs) and return a single float.
feature_manager :
features.base.Null
object, optionalA feature manager object that controls feature computation and feature caching.
mask : array of bool, shape (M, N, ..., P)
A mask of the same shape as watershed, True in the positions to be processed when making a RAG, False in the positions to ignore.
show_progress : bool, optional
Whether to display an ASCII progress bar during long- -running graph operations.
connectivity : int in {1, ..., watershed.ndim}
When determining adjacency, allow neighbors along connectivity dimensions.
channel_is_oriented : array-like of bool, shape (Q,), optional
For multi-channel images, some channels, for example some edge detectors, have a specific orientation. In conjunction with the orientation_map argument, specify which channels have an orientation associated with them.
orientation_map : array-like of float, shape (Q,)
Specify the orientation of the corresponding channel. (2D images only)
normalize_probabilities : bool, optional
Divide the input probabilities by their maximum to ensure a range in [0, 1].
exclusions : array-like of int, shape (M, N, ..., P), optional
Volume of same shape as watershed. Mark points in the volume with the same label (>0) to prevent them from being merged during agglomeration. For example, if exclusions[45, 92] == exclusions[51, 105] == 1, then segments watershed[45, 92] and watershed[51, 105] will never be merged, regardless of the merge priority function.
isfrozennode : function, optional
Function taking in a Rag object and a node id and returning a bool. If the function returns
True
, the node will not be merged, regardless of the merge priority function.isfrozenedge : function, optional
As isfrozennode, but the function should take the graph and two nodes, to specify an edge that cannot be merged.
Attributes
Methods
-
agglomerate
(threshold=0.5, save_history=False)[source]¶ Merge nodes hierarchically until given edge confidence threshold.
This is the main workhorse of the
agglo
module!Parameters: threshold : float, optional
The edge priority at which to stop merging.
save_history : bool, optional
Whether to save and return a history of all the merges made.
Returns: history : list of tuple of int, optional
The ordered history of node pairs merged.
scores : list of float, optional
The list of merge scores corresponding to the history.
evaluation : list of tuple, optional
The split VI after each merge. This is only meaningful if a ground truth volume was provided at build time.
Notes
This function returns
None
when save_history isFalse
.
-
agglomerate_count
(stepsize=100, save_history=False)[source]¶ Agglomerate until ‘stepsize’ merges have been made.
This function is like
agglomerate
, but rather than to a certain threshold, a certain number of merges are made, regardless of threshold.Parameters: stepsize : int, optional
The number of merges to make.
save_history : bool, optional
Whether to save and return a history of all the merges made.
Returns: history : list of tuple of int, optional
The ordered history of node pairs merged.
scores : list of float, optional
The list of merge scores corresponding to the history.
evaluation : list of tuple, optional
The split VI after each merge. This is only meaningful if a ground truth volume was provided at build time.
See also
Notes
This function returns
None
when save_history isFalse
.
-
agglomerate_ladder
(min_size=1000, strictness=2)[source]¶ Merge sequentially all nodes smaller than min_size.
Parameters: min_size : int, optional
The smallest allowable segment after ladder completion.
strictness : {1, 2, 3}, optional
strictness == 1: all nodes smaller than min_size are merged according to the merge priority function. strictness == 2: in addition to 1, small nodes can only be merged to big nodes. strictness == 3: in addition to 2, nodes sharing less than one pixel of boundary are not agglomerated.
Returns: None
Notes
Nodes that are on the volume boundary are not agglomerated.
-
build_boundary_map
(ebunch=None)[source]¶ Return a map of the current merge priority.
Parameters: ebunch : iterable of (int, int), optional
The list of edges for which to build a map. Use all edges if not provided.
Returns: bm : array of float
The image of the edge weights.
-
build_graph_from_watershed
(idxs=None)[source]¶ Build the graph object from the region labels.
The region labels should have been set ahead of time using
set_watershed()
.Parameters: idxs : array-like of int, optional
Linear indices into raveled volume array. If provided, the graph is built only for these indices.
-
build_merge_queue
()[source]¶ Build a queue of node pairs to be merged in a specific priority.
Parameters: None
Returns: mq : MergeQueue object
A MergeQueue is a Python
deque
with a specific element structure: a list of length 4 containing:- the merge priority (any ordered type)
- a ‘valid’ flag
- and the two nodes in arbitrary order
The valid flag allows one to “remove” elements from the queue in O(1) time by setting the flag to
False
. Then, one checks the flag when popping elements and ignores those marked as invalid.One other specific feature is that there are back-links from edges to their corresponding queue items so that when nodes are merged, affected edges can be invalidated and reinserted in the queue with a new priority.
-
build_volume
(nbunch=None)[source]¶ Return the segmentation induced by the graph.
Parameters: nbunch : iterable of int (node id), optional
A list of nodes for which to build the volume. All nodes are used if this is not provided.
Returns: seg : array of int
The segmentation implied by the graph.
Notes
This function is very similar to
get_segmentation
, but it builds the segmentation from the bottom up, rather than using the currently-stored segmentation.
-
cluster_by_labels
(labels, nodes=None)[source]¶ Merge all superpixels with the same label (1 label per 1 sp)
-
compute_W
(merge_priority_function, sigma=5100.0, nodes=None)[source]¶ Computes the weight matrix for clustering
-
compute_feature_caches
()[source]¶ Use the feature manager to compute node and edge feature caches.
Parameters: None Returns: None
-
compute_orphans
()[source]¶ Find all the segments that do not touch the volume boundary.
Parameters: None
Returns: orphans : list of int (node id)
A list of node ids.
Notes
This function differs from
orphans
in that it does not use the graph, but rather computes orphans directly from the segmentation.
-
get_edge_coordinates
(n1, n2, arbitrary=False)[source]¶ Find where in the segmentation the edge (n1, n2) is most visible.
-
get_segmentation
(threshold=None)[source]¶ Return the unpadded segmentation represented by the graph.
Remember that the segmentation volume is padded with an “artificial” segment that envelops the volume. This function simply removes the wrapping and returns a segmented volume.
Parameters: threshold : float, optional
Get the segmentation at the given threshold. If no threshold is given, return the segmentation at the current level of agglomeration.
Returns: seg : array of int
The segmentation of the volume presently represented by the graph.
-
is_traversed_by_node
(n)[source]¶ Determine whether a body traverses the volume.
This is defined as touching the volume boundary at two distinct locations.
Parameters: n : int (node id)
The node being inspected.
Returns: tr : bool
Whether the segment “traverses” the volume being segmented.
-
learn_agglomerate
(gts, feature_map, min_num_samples=1, learn_flat=True, learning_mode='strict', labeling_mode='assignment', priority_mode='active', memory=True, unique=True, random_state=None, max_num_epochs=10, min_num_epochs=2, max_num_samples=inf, classifier='random forest', active_function=<function classifier_probability>, mpf=<function boundary_mean>)[source]¶ Agglomerate while comparing to ground truth & classifying merges.
Parameters: gts : array of int or list thereof
The ground truth volume(s) corresponding to the current probability map.
feature_map : function (Rag, node, node) -> array of float
The map from node pairs to a feature vector. This must consist either of uncached features or of the cache used when building the graph.
min_num_samples : int, optional
Continue training until this many training examples have been collected.
learn_flat : bool, optional
Do a flat learning on the static graph with no agglomeration.
learning_mode : {‘strict’, ‘loose’}, optional
In ‘strict’ mode, if a “don’t merge” edge is encountered, it is added to the training set but the merge is not executed. In ‘loose’ mode, the merge is allowed to proceed.
labeling_mode : {‘assignment’, ‘vi-sign’, ‘rand-sign’}, optional
How to decide whether two nodes should be merged based on the ground truth segmentations.
'assignment'
means the nodes are assigned to the ground truth node with which they share the highest overlap.'vi-sign'
means the the VI change of the switch is used (negative is better).'rand-sign'
means the change in Rand index is used (positive is better).priority_mode : string, optional
- One of:
'active'
: Train a priority function with the datafrom previous epochs to obtain the next.
'random'
: Merge edges at random.'mixed'
: Alternate between epochs of'active'
and
'random'
.'mean'
: Use the mean boundary value. (In this case,training is limited to 1 or 2 epochs.)
'custom'
: Use the function provided by mpf.
memory : bool, optional
Keep the training data from all epochs (rather than just the most recent one).
unique : bool, optional
Remove duplicate feature vectors.
random_state : int, optional
If provided, this parameter is passed to get_classifier to set the random state and allow consistent results across tests.
max_num_epochs : int, optional
Do not train for longer than this (this argument may override the min_num_samples argument).
min_num_epochs : int, optional
Train for no fewer than this number of epochs.
max_num_samples : int, optional
Train for no more than this number of samples.
classifier : string, optional
Any valid classifier descriptor. See
gala.classify.get_classifier()
active_function : function (feat. map, classifier) -> function, optional
Use this to create the next priority function after an epoch.
mpf : function (Rag, node, node) -> float
A merge priority function to use when
priority_mode
is'custom'
.Returns: data : list of array
- Four arrays containing:
- the feature vectors, shape
(n_samples, n_features)
. - the labels, shape
(n_samples, 3)
. A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign. - the VI and RI change of each merge,
(n_edges, 2)
. - the list of merged edges
(n_edges, 2)
.
- the feature vectors, shape
alldata : list of list of array
A list of lists like data above: one list for each epoch.
See also
Notes
The gala algorithm [1] uses the default parameters. For the LASH algorithm [2], use:
- learning_mode:
'loose'
- labeling_mode:
'rand-sign'
- memory:
False
References
[R1] Nunez-Iglesias et al, Machine learning of hierarchical clustering to segment 2D and 3D images, PLOS ONE, 2013. [R2] Jain et al, Learning to agglomerate superpixel hierarchies, NIPS, 2011.
-
learn_edge
(edge, ctables, assignments, feature_map)[source]¶ Determine whether an edge should be merged based on ground truth.
Parameters: edge : (int, int) tuple
An edge in the graph.
ctables : list of array
A list of contingency tables determining overlap between the current segmentation and the ground truth.
assignments : list of array
Similar to the contingency tables, but each row is thresholded so each segment corresponds to exactly one ground truth segment.
feature_map : function (Rag, node, node) -> array of float
The map from node pairs to a feature vector.
Returns: features : 1D array of float
The feature vector for that edge.
labels : 1D array of float, length 3
The labels determining whether the edge should be merged. A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign.
weights : 1D array of float, length 2
The VI and RI change of the merge.
nodes : tuple of int
The given edge.
-
learn_epoch
(ctables, feature_map, learning_mode='permissive', labeling_mode='assignment')[source]¶ Learn the agglomeration process using various strategies.
Parameters: ctables : array of float or list thereof
One or more contingency tables between own segments and gold standard segmentations
feature_map : function (Rag, node, node) -> array of float
The map from node pairs to a feature vector. This must consist either of uncached features or of the cache used when building the graph.
learning_mode : {‘strict’, ‘permissive’}, optional
If
'strict'
, don’t proceed with a merge when it goes against the ground truth. For historical reasons, ‘loose’ is allowed as a synonym for ‘strict’.labeling_mode : {‘assignment’, ‘vi-sign’, ‘rand-sign’}, optional
Which label to use for learning_mode. Note that all labels are saved in the end.
Returns: data : list of array
- Four arrays containing:
- the feature vectors, shape
(n_samples, n_features)
. - the labels, shape
(n_samples, 3)
. A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign. - the VI and RI change of each merge,
(n_edges, 2)
. - the list of merged edges
(n_edges, 2)
.
- the feature vectors, shape
-
learn_flat
(gts, feature_map)[source]¶ Learn all edges on the graph, but don’t agglomerate.
Parameters: gts : array of int or list thereof
The ground truth volume(s) corresponding to the current probability map.
feature_map : function (Rag, node, node) -> array of float
The map from node pairs to a feature vector. This must consist either of uncached features or of the cache used when building the graph.
Returns: data : list of array
- Four arrays containing:
- the feature vectors, shape
(n_samples, n_features)
. - the labels, shape
(n_samples, 3)
. A value of -1 means “should merge”, while 1 means “should not merge”. The columns correspond to the three labeling methods: assignment, VI sign, or RI sign. - the VI and RI change of each merge,
(n_edges, 2)
. - the list of merged edges
(n_edges, 2)
.
- the feature vectors, shape
See also
-
merge_edge_properties
(src, dst)[source]¶ Merge the properties of edge src into edge dst.
Parameters: src, dst : (int, int)
Edges being merged.
Returns: None
-
merge_nodes
(n1, n2, merge_priority=0.0)[source]¶ Merge two nodes, while updating the necessary edges.
Parameters: n1, n2 : int
Nodes determining the edge for which to update the UCM.
merge_priority : float, optional
The merge priority of the merge.
Returns: node_id : int
The id of the node resulting from the merge.
Notes
Additionally, the RIG (region intersection graph), the contingency matrix to the ground truth (if provided) is updated.
-
merge_subgraph
(subgraph=None, source=None)[source]¶ Merge a (typically) connected set of nodes together.
Parameters: subgraph : agglo.Rag, networkx.Graph, or list of int (node id)
A subgraph to merge.
source : int (node id), optional
Merge the subgraph to this node.
Returns: None
-
ncut
(num_clusters=10, kmeans_iters=5, sigma=5100.0, nodes=None, **kwargs)[source]¶ Run normalized cuts on the current set of superpixels. Keyword arguments:
num_clusters – number of clusters to compute kmeans_iters – # iterations to run kmeans when clustering sigma – sigma value when setting up weight matrixReturn value: None
-
orphans
()[source]¶ List all the nodes that do not touch the volume boundary.
Parameters: None
Returns: orphans : list of int (node id)
A list of node ids.
Notes
“Orphans” are not biologically plausible in EM data, so we can flag them with this function for further scrutiny.
-
raveler_body_annotations
(traverse=False)[source]¶ Return JSON-compatible dict formatted for Raveler annotations.
-
real_edges
(*args, **kwargs)[source]¶ Return edges internal to the volume.
The RAG actually includes edges to a “virtual” region that envelops the entire volume. This function returns the list of edges that are internal to the volume.
Parameters: *args, **kwargs : arbitrary types
Arguments and keyword arguments are passed through to the
edges()
function of thenetworkx.Graph
class.Returns: edge_list : list of tuples
A list of pairs of node IDs, which are typically integers.
See also
real_edges_iter
,networkx.Graph.edges
-
real_edges_iter
(*args, **kwargs)[source]¶ Return iterator of edges internal to the volume.
The RAG actually includes edges to a “virtual” region that envelops the entire volume. This function returns the list of edges that are internal to the volume.
Parameters: *args, **kwargs : arbitrary types
Arguments and keyword arguments are passed through to the
edges()
function of thenetworkx.Graph
class.Returns: edges_iter : iterator of tuples
An iterator over pairs of node IDs, which are typically integers.
-
rebuild_merge_queue
()[source]¶ Build a merge queue from scratch and assign to self.merge_queue.
See also
-
remove_inclusions
()[source]¶ Merge any segments fully contained within other segments.
In 3D EM images, inclusions are not biologically plausible, so this function can be used to remove them.
Parameters: None Returns: None
-
rename_node
(old, new)[source]¶ Rename node old to new, updating edges and weights.
Parameters: old : int
The node being renamed.
new : int
The new node id.
-
replay_merge_history
(merge_seq, labels=None, num_errors=1)[source]¶ Agglomerate according to a merge sequence, optionally labeled.
Parameters: merge_seq : iterable of pair of int
The sequence of node IDs to be merged.
labels : iterable of int in {-1, 0, 1}, optional
A sequence matching merge_seq specifying whether a merge should take place or not. -1 or 0 mean “should merge”, 1 otherwise.
Returns: n : int
Number of elements consumed from merge_seq
e : (int, int)
Last merge pair observed.
Notes
The merge sequence and labels must be generators if you don’t want to manually keep track of how much has been consumed. The merging continues until num_errors false merges have been encountered, or until the sequence is fully consumed.
-
set_exclusions
(excl)[source]¶ Set an exclusion volume, forbidding certain merges.
Parameters: excl : array of int
Exclusions work as follows: the volume excl is the same shape as the initial segmentation (see
set_watershed
), and consists of mostly 0s. Any voxels with the same non-zero label will not be allowed to merge during agglomeration (provided they were not merged in the initial segmentation).This allows manual separation a priori of difficult-to- -segment regions.
Returns: None
-
set_feature_manager
(feature_manager)[source]¶ Set the feature manager and ensure feature caches are computed.
Parameters: feature_manager :
features.base.Null
objectThe feature manager to be used by this RAG.
Returns: None
-
set_ground_truth
(gt=None)[source]¶ Set the ground truth volume.
This is useful for tracking segmentation accuracy over time.
Parameters: gt : array of int
A ground truth segmentation of the same volume passed to
set_watershed
.Returns: None
-
set_orientations
(orientation_map, channel_is_oriented)[source]¶ Set the orientation map of the probability image.
Parameters: orientation_map : array of float
A map of angles of the same shape as the superpixel map.
channel_is_oriented : 1D array-like of bool
A vector having length the number of channels in the probability map.
Returns: None
-
set_probabilities
(probs=array([], dtype=float64), normalize=False)[source]¶ Set the probabilities attributes of the RAG.
For various reasons, including removing the need for bounds checking when looking for neighboring pixels, the volume of pixel-level probabilities is padded on all faces. In addition, this function adds an attribute probabilities_r, a raveled view of the padded probabilities array for quick access to individual voxels using linear indices.
Parameters: probs : array
The input probabilities array.
normalize : bool, optional
If
True
, the values in the array are scaled to be in [0, 1].Returns: None
-
set_watershed
(ws=array([], dtype=int64), connectivity=1)[source]¶ Set the initial segmentation volume (watershed).
The initial segmentation is called watershed for historical reasons only.
Parameters: ws : array of int
The initial segmentation.
connectivity : int in {1, ..., ws.ndim}, optional
The pixel neighborhood.
Returns: None
-
split_node
(u, n=2, **kwargs)[source]¶ Use normalized cuts [1] to split a node/segment.
Parameters: u : int (node id)
Which node to split.
n : int, optional
How many segments to split it into.
Returns: None
References
[R3] Shi, J., and Malik, J. (2000). Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence.
-
-
gala.agglo.
approximate_boundary_mean
(g, n1, n2)[source]¶ Return the boundary mean as computed by a MomentsFeatureManager.
The feature manager is assumed to have been set up for g at construction.
-
gala.agglo.
best_possible_segmentation
(ws, gt)[source]¶ Build the best possible segmentation given a superpixel map.
-
gala.agglo.
compute_local_rand_change
(s1, s2, n)[source]¶ Compute change in rand if we merge disjoint sizes s1,s2 in volume n.
-
gala.agglo.
compute_local_vi_change
(s1, s2, n)[source]¶ Compute change in VI if we merge disjoint sizes s1,s2 in a volume n.
-
gala.agglo.
compute_true_delta_rand
(ctable, n1, n2, n)[source]¶ Compute change in RI obtained by merging rows n1 and n2.
This function assumes ctable is normalized to sum to 1.
-
gala.agglo.
conditional_countdown
(seq, start=1, pred=<type 'bool'>)[source]¶ Count down from ‘start’ each time pred(elem) is true for elem in seq.
Used to know how many elements of a sequence remain that satisfy a predicate.
Parameters: seq : iterable
Any sequence.
start : int, optional
The starting element.
pred : function, type(next(seq)) -> bool
A predicate acting on the elements of seq.
Examples
>>> seq = range(10) >>> cc = conditional_countdown(seq, start=5, pred=lambda x: x % 2 == 1) >>> next(cc) 5 >>> next(cc) 4 >>> next(cc) 4 >>> next(cc) 3