Segmentation methods: state of the art
The detection of lesions in MRI scans generally requires a significant time investment from a skilled physician. A number of automated methods seek to lessen this burden by providing fast, accurate, and repeatable segmentation results. The goal of this project was to improve upon the results of an existing segmentation method [1]. The current system represents a significant investment of time and produces good results for a range of real cases. Unfortunately, the process falls short of the abilities of an expert physician.
A number of general categories exist for automated segmentation of MS lesions in MRI scans of the brain. The methods can be divided based on the approach and grouped based on their implementation [2]. There are three main types of segmentation approaches: manual, semi‐automatic, and automatic. Manual segmentation is the base method for lesion segmentation. An expert physician examines different modalities to select the lesion voxels. Unfortunately, the manual process is time consuming and somewhat subjective. Different experts can report different results and the same expert can provide different results for the same data on subsequent evaluations. Even so, manual segmentations are considered the best results available and serve as the baseline for evaluating other methods. The expert segmentations can be considered as a ‘silver standard’ since they are not perfect representations of the ground truths but provide the best in vivo estimates available. Computer‐aided methods do provide some benefit to MS lesion segmentation. Where experts can have difficulty combining information from multiple MRI modalities and from multiple adjacent slices, well‐designed algorithms can efficiently blend this data. As a result, it is interesting to pursue the development of semi‐automated and automated lesion segmentation methods.
Semi‐automatic methods require some human input as the starting point for an automated processing step. The input can be as simple as a region of interest: the user draws a rectangle around suspected lesions to narrow the focus of the algorithm. It might also be as detailed as a coarse painting of lesion and non‐lesion tissues. An algorithm could then use information based on the appearance and features of these selections to grow the two regions without any other knowledge. In either case, the automated portion of the segmentation is sensitive to the quality of the input. While semi‐automatic methods can relieve physicians of some of the work, the need for user input may make them unsuitable for large patient studies.
Fully automatic methods require no human interaction and can be grouped, at several levels, based on the method used to perform the segmentation. In general, there are three main types of fully automated segmentation schemes: data‐driven methods, learning‐based (intelligent) methods, and statistical methods. The data‐driven methods, such as the watershed and grow‐cut methods, use thresholding and region growing to segment the lesions in an image. The learning‐based methods require a training set and some feature extraction. These methods learn the characteristics of lesions and then classify based on fuzzy rules or decision forests. The statistical methods involve estimations of probability density functions. These methods are based on inference with some neighborhood or classification examples and include Markov models and support vector machines. All have advantages and disadvantages in their use and in the results they provide (see [3] for more details).
Graph cuts (GC) is a method for finding the maximum a posteriori (MAP) estimate of a binary image [4]. The method treats the image as a flow graph with two terminal nodes, the ‘source’ and the ‘sink’. The source represents the object class in the image, in this case the lesions. The sink represents the background, the non‐lesion tissue. The other nodes of the graph are the image voxels. A network of weighted and directed edges connects the nodes in the graph. The GC method makes use of regional and voxel‐neighborhood information to differentiate between the two classes.
The MAP estimate corresponds to the maximum flow through the node network. Essentially, the method removes the inter‐label connections in favor of intra‐label connections. The result is two sets of strongly connected nodes that correspond to the fore‐ and background image elements.
The fuzzy c‐means method seeks to cluster pixels into a number of groups that maximize inter‐cluster variability while minimizing intra‐cluster variability [5, 6]. Rather than a crisp or hard classification, the fuzzy approach specifies the degree to which a pixel belongs to a given cluster. In this way, a pixel can belong to more than one cluster, each with some degree of membership. This makes it possible to account for the variation in manual segmentation results from expert to expert and between repeated evaluations by the same expert.
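As an illustration of the soft assignment, the following minimal sketch computes the memberships of a single intensity value with respect to a set of cluster centers. It assumes the commonly used fuzzy c‐means membership formula with a fuzzifier m; the function name, the cluster centers, and the sample intensity are hypothetical and are not taken from [5, 6].

```python
def fuzzy_memberships(intensity, centers, m=2.0):
    """Degree of membership of one intensity in each cluster (sums to 1).

    Assumes the standard fuzzy c-means membership formula with fuzzifier m > 1.
    """
    distances = [abs(intensity - c) for c in centers]
    # A value sitting exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical cluster centers (illustrative intensities only).
centers = [30.0, 90.0, 150.0]
memberships = fuzzy_memberships(100.0, centers)
```

The intensity 100 lies closest to the center at 90, so it receives most of its membership there while keeping small, non‐zero memberships in the other two clusters.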
The mean‐shift is an unsupervised non‐parametric clustering algorithm for image segmentation [7]. The main idea of the mean‐shift algorithm is to treat image points as samples drawn from an underlying probability density function. The dense regions in this feature space correspond to the local maxima of that distribution. The method performs a gradient ascent optimization starting from each image point until convergence. The mean‐shift vector gradually decreases in length as it approaches the maximum. The resulting points are the modes of the distribution. Nearby data points, within some window size, are considered members of the same cluster. The clustering process depends on the selection of a kernel (local neighborhood) and the specification of a window size rather than on a prior specification of the number of clusters. The correct selection of the window size is the key to obtaining good results. If the window size is too large, the image will be under‐segmented and regions will be lumped together. This can remove the fine details of small structures like MS lesions. If the window size is too small, a significant amount of over‐segmentation can occur.
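The effect of the window size can be seen in a minimal one‐dimensional sketch. It assumes a flat kernel (a plain average of the samples inside the window) and uses made‐up intensity values; the function names are hypothetical and the code is not the implementation of [7].

```python
def mean_shift_mode(start, samples, window, tol=1e-6, max_iter=100):
    """Follow the mean-shift vector from `start` until it (nearly) vanishes."""
    x = float(start)
    for _ in range(max_iter):
        # Flat kernel: average all samples that fall inside the window around x.
        neighbors = [s for s in samples if abs(s - x) <= window]
        new_x = sum(neighbors) / len(neighbors)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x


def cluster_modes(samples, window):
    """Distinct modes reached when the ascent is started from every sample."""
    return sorted({round(mean_shift_mode(s, samples, window), 3) for s in samples})


# Illustrative intensities forming two well-separated groups.
intensities = [10, 11, 12, 13, 50, 51, 52, 53, 54]
small_window_modes = cluster_modes(intensities, 5.0)
large_window_modes = cluster_modes(intensities, 50.0)
```

With a window of 5 the two groups are recovered as separate modes, whereas a window of 50 merges every sample into a single mode, which is the under‐segmentation described above.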
The k‐nearest neighbor (k‐NN) method is a learning‐based approach that attempts to classify voxels based on the consensus of nearby examples [8]. A number of features can be extracted for a voxel, including its appearance, location in the brain, and relation to its neighbors. A labeled training set provides examples in feature space against which a test voxel is compared. The advantage of this approach is that it only needs to estimate the probability densities locally. The classification is based on the agreement of the training examples with similar features in some small neighborhood of the test voxel. The method requires and depends on good examples for good classification. Because MS lesions vary in size, shape, and appearance, they will have widely different feature sets. Without a sufficient number of examples, it could be difficult to correctly classify lesions. Even in patients with MS lesions, the actual number of lesion voxels may be far less than the number of voxels representing healthy tissue, perhaps one in a thousand. As with other learning‐based methods, the large fraction of non‐lesion voxels can bias the examples and hurt the lesion detection rate.
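The voting step can be sketched in a few lines. The example below assumes a majority vote over Euclidean distances in a two‐dimensional feature space; the feature values, the labels, and the function name are hypothetical and serve only to illustrate the idea, not the method of [8].

```python
from collections import Counter


def knn_classify(features, training_set, k=3):
    """Label a voxel by majority vote among its k nearest training examples."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(
        training_set, key=lambda example: squared_distance(example[0], features)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training examples: (normalized intensity, second feature) -> label.
training = [
    ((0.90, 0.20), "lesion"),
    ((0.85, 0.25), "lesion"),
    ((0.80, 0.30), "lesion"),
    ((0.40, 0.60), "normal"),
    ((0.35, 0.70), "normal"),
    ((0.45, 0.65), "normal"),
    ((0.50, 0.55), "normal"),
]

bright_voxel = knn_classify((0.82, 0.28), training)
dark_voxel = knn_classify((0.42, 0.62), training)
```

In a realistic training set the normal examples would outnumber the lesion examples by a much larger factor, so the vote around an ambiguous voxel would tend to favor the normal class.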
Support vector machines (SVM) are a popular and widely used supervised learning approach and have been applied to the MS lesion segmentation problem [9, 10]. The method extracts some features from examples of lesion and non‐lesion voxels. It then attempts to divide the two classes by a hyper‐plane in the feature space. While there are many possible dividing planes, the method seeks the plane with the widest margin. Some methods employ kernels to re‐map the feature space and allow for a non‐linear division of the classes. One problem with the SVM approach in MS lesion detection is the imbalance between class representations. In general, the number of voxels that represent normal brain tissue far exceeds the number of voxels that represent MS lesions. This can lead to the over‐representation of non‐lesions in the training. Unfortunately, it is difficult to simply exclude non‐lesion examples, since any given example might carry important information.
Statistical models generally focus on some estimation of the probability of a lesion based on mixture models for normal tissue [11, 12]. Generally, normal brain tissue is divided into three classes: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Lesions are generally treated as outliers to the normal tissue, although in some cases they can be treated as a separate class. The statistical methods try to assign a classification based on the likelihood that a given voxel is a lesion according to these models. The methods include neighborhood information through Markov random fields (MRF) or conditional random fields (CRF). In these cases, the nearby voxels contribute to the classification. These methods usually include some probability parameter or threshold, beyond which an outlier is considered a lesion. The existing method uses a hidden Markov chain (HMC) to incorporate neighborhood information into the segmentation process. The main drawback is that we segment the whole brain whereas the physician works locally. To be more efficient, we propose in this paper to combine local and global approaches.
Algorithm evaluation
To evaluate the effectiveness of any improvements, it was necessary to specify a quantitative metric for evaluating the progress of any proposed solutions. A number of metrics exist for comparing the computerized and expert results. The metric of interest for these comparisons is the similarity‐index (SI).
The SI represents the amount of overlap in the identification regions provided by the experts and the method of interest. It is computed as the ratio of twice the area of intersection of the regions to the sum of the areas of the regions. It reflects the relative number of correctly segmented voxels to the false‐positives and false‐negatives in a single metric.
\text{SI}=\frac{2\,|A\cap B|}{|A|+|B|} \qquad (1)
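As a small worked example of Equation (1), the sketch below computes the SI for two segmentations given as sets of voxel coordinates. The coordinates and the function name are hypothetical, and the handling of two empty segmentations is an assumption made only for this illustration.

```python
def similarity_index(expert, automated):
    """Similarity index (Equation 1) between two sets of lesion voxel coordinates."""
    expert = set(expert)
    automated = set(automated)
    total = len(expert) + len(automated)
    if total == 0:
        # Assumption: two empty segmentations are treated as perfect agreement.
        return 1.0
    return 2.0 * len(expert & automated) / total


# Hypothetical voxel coordinates: three voxels agree, one is missed by the
# automated method (false-negative) and one is spurious (false-positive).
expert_voxels = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (11, 13, 5)}
automated_voxels = {(10, 13, 5), (11, 12, 5), (11, 13, 5), (12, 13, 5)}
si = similarity_index(expert_voxels, automated_voxels)
```

Here the two regions share three voxels and each contains four, so SI = 2·3/(4+4) = 0.75. A single false‐positive and a single false‐negative lower the score by the same amount.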
Values for the SI fall between 0 and 1, with values closer to 1.0 representing better results and values closer to 0.0 representing worse results. The goal is to improve the segmentation approach in a way that better matches the physician’s segmentations, as measured by the level of correspondence between the results and penalized by the difference. As the automated method is intended to assist the physicians in identifying lesions, under‐segmentation or false‐negatives were a more important concern than over‐segmentation or false‐positives. Physicians would have to search the whole brain for any missed lesions, but could more easily reject the incorrectly identified lesions. It was expected that the improvements in SI scores could include an increase in the number of true‐positives as well as an increase in both false‐positives and false‐negatives. The addition of false‐negatives would be slightly more troubling than the addition of false‐positives, but an increase in the detection of true‐positives would outweigh the two. Beyond the overall scores, it was interesting to consider a per‐lesion evaluation of the efficacy of a given system. By comparing the expert’s labeled results to a set of automated results, it was possible to evaluate over‐ and under‐segmentation for each lesion. The per‐lesion results provided a quantitative way to evaluate the power of a given method to discriminate the boundaries of lesions as well as its propensity for detecting spurious lesions.
Our development uses a hidden Markov chain model [1, 13] as a starting point because this approach obtained a very good score at the grand challenge ‘3D segmentation in the clinic’ at MICCAI’08. This Markovian method computes lesion segmentations that generally agree with expert results but under‐estimates lesions in some places and over‐estimates them in others. After some investigation, we propose a two‐stage process (global on the whole brain, then local on reduced areas) for improving the segmentation results. The two‐stage method changes the way the images are processed and then applies a post‐processing step to grow the lesions. A discussion of the theoretical basis for the existing system and the proposed improvements is provided in Section ‘New local detection model.’ The specific implementation details are presented in Section ‘Proposed algorithm,’ whereas Section ‘Results’ validates the approach for a set of patients. A conclusion ends this paper, with some observations and potential avenues for future work.
New local detection model
The ‘GrowCut’ approach to image segmentation using a Cellular Automaton seemed to be a good match for the local‐region‐growing idea to improve segmentation results [13]. Cellular Automata (CA) are discrete models both in space and time that govern the evolution over time of a grid of cells. The cells exist in some finite set of states and have simple deterministic rules that govern the change of these states at each time step. In the case of image segmentation, it is possible to construct an update rule to grow regions given some example seeds. The method is generally applied to semi‐automatic segmentation, with a user providing the seed points for the fore‐ and background. Since the existing method provides a base segmentation, these results could be adapted to provide seed points in a fully automatic way. Figure 2 shows a schematic representation of the way the method progresses, from seed selection through growth iterations to the final convergence.
The CA method operates on a set of voxels V in the MRI image. These voxels become cells p in the lattice of the automaton. This relationship between voxels in the 3D image and the corresponding cells is summarized as p\in V\subseteq {\mathbb{Z}}^{3}.
The cellular automaton A is composed of the triplet A=(S,N,δ). S represents the non‐empty set of states within the automaton. N represents the neighborhood of points, in this case the six‐neighborhood in three dimensions around a given cell. The transition rule, δ, is the function which updates the states of the cells at each time step t. The state of a given voxel, S_{V}, is composed of three pieces of information and given by \left({l}_{\mathrm{V}},{\theta}_{\mathrm{V}},{\overrightarrow{C}}_{\mathrm{V}}\right). The label of the current cell is given by l_{V} and can take the integer values −1, 0, and 1, which correspond to the labels unknown, WM, and lesion, respectively. The unknown label represents those cells which have not yet been assigned a value or that are specifically excluded from the analysis because, for example, they belong to a different tissue class. The strength of the current label is given by θ_{V} with values given by θ_{V}∈[0,1]. The feature vector \overrightarrow{C} represents the properties of each voxel, in this case, a scalar intensity.
Region‐growing algorithm
The following are the steps for region growing:

1. Select some seed points as representatives of the WM and lesions and assign them the corresponding labels, strengths of 1.0, and their respective intensities.
2. Assign all other points an initial label, a strength in the range [0,1], and their respective intensities.
3. Assign the non‐WM and non‐lesion points the unknown label and strengths of 1.0. These points, identified as GM and CSF by the atlas, are ignored in the iterations.
4. Iterate over the points in the image using the evolution rule described in Section ‘Evolution rule for cellular automaton’ to update the strengths and labels of the cells.
5. Terminate the iteration when there are no additional changes (no label changes) or after some maximum number of iterations.
In the first three steps, the values for each cell are initialized to a suitable value. Seed points are normally selected by the user, but in a fully automated system the main processing step provides these points. For these seed points, the lesion label is applied and the corresponding label strength is set to the maximum value of 1.0. In a similar way, points can be set as anti‐seeds: those cells that could not be part of the lesion class because they have very low intensities. These anti‐seed points are given the non‐lesion label with the maximum label strength of 1.0. Some cells represent other tissue types and must be ignored by the region‐growing method. These cells are excluded by giving them maximum strengths and a label that indicates they should not be considered by the algorithm. The remaining cells represent viable growth regions. The algorithm will attempt to consume them for one label or the other. They must be given initial strength values in the range [0,1]. With the cells of the automaton initialized, the iteration process of step 4 begins. The new values at time t+1 are determined based on an evolution rule for the cells. The states for all cells p at time t are given by S_{p}^{t}. These values are updated to the next time step t+1 and given by S_{p}^{t+1}. In this way, the label at each time step, l_{V}^{t}, and its strength, θ_{V}^{t}, are updated for each cell to l_{V}^{t+1} and θ_{V}^{t+1}, respectively.
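The initialization described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: cells are stored in a dictionary keyed by voxel coordinates, each state is a (label, strength, intensity) tuple, and the remaining growth cells are assumed to start with the WM label and a strength of 0.0, a choice the text leaves open within [0,1]. All function and parameter names are hypothetical.

```python
UNKNOWN, WM, LESION = -1, 0, 1  # label values as defined for the automaton


def initialize_states(intensity, lesion_seeds, anti_seeds, excluded,
                      default_strength=0.0):
    """Build the initial (label, strength, intensity) state of every cell.

    `default_strength` and the WM starting label of the growth cells are
    assumptions made for this sketch.
    """
    states = {}
    for cell, value in intensity.items():
        if cell in excluded:
            # Other tissue types: maximum strength and the unknown label.
            states[cell] = (UNKNOWN, 1.0, value)
        elif cell in lesion_seeds:
            states[cell] = (LESION, 1.0, value)
        elif cell in anti_seeds:
            # Anti-seeds: non-lesion label with maximum strength.
            states[cell] = (WM, 1.0, value)
        else:
            # Viable growth region, free to be consumed by either label.
            states[cell] = (WM, default_strength, value)
    return states


# Hypothetical four-voxel image.
intensity = {(0, 0, 0): 100.0, (1, 0, 0): 98.0, (2, 0, 0): 20.0, (3, 0, 0): 60.0}
states = initialize_states(
    intensity,
    lesion_seeds={(0, 0, 0)},
    anti_seeds={(2, 0, 0)},
    excluded={(3, 0, 0)},
)
```

Because seeds, anti‐seeds, and excluded cells all start with the maximum strength of 1.0, no attack bounded by [0,1] can strictly exceed their strength, so they keep their labels during the iterations.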
Evolution rule for cellular automaton
As mentioned above, the states of the cells in the Cellular Automaton evolve over time. Through the application of the evolution rule, the cells move through a number of intermediate states to a final stable set of states. The evolution rule describes how the states are updated at each time step. At each step, the evolution rule iterates over the cells p in the image. The cell label, l_{v}, and the strength of that label, θ_{v}, are copied from the current time step t to the next time step t+1. The neighbors around the cell of interest are then considered in turn. Each neighbor is compared to the cell of interest and attacks the cell, attempting to modify its label and strength. The attack strength for the comparison and the subsequent update in strength depend on a function g(x). This function can take a number of forms but must be monotonically decreasing and restricted to the range [0, 1]. The monotonicity requirement ensures that there are no local extrema in the function that will cause repeated skirmishes. The decreasing requirement also ensures that larger input differences result in weaker attacks. This is a necessary behavior, since only neighbors with some close resemblance should have a strong influence on a given cell. Restricting the range to [0, 1] ensures that the strength values will always be bounded and will not increase unexpectedly or uncontrollably. As seen in Algorithm 1, the input value x is determined at each point by the L2 norm of the difference of the feature vectors C. In the simplest case, the feature vectors are simply the voxel intensity values (Figure 3), but it is easy to imagine this method applied to multiple modalities taken together to form a collective feature vector. This evolution algorithm is then applied at each time step in the overall method. The process of updating continues until convergence or until some pre‐determined number of iterations is reached.
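The update described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that g(x) = 1 − x/x_max, a linear form that is monotonically decreasing on [0, x_max]. It assumes that a neighbor wins only when its attack strictly exceeds the current strength of the cell. It also assumes that cells with the unknown label neither attack nor are attacked. The names, the six‐cell test image, and the stopping test (an iteration that leaves every state unchanged) are hypothetical.

```python
UNKNOWN, WM, LESION = -1, 0, 1
SIX_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def g(x, max_difference):
    """Assumed attack function: decreasing in x and clamped to [0, 1]."""
    return max(0.0, 1.0 - x / max_difference)


def evolve_once(states, max_difference):
    """One time step: every cell is attacked by its six neighbors."""
    new_states = dict(states)  # labels and strengths copied from t to t+1
    for p, (label_p, _, c_p) in states.items():
        if label_p == UNKNOWN:
            continue  # excluded cells are ignored
        for dx, dy, dz in SIX_NEIGHBORS:
            q = (p[0] + dx, p[1] + dy, p[2] + dz)
            if q not in states or states[q][0] == UNKNOWN:
                continue
            label_q, theta_q, c_q = states[q]
            # Scalar intensities: the L2 norm reduces to an absolute difference.
            attack = g(abs(c_p - c_q), max_difference) * theta_q
            if attack > new_states[p][1]:
                new_states[p] = (label_q, attack, c_p)
    return new_states


def grow(states, max_difference, max_iterations=100):
    """Iterate until no state changes or the iteration limit is reached."""
    for _ in range(max_iterations):
        new_states = evolve_once(states, max_difference)
        if new_states == states:
            break
        states = new_states
    return states


# Hypothetical row of six voxels: bright (lesion-like) on the left, dark on the right.
row = [100.0, 98.0, 95.0, 40.0, 38.0, 35.0]
initial = {(x, 0, 0): (WM, 0.0, value) for x, value in enumerate(row)}
initial[(0, 0, 0)] = (LESION, 1.0, row[0])  # lesion seed
initial[(5, 0, 0)] = (WM, 1.0, row[5])      # anti-seed
final = grow(initial, max_difference=100.0)
final_labels = [final[(x, 0, 0)][0] for x in range(len(row))]
```

In this example the lesion label spreads across the three bright voxels and stops at the large intensity step, where the attack strength drops to 0.45 times the strength of the attacker and can no longer overcome the WM cells.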
Algorithm 1 Evolution rule for cellular automaton