Title | Machine Learning for Tomographic Imaging |
---|---|
Author | Professor Ge Wang |
Genre | Medicine |
ISBN | 9780750322164 |
Based on PCA, we can apply another component analysis algorithm called zero-phase component analysis, abbreviated as ZCA. ZCA is accomplished by transforming PCA-whitened data back into the original data space:
X_ZCAwhite = U X_PCAwhite, (1.15)
where U is the same unitary matrix as in the SVD, satisfying UU⊤ = I; this whitening is also referred to as the ‘Mahalanobis transformation’. It can be shown that ZCA keeps the transformed data as close to the original data as possible. Hence, compared to PCA, data whitened by ZCA better preserve the structural information of the original image, apart from luminance and contrast. Figure 1.10 illustrates the global and local behaviors of PCA and ZCA, respectively. Since natural image features are mostly local, decorrelation or whitening filters can also be local. For natural images, high-frequency features are commonly associated with small eigenvalues, while the luminance and contrast components take up most of the energy of an image. In this context, ZCA is a simple yet effective way to highlight structural features by removing the luminance and contrast components, which carry little structural information.
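A minimal NumPy sketch of the two whitening transforms, in the notation of equation (1.15); the data here are random and the variable names are ours, purely for illustration:

```python
import numpy as np

# PCA vs ZCA whitening of zero-mean data; columns of X are samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 500))
X = X - X.mean(axis=1, keepdims=True)               # remove the mean (DC) component

Sigma = X @ X.T / X.shape[1]                        # sample covariance
S, U = np.linalg.eigh(Sigma)                        # Sigma = U diag(S) U^T

eps = 1e-5                                          # guards against tiny eigenvalues
X_pca = np.diag(1.0 / np.sqrt(S + eps)) @ U.T @ X   # PCA whitening
X_zca = U @ X_pca                                   # ZCA: rotate back, equation (1.15)
# Both whitened covariances are (approximately) the identity matrix,
# but X_zca stays closer to the original X than X_pca does.
```

The final comment reflects the property stated above: among all whitening transforms, ZCA minimizes the distance to the original data.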
Figure 1.10. Basis functions obtained with PCA and ZCA, respectively. (a) PCA whitening basis functions, (b) ZCA whitening basis functions (with size 8 × 8), and (c) an enlarged view of a typical ZCA component in which significant variations happen around a specific spatial location.
In the HVS, a receptive field is tuned to a particular light pattern for a maximum response, which is achieved via local processing. The receptive field of ganglion cells in the retina is a good example of a local filtering operation, as is that of LGN cells.
If the HVS had to transmit each pixel value to the brain separately, it would not be cost-effective. Fortunately, local neural processing yields a less redundant representation of an input image and then transmits the compressed code to the brain. According to experimental results with natural images, the whitening filters for centralized receptive fields are circularly symmetric and similar to the LoG (Laplacian of Gaussian) function, as shown in figure 1.3. Neurobiologists have verified that, compared to the millions of photoreceptors in the retina, the numbers of ganglion and LGN cells are quite small, indicating that a compression operation is performed on the original data.
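As a hypothetical illustration of such a circularly symmetric, center-surround filter, the sketch below builds a small LoG kernel; the size and scale are arbitrary choices, not values from the text:

```python
import numpy as np

# A 2D Laplacian-of-Gaussian (LoG) kernel: circularly symmetric, zero-mean,
# so it gives no response to uniform luminance (the DC component).
def log_kernel(size=9, sigma=1.2):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()                       # enforce zero mean

k = log_kernel()
# k has a strong center of one sign surrounded by the opposite sign,
# mimicking the center-surround receptive fields described above.
```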
1.1.6 Sparse coding
In the previous subsection, we introduced several models of natural image statistics, which produce results similar to the responses of the retina and LGN in the HVS. These models only remove the first- and second-order redundancy in images. Now, we will introduce two models that were the first successful attempts to reproduce results similar to those found in simple cells in the visual cortex; these models suppress higher-order redundancy in images. When these models are used to interpret visual perception, the data are pre-processed by whitening and DC removal, consistent with the pre-processing performed by the ganglion and LGN cells of the HVS.
Although these two models are milestones in mimicking the responses of simple cells to natural images, their computational methods are quite time-consuming. Here, we only focus on the main idea behind these models, and in chapter 4 we will introduce some efficient methods to obtain the same results.
The first model was proposed by Olshausen and Field (Olshausen and Field 1996). They used a one-layer network and trained it with natural image patches to extract distinctive features for natural image coding. According to this study, V1 contains about 200 million simple cells, while the number of ganglion and LGN cells responsible for visual perception is only just over 1 million. This indicates that sparse coding is an effective strategy for redundancy reduction and efficient image representation.
Sparse coding means that a given image may typically be described in terms of a small number of suitable basis functions chosen from a large, over-complete set learned from training data. A heavy-tailed distribution of representation coefficients is often observed, as illustrated in figure 1.12. For instance, if we consider a natural image patch as a vector x, as shown in figure 1.11, this vector can be represented by just two components, i.e. numbers 3 and 6 out of the 12 features in total. To generalize, a typical sparse encoding strategy approximates an image as a linear combination of basis functions:
x = ∑_{i=1}^{K} α_i ϕ_i, (1.16)
where αi is a representation coefficient or an active coefficient for the ith basis function, and ϕi is the ith basis function.
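Equation (1.16) with a sparse coefficient vector can be written out directly. In this toy NumPy sketch the dictionary is random and the active indices are chosen to mirror the two-component example above (the 3rd and 6th of 12 features); all names are illustrative:

```python
import numpy as np

# Toy version of equation (1.16): x as a sparse linear combination of
# K basis functions. The dictionary is random, purely for illustration.
rng = np.random.default_rng(1)
K, n = 12, 64
Phi = rng.normal(size=(n, K))        # column phi_i is the ith basis function
alpha = np.zeros(K)
alpha[[2, 5]] = [1.5, -0.7]          # only the 3rd and 6th coefficients active
x = Phi @ alpha                      # x = sum_i alpha_i * phi_i
```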
Figure 1.11. An image modeled as a linear superposition of basis functions. The sparse encoding is to learn basis functions which capture the structures efficiently in a specific domain, such as natural images. Adapted from figure 1 in Baraniuk (2007) with permission. Copyright 2007 IEEE.
In their work published in 1996 (Olshausen and Field 1996), they trained a feed-forward artificial neural network on natural image patches to obtain a sparse representation with an over-complete basis. In sparse coding, the search process should match, as closely as possible, the distribution of images described by the linear image model under sparsity constraints to the corresponding training targets (figure 1.12). For this purpose, Lagrangian optimization was used, and the final objective function can be formulated as follows:
min_{α,ϕ} ∑_{j=1}^{m} ( ‖x_j − ∑_{i=1}^{K} α_{j,i} ϕ_i‖² + λ ∑_i S(α_{j,i}) ), (1.17)
where xj is an image patch extracted from a natural image, αj,i is the representation coefficient for basis function ϕi in image patch xj, S is a sparsity measure, and λ is a weighting parameter. This formula contains two components: the first term computes the reconstruction error while the second term imposes the sparsity penalty.
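With a fixed dictionary and S(α) = ∣α∣, the coding step of this objective reduces to a Lasso problem, which can be solved by iterative soft-thresholding. Below is a minimal NumPy sketch of that coding step; it is an illustrative stand-in, not Olshausen and Field's original gradient scheme, and all variable names are ours:

```python
import numpy as np

# Solve the coding step of (1.17) for a fixed dictionary Phi with S(a) = |a|,
# i.e. the Lasso problem min_a 0.5*||x - Phi a||^2 + lam*||a||_1,
# via iterative soft-thresholding (ISTA).
def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def ista(x, Phi, lam=0.1, n_iter=500):
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the gradient
    alpha = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ alpha - x)       # gradient of the data-fit term
        alpha = soft_threshold(alpha - grad / L, lam / L)
    return alpha

rng = np.random.default_rng(2)
Phi = rng.normal(size=(32, 64))
Phi /= np.linalg.norm(Phi, axis=0)             # unit-norm basis functions phi_i
alpha_true = np.zeros(64)
alpha_true[[3, 40]] = [1.0, -2.0]              # a 2-sparse ground-truth code
x = Phi @ alpha_true
alpha = ista(x, Phi)                           # recovered code is sparse
```

The soft-thresholding step is exactly the proximal operator of the L1 penalty, which is why the recovered coefficients are driven to zero unless the data demand otherwise.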
Figure 1.12. Sparse representation characterized by a generalized Gaussian distribution of representation coefficients, which generates sparse coefficients in terms of an over-complete dictionary. (a) An image is represented by a small number of ‘active’ code elements and (b) the probability distribution of its ‘activities’. Lena image © Playboy Enterprises, Inc.
Although this formula is simple and easy to comprehend, it leaves an open question: how does one measure sparseness mathematically? As a reference point, the distribution of a zero-mean random variable can be compared to the Gaussian distribution with the same variance. The rationale for selecting the Gaussian distribution as the reference is that it has the largest entropy among all probability distributions with the same variance. Thus, if the distribution of interest is more concentrated around zero than the Gaussian distribution, it can be regarded as sparse. Based on this consideration, a measure of sparseness can be constructed heuristically. For a sparsity function to work as intended, it should favor values that are close to zero while tolerating occasional values much larger than a positive constant, such as 1 for a normalized/whitened random variable. A distribution satisfying these two requirements is heavy-tailed, i.e. most coefficients are insignificant and the significant coefficients are few, so that the resultant image representation is sparse. Interestingly, if we use S(x) = ∣x∣, the coding process amounts to solving a Lasso problem, i.e. the regularization term is the L1 norm. This explains why the L1 norm is often used to obtain a sparse solution.
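The comparison with a Gaussian of equal variance can be made concrete with the excess kurtosis, which is zero for a Gaussian and positive for more concentrated, heavy-tailed distributions. A small NumPy check (the distributions and sample sizes here are our own arbitrary choices):

```python
import numpy as np

# Excess kurtosis as a heuristic sparseness measure: zero for a Gaussian,
# positive for distributions more peaked (heavier-tailed) at equal variance.
def excess_kurtosis(v):
    v = (v - v.mean()) / v.std()
    return np.mean(v ** 4) - 3.0               # the Gaussian reference gives 0

rng = np.random.default_rng(3)
gauss = rng.normal(size=100_000)                        # reference distribution
lap = rng.laplace(scale=1 / np.sqrt(2), size=100_000)   # unit variance, heavy-tailed

print(excess_kurtosis(gauss))  # near 0
print(excess_kurtosis(lap))    # near 3: 'sparser' than a Gaussian
```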
By training their network with image patches of 12 by 12 pixels, they obtained 144 basis functions, as shown in figure 1.13. Recall that the patches were whitened before being fed into the network. The basis functions obtained by sparse coding of natural images are Gabor-like, similar to the receptive fields of simple cells in V1. Hence, these basis functions model the receptive