Image Registration by Nonlinear Wavelet Compression and Singular Value Decomposition

Jorge E. Pinzon1, John F. Pierce2, Susan L. Ustin3 and Claudia M. Castaneda3
1Department of Mathematics University of California at Davis Davis, CA 95616
2KT-Tech, Inc., 9801 Greenbelt Road, Suite 314, Lanham MD 20706
3Department of Land, Air and Water Resources University of California at Davis
Davis, CA 95616
 
Author for Correspondence:
Jorge E. Pinzon
Department of Applied Mathematics
University of California
Davis, CA 95616
Phone:  (530) 752-5092
FAX:  (530) 752-5262
email:  jepinzon@ucdavis.edu

Abstract

Current research in image registration focuses on automatic, fast, accurate, and reliable techniques whose application is feasible to the large amount of remote sensing data being generated by Earth and space missions. Image registration by point matching involves two main steps: the identification of significant features, commonly referred to as control points, and the matching of the features to identify the mapping function that brings one image into alignment with the master (reference) image. Automatically identifying control points is difficult, because (1) points significant in an image ("hot points") need not match with corresponding points in a reference image, and (2) even when matched pairs of points are obtained, there must be a significant number of them suitably distributed spatially to compute accurately a mapping function for the registration.

In this paper, we present a three step algorithm with feedback which automatically registers images differing by a translation, a rotation, and/or a homogeneous scaling (homotheity). First, we identify potential control points as hot points by highly compressing the reference and input images using non-linear wavelet techniques. Second, we use singular decomposition (SVD) methods to compute singular values and principal directions for these ensembles of hot points. Third, we use these data from each ensemble to determine a registering transformation. Feedback between the first and second stage allow the SVD computations to optimize the probability that the ensembles of hot points manifest control points. Feedback among the three stages designed to optimize a correlation function refines the accuracy of the registering transformation.

1. Introduction

We present an algorithm for registering images by global, two-dimensional, rigid transformations (translation, rotation, uniform scaling). The algorithm registers images by extracting from each an ensemble of points, a significant number of which match under the transformation (control points) [1]. It then generates the registering transformation from these ensembles using tools from the theory of singular value decomposition (SVD) for matrices.

The algorithm consists of three stages with feedback. The first stage of the algorithm compresses the reference and input images, extracting hot pixels which manifest features significant to each image. The second stage carries out the SVD computations on each of the compressed images to extract singular values and principal directions for each ensemble. The third stage uses the centroid, singular values, and principal directions for each ensemble to compute the registering transformation.

A stability feedback loop between the second and first stages allows the SVD computations to refine the compression and to maximize the probability that the ensemble of hot pixels in both images manifest viable control points, a significant number of which will match. A feedback loop among the third, second, and first stages refines the accuracy of the registration computed by the third stage.

We designed the algorithm to be computationally simple, fast accurate, reliable and automatic, to the extent that these requirements are compatible. For example, given a choice between speed and accuracy, we designed the algorithm for speed; given a choice between speed and computational simplicity, we designed the algorithm for simplicity. In Section 2, we present the logical foundation for the algorithm. We introduce the elements of the SVD and how they can be used with the ensembles of control points from the reference and input images to assess the registering transformation. We then indicate how the addition of random points which do not match to either ensemble affects the conclusions for the computed registration. It is this analysis which establishes the viability of using this technique computationally for registration purposes.

We then present the grounds by which we use nonlinear compression to extract "hot pixels" which have a high probability of manifesting features significant in each image, and thereby have a significant probability of representing control points which match under the registration. It is this analysis which serves as the basis for the stability feedback loop, whereby the SVD computations optimize the compression. In Section 3, we lay out the elements of the algorithm. In Section 4, we present the results of the validation and performance tests. In Section 5, we present conclusions about the shortcomings of the current version of the algorithm and set forth ways by which we could improve it in future versions.

2. Control Point Extraction and Registration Tools

In this section we present the basis for applying SVD tools for registration purposes. We present also the basis for using wavelet compression and the SVD computations to extract from each image a significant number of candidate control points, sufficiently distributed spatially, so as to produce an accurate registration.
 

2.1 SVD and Registration.

We take a global, rigid transformation acting on any subset in a fixed 2-D Euclidean space to be a transformation of the form
 
    OQ = sRqOP + t,
(1)
where O is a fixed, but arbitrary point in the space, P is a point in the domain subset, Q is the image point of P under the transformation, OP and OQ are the position vectors from O to P and Q, respectively, s is a (non- negative) scalar representing a homogeneous scaling (homotheity), Rq denotes a rotation about the z-axis centered at O, counter-clockwise through the angle q, and t denotes a two-dimensional vector characterizing a translation. Take as given a reference and an input image, which we view as subsets of the Euclidean space. We say the images register under the transformation Equation (1), or the transformation registers the images if the transformation takes the reference image onto the input image.

Consider two ensembles of points P and Q in the reference and input images, respectively. We say the ensembles match under the transformation if points of the first are taken onto the second in a one-to-one manner by the transformation. When this occurs, we can show: (1) that the centroids of the ensembles register under the transformation, (2) that we can isolate the rotation and scaling parameters for the transformation by examining the positions of points in each ensemble relative to their respective centroid, (3) that we can deduce the rotation and scaling parameters from the SVD computations for two matrices built from the components of the position vectors for the points in each ensemble relative to their respective centroids, and (4) once these parameters are computed, we can deduce the translation parameters from the position of the centroid in the input image and the rotated and scaled position of the centroid in the reference image.

Specifically, let P0 and Q0 denote the centroids of the respective ensembles. If the two ensembles match under the transformation, then Equation (1), implies that
 
    OQ0 = sRqOP0 + t, 
 (2)
or that the centroids match under the transformation. Equations (1) and (2) imply that points P and Q in the reference and input images register under transformation Equation (1), provided
 
    Q0Q = sRqP0P.
(3)
Equation (3) asserts that we may isolate the scaling and rotation parameters for the transformation by considering the position of points in each image, relative to the centroid of the respective ensembles.

We now demonstrate that if we arbitrarily order each ensemble, form (with respect to a fixed spatial basis) the m x 2 matrix of coordinates for each point in each (ordered) ensemble, relative to its respective centroid, and compute the singular value decomposition of each matrix: (1) the ratio of the corresponding right singular values for the two matrices, ordered by magnitude, determines the scaling factor s, and (2) the difference in the directions of the corresponding right principal singular vectors determine the rotation angle q for the transformation.

Specifically, let P and Q be two ensembles of points in the reference and input images, respectively, which match under the transformation Equation (1). Order each ensemble arbitrarily. If m denotes the cardinality of each ensemble, Equation (3) implies, for 1 < i < m,
 
    Q0Qp(i) = sRqP0Pi
(4)
Where p denotes the permutation on the index set {1, …, m} which specifies which points in the two ensembles match pairwise under the transformation and the two orderings. We construct a matrix equation representing the system of Equations (4). Let X denote the m x 2 matrix whose rows are the components (with respect to a fixed spatial basis) of the position vectors Xi = P0 Pi to the ordered points in the ensemble in the reference image, relative to its centroid. Let Y denote the m x 2 matrix which is the counterpart built (with respect to the same fixed spatial basis) from the ordered ensemble of points in the input image, relative to its centroid. A straightforward computation shows that the matrix equation counterpart to Equation (4) is
 
    Y = sPXRT0,
(5)
where
    PT P = I.
(6)
Here, P is an m x m matrix which represents the action on the matrix X induced by the permutation p acting on the {Xi}. Equation (6) asserts that it is an orthogonal matrix. Rq is the (orthogonal) 2 x 2 matrix representing the rotation Rq with respect to the fixed spatial basis.

We now demonstrate we can assess, in principle, the rotation and scaling parameters for the transformation from the singular values and principal directions for the matrices associated with the ensembles of points in the reference and input images. For any m x n real matrix X , a right singular value for X is a real, positive number s satisfying
 
    XT Xu = s2 u,
(7)
for some u e Rn; || u || = 1. Term u satisfying (7) a right principal singular direction associated with s.

A computation similar to that leading to Equation (6) shows that the singular values and principal singular directions for a matrix are independent of the ordering of the rows of the matrix. Indeed, for the case where X is the m x 2 matrix associated with the positions of the ensemble of points in the reference image, relative to the centroid, the principal singular directions specify the lines of the semi-major and semi-minor axes of the smallest ellipse centered at the centroid enveloping the ensemble, and the ratio of the singular values specify the eccentricity of the ellipse [3, p. 26]. In other words, the singular values and principal directions characterize geometric qualities of the ensemble which are independent of the ordering of the points.

We can now see how the singular values and principal directions for the matrices associated with the ensembles of points in the reference and input images characterize the rotation and scaling parameters for the transformation.

Theorem 2.1. Let P and Q denote ensembles of points in a reference and input image, respectively, matched under a transformation of the form Equation (1) which registers the images. Order the two ensembles arbitrarily. Let X and Y denote the m x 2 matrices whose rows are the components (with respect to a fixed spatial basis) of the position vectors to the ordered points in the reference and input ensembles, relative to their respective centroids.
 
If 
    XT Xu = s2u,
(8)
then
    YT Yv = t2 v,
 (9)
for 
    v = Rqu, and t = ss,
 (10)
and conversely.

Proof. X and Y satisfy Equation(5), for some orthogonal matrix P representing the action of a suitably chosen permutation on m elements. Let s, u satisfy Equation (8). Then
 
    YTY (Ru) =s2 RXT PT PXRTq(Rqu).
(11)
Equations (9) and (10) follow from Equation (11), and the orthogonal nature of P and Rq.

Equations (9) and (10) assert that the difference in the orientations of the principal directions for the ellipses enveloping the ensembles of points in the reference and input images determines the angle parameter for the registering transformation, and the ratio of the corresponding singular values, ordered by magnitude determines the scaling parameter.

When the ensembles contain points which do not match under registration, there is impact upon the conclusions of Theorem 2.1. However, if the points in the ensembles which do not match under registration are randomly distributed, the SVD computations of the orientation of the principal singular directions and of the singular values are, in the mean, unaffected. Consequently, the impact upon the computations of the rotation and scaling parameters will be, in the mean, minor. Figure 1 demonstrates this assertion empirically. In this figure, we show average variation of the principal direction after 100 computations of ensembles augmented by 20% of randomly generated set of points. In all cases, the variation is less than 1 degree.

Table 1 summarizes one of many computations of scale factor, rotation, and translation parameters for an ensemble of points transformed by the values in parenthesis then augmented 20% by a randomly generated set of points. For the rotation and scale parameters, the results concur with the assertion in Figure 1.

However, the computation of the translation parameters, which is based upon centroid computations is more sensitive to the presence of points in one ensemble not matching under registration to points in the other, as Table 1 shows. The problem becomes acute when the ensembles significantly differ, because of cropping of the input image, for example. We address this concern in the conclusion, Section 5.
 

2.2 Compression and Control Points

We use a nonlinear wavelet compression strategy to extract from the reference and input images "hot pixels". We assume that among them are points which manifest features inherent to each image, and thereby are candidates for control points which would match under the registering transformation. We transform the images using a Fast Wavelet Transform, relative to an orthogonal wavelet basis, in this case, Coiflet. From a histogram of the magnitude of the wavelet coefficients, we establish a filter which maintains a significant interval of those non-zero coefficients which exceed an adaptive thresholding percentage of the maximum magnitude of the coefficients, and nullifies all other coefficients. Reconstructing spatial images from the "softly" thresholded coefficients produces compressed, but "cloudy" versions of the reference and input images. Incrementing the filter interval can reconfigure the "clouds" in these compressed images.

A hard thresholding of these spatial images, using a filter built from the histogram of the pixel gray level values of these images, allows us to "remove penumbra", and produce compressed, hot pixel versions of the reference and input images. Incrementing this filter allows us to add or remove hot pixels from the compressed images.

By itself, this process will not automatically extract ensembles of points which manifest features inherent to the images, and thereby present themselves as candidate control points. However, results of the SVD computations can control the incrementation of the filters in the compression and spatial thresholding steps, and thereby optimize the probability that a significant number of hot pixels in each image do in fact manifest inherent features of the image. This control is the foundation for a stability feedback loop. We base the assertion on the following observation: if you add to an ensemble of points a collection of randomly distributed points, the orientation of principal singular direction associated with the original ensemble will be, in the mean, unaffected. This observation is the basis of how we designed the program to automatically control the compression and thresholding filters.

We assume that hot pixels which manifest features inherent in an image are not randomly distributed, and that hot pixels which manifest artifacts in an image, such as incidental lighting, are. For a given compression, we design the program to increment the thresholding filter, to produce or remove hot pixels from each compressed image. The program computes the rate of change in the orientation of the principal singular direction, with respect to the addition of hot pixels. If the rate of change is sufficiently non-zero, points manifesting significant features are more likely being added, and the incrementing process should continue. If the rate of change approaches zero, points being added are more likely random, and the compression/thresholding process should terminate.

3 The Algorithm

The flowchart for the wavelet and SVD registration algorithm is displayed in Figure 2.

As the flowchart shows, the reference and input images are read in, and decomposed by Fast Wavelet Transform [4]. These decomposition then enter the hot pixel extraction stage, which is controlled by the stability feedback loop. For each image compressed, hot pixel renderings are extracted, in the manner described in Section 2.2. We then compute the singular values and principal directions for each compressed image, in the manner described in Section 2.1. If the ratio of the singular values in an image is not sufficiently bounded or does not deviate sufficiently from unity, the stability loop increments the filter governing the soft thresholding parameter for the wavelet coefficients in the compression stage, producing a new configuration of hot pixels. If this ratio is sufficiently bounded and bounded away from unity, the stability loop then increments the filter governing the hard thresholding parameter for the pixel values in the hot point extraction stage, thereby varying the number of hot points in the compressed image. The loop then computes the rate of change of the orientation of the principal singular vector for the image with respect to the number of hot points, in the manner described in Section 2.2. The loop continues to add or remove hot points in this manner, until this rate of change becomes sufficient small. When this occurs, the stability loop outputs the final selection of compressed, hot pixel renderings of the reference and input images to the registration module.

In the manner described in Section 2.1, the registration module computes the singular value decompositions of the m x 2 matrices derived from the compressed, hot pixel renderings of the reference and input images, and from the decompositions computes a first approximation to the scaling, rotational, and translational parameters for the registering transformation. The module then transforms the compressed reference image in the manner dictated by the parameters, and compares the resulting image to the compressed input image, using a distance correlation measure. If the measure is not sufficiently small, the registration module initiates a loop to refine the registration computations. It increments the hard thresholding parameter to refine the compressed, hot pixel reference images, and iterates the registration step to compute a refinement for the translation parameters. When the distance correlation measure is rendered sufficiently small by the iteration, the module outputs the refined values for the scaling, rotational, and translational parameters.

4 Validation and Performance Results

In Table 2 we summarize some of the results of validation and performance tests of the algorithm on three types of images: a portrait of a girl, a Landsat TM image, and several AVHRR images. The images are available at the ftp site ftp://cesdis.gsfc.nasa.gov/pub/people/lemoigne/ at outgoing/TOOLBOX_TESTING.

In the case of the girl and the TM image, the input images were prepared as specific translations and rotations of the reference images. We present these values in the first column of Table 2 as a six digit number, two digits each for the angle, horizontal and vertical offsets.

The reference image for the AVHRR scenes is the file avhrr map, available at the ftp site. The scenes listed in the table are images taken by the satellite at different times of the same day, as indicated in the file name. In each case, the input image differs from the reference image by a pure translation.

For each input image, we present the angle, horizontal and vertical offsets from the reference image in columns two through four. In each case, the scale parameter automatically rounded up to one, as expected.

In columns five through seven we present our measurement of the performance of the algorithm as times required to input/output images to and from the algorithm, to perform the wavelet compression, and to perform the registration.

This computations were executed in a workstation alpha-dec-osf3.2.

Although the registration of the GIRL and TM images is good, significant deviations were noticed in the AVHRR images, particularly for sa1411. We attribute this deviation to extensive cloud coverage in the image. We discuss this situation in the conclusions, Section 5.

5 Conclusions and Generalizations

In summary we present a three stage algorithm with feedback to automatically register images differing by a global rigid transformation. The significant features of the algorithm are the use of nonlinear wavelet compression to extract control points in each image and the use of singular value decomposition tools to register these points. In contrast to linear wavelet compression, the use of nonlinear wavelet compression greatly filters the number of extraneous or redundant control points. The SVD tools are significant, because they allow us to register ensembles of control points without having to know specifically which points match up pairwise.

The algorithm has two major shortcomings. First, because wavelet decompositions are neither translation nor rotation invariant, selecting the compression and thresholding filters is intricate. Small variations in the registration parameters do not necessarily result in small changes to the compression and thresholding filters. The stability loop based upon the SVD computations allow us to minimize the production of artifacts at the compression stage and to lessen the impact on the computation of the rotation parameter; however, the algorithm remains vulnerable in the computation of the translation parameter. We are examining how preprocessing the images with a Hough transform or edge detector might mitigate these conditions for this version of the algorithm.

Second, the images which the algorithm ingests must inherently have within them a sufficient number of control points, sufficiently distributed spatially, in order to function successfully. In particular, the algorithm is noticeably unsuccessful on images present scenes obscured significantly by clouds. This was seen in the validation tests of Section 4, where the algorithm gives reasonable results for AVHRR images depicting basically cloudless scenes, but unreasonable results for the AVHRR image depicting a cloud filled scene.

In the next version of the algorithm we anticipate correcting this second shortcoming by pre-processing images with cloud filtering algorithms, as they become available. In particular, such algorithms based upon fractal cloud models and multi-spectral feature extraction methods appear particularly promising.

We anticipate correcting the first shortcoming in the next version of the algorithm by refining the compression stage through the introduction of a steerable pyramid [2]. A steerable pyramid is an overcomplete wavelet transform which produces a representation that is almost translation invariant (subbands are aliasing-free) and rotation invariant (subbands are steerable). We anticipate this tool will stabilize the filter selection processes and result in a more accurate computation of the translation parameters.

References

[1] DeVore, R. A., et. al.Using nonlinear wavelet compression to enhance image registration. Wavelet Applications IV, H. Szu, Chair/Editor, AeroSense97, Orlando FA, April, 1997. SPIE Proceedings, V. 3078, 1997, pp. 539- 551.

[2] Simoncelli, E. P. and Freeman, W. T. The steerable pyramid: a flexible architecture for multiscale derivative computation. IEEE Second Int'l Conf. on Image Processing, Washington D.C., 1995.

[3] Trefethen, L. N. and Bau, D. Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.

[4] Wickerhauser, M. V. Adapted Wavelet Analysis from Theory to Software. A. K. Peters, Ltd., Wellesley, MA, 1995.

1998, Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis