In this paper, we present a three step algorithm with feedback which
automatically registers images differing by a translation, a rotation,
and/or a homogeneous scaling (homotheity). First, we identify potential
control points as hot points by highly compressing the reference and input
images using non-linear wavelet techniques. Second, we use singular decomposition
(SVD) methods to compute singular values and principal directions for these
ensembles of hot points. Third, we use these data from each ensemble to
determine a registering transformation. Feedback between the first and
second stage allow the SVD computations to optimize the probability that
the ensembles of hot points manifest control points. Feedback among the
three stages designed to optimize a correlation function refines the accuracy
of the registering transformation.
The algorithm consists of three stages with feedback. The first stage of the algorithm compresses the reference and input images, extracting hot pixels which manifest features significant to each image. The second stage carries out the SVD computations on each of the compressed images to extract singular values and principal directions for each ensemble. The third stage uses the centroid, singular values, and principal directions for each ensemble to compute the registering transformation.
A stability feedback loop between the second and first stages allows the SVD computations to refine the compression and to maximize the probability that the ensemble of hot pixels in both images manifest viable control points, a significant number of which will match. A feedback loop among the third, second, and first stages refines the accuracy of the registration computed by the third stage.
We designed the algorithm to be computationally simple, fast accurate, reliable and automatic, to the extent that these requirements are compatible. For example, given a choice between speed and accuracy, we designed the algorithm for speed; given a choice between speed and computational simplicity, we designed the algorithm for simplicity. In Section 2, we present the logical foundation for the algorithm. We introduce the elements of the SVD and how they can be used with the ensembles of control points from the reference and input images to assess the registering transformation. We then indicate how the addition of random points which do not match to either ensemble affects the conclusions for the computed registration. It is this analysis which establishes the viability of using this technique computationally for registration purposes.
We then present the grounds by which we use nonlinear compression to
extract "hot pixels" which have a high probability of manifesting features
significant in each image, and thereby have a significant probability of
representing control points which match under the registration. It is this
analysis which serves as the basis for the stability feedback loop, whereby
the SVD computations optimize the compression. In Section 3, we lay out
the elements of the algorithm. In Section 4, we present the results of
the validation and performance tests. In Section 5, we present conclusions
about the shortcomings of the current version of the algorithm and set
forth ways by which we could improve it in future versions.
|
(1)
|
Consider two ensembles of points P and Q in the reference and input images, respectively. We say the ensembles match under the transformation if points of the first are taken onto the second in a one-to-one manner by the transformation. When this occurs, we can show: (1) that the centroids of the ensembles register under the transformation, (2) that we can isolate the rotation and scaling parameters for the transformation by examining the positions of points in each ensemble relative to their respective centroid, (3) that we can deduce the rotation and scaling parameters from the SVD computations for two matrices built from the components of the position vectors for the points in each ensemble relative to their respective centroids, and (4) once these parameters are computed, we can deduce the translation parameters from the position of the centroid in the input image and the rotated and scaled position of the centroid in the reference image.
Specifically, let P0 and Q0 denote
the centroids of the respective ensembles. If the two ensembles match under
the transformation, then Equation (1), implies that
|
(2)
|
|
(3)
|
We now demonstrate that if we arbitrarily order each ensemble, form (with respect to a fixed spatial basis) the m x 2 matrix of coordinates for each point in each (ordered) ensemble, relative to its respective centroid, and compute the singular value decomposition of each matrix: (1) the ratio of the corresponding right singular values for the two matrices, ordered by magnitude, determines the scaling factor s, and (2) the difference in the directions of the corresponding right principal singular vectors determine the rotation angle q for the transformation.
Specifically, let P and Q be two ensembles
of points in the reference and input images, respectively, which match
under the transformation Equation (1). Order each ensemble arbitrarily.
If m denotes the cardinality of each ensemble, Equation (3) implies,
for 1 < i < m,
|
(4)
|
|
(5)
|
| where | |
|
(6)
|
We now demonstrate we can assess, in principle, the rotation and scaling
parameters for the transformation from the singular values and principal
directions for the matrices associated with the ensembles of points in
the reference and input images. For any m x n real matrix
X , a right singular value for X is a real, positive
number s satisfying
|
(7)
|
A computation similar to that leading to Equation (6) shows that the singular values and principal singular directions for a matrix are independent of the ordering of the rows of the matrix. Indeed, for the case where X is the m x 2 matrix associated with the positions of the ensemble of points in the reference image, relative to the centroid, the principal singular directions specify the lines of the semi-major and semi-minor axes of the smallest ellipse centered at the centroid enveloping the ensemble, and the ratio of the singular values specify the eccentricity of the ellipse [3, p. 26]. In other words, the singular values and principal directions characterize geometric qualities of the ensemble which are independent of the ordering of the points.
We can now see how the singular values and principal directions for the matrices associated with the ensembles of points in the reference and input images characterize the rotation and scaling parameters for the transformation.
Theorem 2.1. Let P and Q denote ensembles of points
in a reference and input image, respectively, matched under a transformation
of the form Equation (1) which registers the images. Order the two ensembles
arbitrarily. Let X and Y denote the m x 2 matrices
whose rows are the components (with respect to a fixed spatial basis) of
the position vectors to the ordered points in the reference and input ensembles,
relative to their respective centroids.
| If | |
|
(8)
|
| then | |
|
(9)
|
| for | |
|
(10)
|
Proof. X and Y satisfy Equation(5), for some orthogonal
matrix P representing the action of a suitably chosen permutation on m
elements. Let s, u satisfy Equation
(8). Then
|
(11)
|
Equations (9) and (10) assert that the difference in the orientations of the principal directions for the ellipses enveloping the ensembles of points in the reference and input images determines the angle parameter for the registering transformation, and the ratio of the corresponding singular values, ordered by magnitude determines the scaling parameter.
When the ensembles contain points which do not match under registration, there is impact upon the conclusions of Theorem 2.1. However, if the points in the ensembles which do not match under registration are randomly distributed, the SVD computations of the orientation of the principal singular directions and of the singular values are, in the mean, unaffected. Consequently, the impact upon the computations of the rotation and scaling parameters will be, in the mean, minor. Figure 1 demonstrates this assertion empirically. In this figure, we show average variation of the principal direction after 100 computations of ensembles augmented by 20% of randomly generated set of points. In all cases, the variation is less than 1 degree.
Table 1 summarizes one of many computations of scale factor, rotation, and translation parameters for an ensemble of points transformed by the values in parenthesis then augmented 20% by a randomly generated set of points. For the rotation and scale parameters, the results concur with the assertion in Figure 1.
However, the computation of the translation parameters, which is based
upon centroid computations is more sensitive to the presence of points
in one ensemble not matching under registration to points in the other,
as Table 1 shows. The problem becomes acute
when the ensembles significantly differ, because of cropping of the input
image, for example. We address this concern in the conclusion, Section
5.
A hard thresholding of these spatial images, using a filter built from the histogram of the pixel gray level values of these images, allows us to "remove penumbra", and produce compressed, hot pixel versions of the reference and input images. Incrementing this filter allows us to add or remove hot pixels from the compressed images.
By itself, this process will not automatically extract ensembles of points which manifest features inherent to the images, and thereby present themselves as candidate control points. However, results of the SVD computations can control the incrementation of the filters in the compression and spatial thresholding steps, and thereby optimize the probability that a significant number of hot pixels in each image do in fact manifest inherent features of the image. This control is the foundation for a stability feedback loop. We base the assertion on the following observation: if you add to an ensemble of points a collection of randomly distributed points, the orientation of principal singular direction associated with the original ensemble will be, in the mean, unaffected. This observation is the basis of how we designed the program to automatically control the compression and thresholding filters.
We assume that hot pixels which manifest features inherent in an image
are not randomly distributed, and that hot pixels which manifest artifacts
in an image, such as incidental lighting, are. For a given compression,
we design the program to increment the thresholding filter, to produce
or remove hot pixels from each compressed image. The program computes the
rate of change in the orientation of the principal singular direction,
with respect to the addition of hot pixels. If the rate of change is sufficiently
non-zero, points manifesting significant features are more likely being
added, and the incrementing process should continue. If the rate of change
approaches zero, points being added are more likely random, and the compression/thresholding
process should terminate.
As the flowchart shows, the reference and input images are read in, and decomposed by Fast Wavelet Transform [4]. These decomposition then enter the hot pixel extraction stage, which is controlled by the stability feedback loop. For each image compressed, hot pixel renderings are extracted, in the manner described in Section 2.2. We then compute the singular values and principal directions for each compressed image, in the manner described in Section 2.1. If the ratio of the singular values in an image is not sufficiently bounded or does not deviate sufficiently from unity, the stability loop increments the filter governing the soft thresholding parameter for the wavelet coefficients in the compression stage, producing a new configuration of hot pixels. If this ratio is sufficiently bounded and bounded away from unity, the stability loop then increments the filter governing the hard thresholding parameter for the pixel values in the hot point extraction stage, thereby varying the number of hot points in the compressed image. The loop then computes the rate of change of the orientation of the principal singular vector for the image with respect to the number of hot points, in the manner described in Section 2.2. The loop continues to add or remove hot points in this manner, until this rate of change becomes sufficient small. When this occurs, the stability loop outputs the final selection of compressed, hot pixel renderings of the reference and input images to the registration module.
In the manner described in Section 2.1, the registration module computes
the singular value decompositions of the m x 2 matrices derived
from the compressed, hot pixel renderings of the reference and input images,
and from the decompositions computes a first approximation to the scaling,
rotational, and translational parameters for the registering transformation.
The module then transforms the compressed reference image in the manner
dictated by the parameters, and compares the resulting image to the compressed
input image, using a distance correlation measure. If the measure is not
sufficiently small, the registration module initiates a loop to refine
the registration computations. It increments the hard thresholding parameter
to refine the compressed, hot pixel reference images, and iterates the
registration step to compute a refinement for the translation parameters.
When the distance correlation measure is rendered sufficiently small by
the iteration, the module outputs the refined values for the scaling, rotational,
and translational parameters.
In the case of the girl and the TM image, the input images were prepared as specific translations and rotations of the reference images. We present these values in the first column of Table 2 as a six digit number, two digits each for the angle, horizontal and vertical offsets.
The reference image for the AVHRR scenes is the file avhrr map, available at the ftp site. The scenes listed in the table are images taken by the satellite at different times of the same day, as indicated in the file name. In each case, the input image differs from the reference image by a pure translation.
For each input image, we present the angle, horizontal and vertical offsets from the reference image in columns two through four. In each case, the scale parameter automatically rounded up to one, as expected.
In columns five through seven we present our measurement of the performance of the algorithm as times required to input/output images to and from the algorithm, to perform the wavelet compression, and to perform the registration.
This computations were executed in a workstation alpha-dec-osf3.2.
Although the registration of the GIRL and TM images is good, significant
deviations were noticed in the AVHRR images, particularly for sa1411. We
attribute this deviation to extensive cloud coverage in the image. We discuss
this situation in the conclusions, Section 5.
The algorithm has two major shortcomings. First, because wavelet decompositions are neither translation nor rotation invariant, selecting the compression and thresholding filters is intricate. Small variations in the registration parameters do not necessarily result in small changes to the compression and thresholding filters. The stability loop based upon the SVD computations allow us to minimize the production of artifacts at the compression stage and to lessen the impact on the computation of the rotation parameter; however, the algorithm remains vulnerable in the computation of the translation parameter. We are examining how preprocessing the images with a Hough transform or edge detector might mitigate these conditions for this version of the algorithm.
Second, the images which the algorithm ingests must inherently have within them a sufficient number of control points, sufficiently distributed spatially, in order to function successfully. In particular, the algorithm is noticeably unsuccessful on images present scenes obscured significantly by clouds. This was seen in the validation tests of Section 4, where the algorithm gives reasonable results for AVHRR images depicting basically cloudless scenes, but unreasonable results for the AVHRR image depicting a cloud filled scene.
In the next version of the algorithm we anticipate correcting this second shortcoming by pre-processing images with cloud filtering algorithms, as they become available. In particular, such algorithms based upon fractal cloud models and multi-spectral feature extraction methods appear particularly promising.
We anticipate correcting the first shortcoming in the next version of
the algorithm by refining the compression stage through the introduction
of a steerable pyramid [2]. A steerable pyramid is an overcomplete wavelet
transform which produces a representation that is almost translation invariant
(subbands are aliasing-free) and rotation invariant (subbands are steerable).
We anticipate this tool will stabilize the filter selection processes and
result in a more accurate computation of the translation parameters.
[2] Simoncelli, E. P. and Freeman, W. T. The steerable pyramid: a flexible architecture for multiscale derivative computation. IEEE Second Int'l Conf. on Image Processing, Washington D.C., 1995.
[3] Trefethen, L. N. and Bau, D. Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.
[4] Wickerhauser, M. V. Adapted Wavelet Analysis from Theory to Software. A. K. Peters, Ltd., Wellesley, MA, 1995.