Robust spatial and spectral feature extraction for multispectral and hyperspectral imagery

J. E. Pinzóna, S. L. Ustinb, C. M. Castañedab, J. F. Piercec, and L. A. Costickb
aDepartment of Mathematics
University of California at Davis, Davis, CA 95616
 
bDepartment of Land, Air and Water Resource
University of California at Davis, Davis, CA 95616
 
cKT-Tech, Inc, 9801 Greenbelt Road, Suite 314
Lanham, MD 20706

Abstract

We present a hierarchical classification technique that discriminates broad categories of surface materials in terms of ground true features, such as water, vegetation, and soils from spectral information. Subsequently, we further discriminate these materials and extract finer ground features, like chemistries, peculiar to each. The interaction at various scales of the 3D spatial and the spectral domains is decomposed by using wavelet tools to address scale dependencies in the spatial domain, a robust spectral unmixing technique, called Hierarchical Foreground Background Analysis (HFBA) along the spectral axis. HFBA sequentially derives a series of weighting vectors discriminating features at different levels of detection: (1) constituent materials, (2) types within constituents, and (3) chemistries peculiar to each type. Our goal is two-fold. First, we present the combination of HFBA and wavelets as a supervised classification technique validating the categories imposed by the supervised classification, and manifesting clusters which can refine the classification at different scales. Second, we identify spectral redundancies between hyperspectral and multispectral information, studying mixtures at different spatial/spectral resolutions and assess whether targeted features may be extracted as efficiently from multispectral data as they could be from hyperspectral data. Results on AVIRIS and simulated MODIS data illustrate the robustness and effectivity of the technique.

1. Introduction

For hyperspectral images it is desirable to classify images within the conventional frame of reference of field and laboratory observations with methods that avoid intrinsic singular problems. In this respect, spectral mixture analysis (SMA) has become a well established procedure for analyzing imaging spectrometry data.1-4 SMA is a structured and integrated framework that simultaneously addresses the mixed-pixel problem, calibration, and variations in lighting geometry and displays the results in terms of proportions of endmembers that can be related easily to standard ecological observational units (e.g., cover). The general form of the SMA equation for each band is expressed as:
 
(1)
where Rb is the radiance at band b, Fem is the fraction coefficient of each endmember Rem weighting their radiance at band b, and Eb is an error term accounting for the unmodeled radiance in band b. Endmembers are chosen to explain the spectrally distinct materials that form the convex hull of the spectral volume.5 This approach works best when describing a few spectral types that, in various mixtures, can account for most of the variance in an image data set. It does not mean, however, that it is possible to identify any specific material. SMA works less well when the spectral features of interest are minor components of the total variance. In fact, SMA has the disadvantage, at least for this application, of approximating linearly the natural (non-linear) complexity of materials represented by the mixture of endmembers. This produces a non-unique mixing model when trying to identify and quantify materials that occur at the sub-pixel scale.6 In summary, the technique is relatively insensitive to subtle absorption features, and produces significant quantification errors due to endmember variability in a pixel from linear and nonlinear mixtures (e.g. from scattering, and lighting geometry).

Boardman7, used a geometric approach based on the convex hull of the spectra projected into the mixing space to find a solution that minimized spectral variation for some features while accentuating others. His technique is still an SMA approach that automatically derives the number of endmembers and estimates their pure spectral composition7, but it is suboptimal in the presence of multiple mixing. More recently, Harsanyi and Chang8 developed a mixture technique that rejects undesired interference by performing an orthogonal subspace projection (OSP). This technique simultaneously reduces data volume and emphasizes the presence of a signature of interest. Bolster et al.9 seeking the same goals, instead uses the first difference partial least squares regression (PLS) that is based on a singular value decomposition (SVD) of the whole spectrum data set. SVD reduces noise-related interference, common in a first difference analysis, and reduces the analysis into a smaller set of independent variables. Both, OSP and PLS, achieve good performance in detecting material abundances at low levels for a particular scenario by incorporating the variability of the material abundance into the more important independent variables (factors) but they are unable to extend the application to other scenarios. In order to develop a directed search methodology to locate the desired robustness (analytic) property, Smith et al.10 proposed a revised SMA technique, that they termed Foreground/Background Analysis (FBA). In this technique, spectral measurements are divided in two groups of foreground and background spectra that comprise a selected subset of spectra which emphasizes the presence of a signature of interest. In defining both groups they do not include intermediate mixtures between foreground and background. In that way, FBA vectors should be sensitive to minor sources of foreground spectral variation and insensitive to background spectral variation. The goal of FBA is to project spectral variation along the most relevant axis of variance that maximizes the spectral differences between the foreground and background, while minimizing spectral variation within each group. The FBA approach defines a weighting vector w = (w1, w2, ..., wNb), with components wb at each channel b = 1, ..., Nb, such that all foreground spectral vectors, Rf = (Rf,1 , Rf,2 , ..., Rf,Nb), are projected to 1 while background spectral vectors, Rb, to 0. This property is defined by the FBA system of equations:
 
      foreground
(2)
and
 
      background
where T provides a translation that is typically required to optimize the FBA system. As stated FBA is in essence another linear classifier of the spectra that can be applied to identify low and high material abundances. Pinzón et al.11,12 modified the FBA linear system to project a subset of spectra onto a relevant axis of continuous property variation, like chemical content.

In the next Section, we describe a hierarchical supervised spectral technique that extends the FBA system based upon the properties of SVD, e.g. stability solving ill-conditioned systems with a good packing of coherence (spectral) information, and the properties of wavelet decomposition, e.g. packing of coherence (spatial) information and noise reduction. In Section 3, we present two practical applications, and in Section 4 we discuss the power and extensions of these techniques.

2. Supervised Classification

For most purposes the problem of supervised classification can be formulated as follows: given an input space X and a desired property in an output space Y, there is an unknown (functional) relationship, F, between X and Y that is represented by a subset of m samples, from which one wants to guess the X - Y relationship. In general, F takes the form of a deterministic function plus noise. One is given the training set of m samples and the guess functional operator , the problem is to guess, using , what output space value, , is the most appropriate for a given input x. The precise meaning of "appropriate" can be difficult, and is measured through loss functions under a chosen metric. There are many possible choices for the metric and ways to minimize the loss function. Both determine the method used and its ability to generalize. For practical purposes, we shall restrict ourselves to the use of the lp norms with 0 < p < ¥ . These norms are defined by, Special cases include the mean absolute error (loss function)(p = 1) and the mean squared error (p = 2). The metric l2, a popular choice usually without a priori justification, is given by:
 
(3)
In our analysis, the l2 metric is chosen to include some desirable smoothness in the solution.

The method of SVD is used to find (xi) minimizing equation 3. The SVD process will surely find the solution with smallest |( xi)|.13 This solution is often called in regularization theory the principal solution of the zeroth-order regularization that corresponds to the more general minimization problem of the sum of the two positive functionals
 
      Î (X, Y) + l (||2)
(4)
in the limit of small l .14

Observe the minimization tradeoffs of Î (X, Y) versus ||2. Increasing l favors minimizing ||2 and pulls the solution away from Î (X, Y). The spatial coherence information of the SVD solution will be analyzed at different spatial scales using wavelet tools. By combining spectral and spatial features, we seek to decompose the interaction at various scales between the spatial and spectral domains. This interaction is coupled principally due to the spectral mixture of materials found among the different spatial scales (spatial resolution or grain) at which spectral measurements could be made (laboratory, field, airborne and satellite scales). As manifested in the parameter l , the mixture problem imposes an implicit precondition on the spatial analysis: first we need to identify the spectral features at a given scale that are related to a meaningful ground characteristic, and then we can apply a spatial analysis that manifests the intrinsic spatial coherence of the categories detected spectrally and validates its generalization. Thus, the spatial technique should accomplish two objectives: first, it should be a validation tool and second, a refinement of the spectral classification to improve its generalization.
 

2.1. Hierarchical Spectral Learning

Along the spectral axis we employ a robust spectral unmixture technique based on SVD tools, the so called Hierarchical Foreground Background Analysis (HFBA). Our goal is to extract spectral characteristics with direct ground interpretation. For this purpose, Pinzón et al. 15,16 modified the FBA system to project a subset of spectra into the most relevant axis of variation of a desired property. In this case, the system of equations is given by
 
(5)
That is, the reflectance matrix R times the FBA weighting vector w is equal to the desired ground characteristic C. HFBA derives a series of weighting vectors discriminating features at different levels of detection: (1) constituent materials, (2) types within constituents, and (3) chemistries peculiar to each type. It is based on the following observations (see Figure 1): The power of the HFBA method becomes apparent as we begin to catalogue more precisely the performance of the SVD in energy-packing and avoidance of overfitting problems due to its stability properties. First, r, the rank or dimension of the matrix R, could be estimated by examining the number of non-zero singular values.17 Second, the decomposition R = USV* provides an approximation of the matrix R by a sum of rank-one matrices.17 That is,
 
(6)
Here, r is the rank of the matrix R, and s j its singular values, uj, and vj the left and right singular column vectors respectively. This is easily shown by noticing that S can be written as a sum of r matrices Sj , that is S = S rj=1 Sj,

where each Sj has just one nonzero entry s j in its diagonal. Then Equation 6 follows. One can find a very large number of different representations of R as a sum of rank-one matrices. However, Equation 6 represents the best approximation of R. That means that the hyper-ellipsoid with principal axis of length s j's, provides a very important property: the q-th partial sum captures as much of the detail of R as possible.17 That is, the best least squared approximation of a matrix R by matrices of lower rank q (q <r), is given by Rq = S qj=1 s j uj vj. Third, when solving the FBA equations at each level with spectral matrices R close to rank-deficient, it turns out that most of the standard algorithms used to solve such systems have ill-conditioned stability properties. In such cases, SVD is a good stable alternative.17 Computationally, SVD is more expensive than the standard methods, but more accurate and stable. This is the principal advantage of using SVD in the solution of the FBA equations: a stable method to process hyperspectral (rank-deficient) matrices R.
 

2.2. Multiresolution and Spatial Learning

The real value of generalization lies not in its ability to explain what one has already seen (in the training set), but in predicting or detecting anomalies or events that have not been yet studied. One of the strong points of HFBA is that we can group together samples with similar anatomical properties which are manifested spectrally and spatially. However, intrinsic geometric and spatial coherence properties on the predicted test set (test images) have been neither considered nor exploited explicitly yet in the HFBA algorithm.

Finding the spatial coherence information is a particular case of image compression and noise reduction. Devore18 suggested a unified mathematical framework which is strongly related to our discussion of supervised learning. It uses wavelets to study the rate of decay in the error between the original image and the compressed image in terms of Lp norms. This rate of decay provides a criteria to measure the smoothness of the functions being approximated, and provides conditions (and bounds) for generalization. For example, the rate of approximation of a function is related to the classification of images by their membership in smoothness spaces.18,19 More precisely, they have an approximation space X, and a smoothness space Y, and seek to find a function g Î Y that approximates a function f Î X by minimizing
 
      || f - g || x + l || g || Y 
(7)
for a positive constant (Lagrange multiplier) l . The resemblance to the supervised learning equation, (4), is clear. In fact, the first term measures the approximation error between f and g, while the second term measures the "smoothness" (structure) of g. In other words, the parameter l determines the relative importance of error and smoothness as stated for the supervised learning equation.

Devore minimizes equation 7 by "taking a wavelet projection that corresponds to low-pass filter, with a frequency limit that depends on l .18" This acts to compress an image, to smooth a noisy image, or to extract quintessential features for a given task, by providing a suitable feature space for noise reduction and spatial coherence extraction. Devore's work contributes a new perspective to the problem of standard pattern recognition by examples. A multiresolution analysis of a multi-spectral image stimulates advances in two directions: the spectral description of spatial processes and discrimination of materials detected at different scales and potential scaling laws that can be used to relate spectral changes of natural phenomena on disparate scales.

3. Results

We present results from two applications. First, we present the combination of HFBA and wavelets as a supervised classification technique validating the categories imposed by the supervised classification, and manifesting clusters which can refine the classification at different scales. Second, we identify spectral redundancies between hyperspectral and multispectral information, studying mixing at different spatial/spectral resolutions and assess whether targeted features may be extracted as efficiently from multispectral data as they could be from hyperspectral data.
 

3.1. Differentiating Bare Ground from Rock Outcrop

The purpose of this study was to provide complementary resource inventory information to screen and index watersheds of the central Sierra Nevada Mountains of California to asses their potential for soil erosion and possible contribution to sedimentation. For this purpose HFBA vectors were trained with pixel spectra from geo-registered AVIRIS images discriminating soils, non-photosynthetic vegetation and rocks. Figure 2 presents average spectra of the classes represented in the training set.

Figure 3 shows the resulting HFBA classification vector and how it weighs the reflectance in each of the bands. The peaks in the HFBA vector around 700 nm and 1400 nm identify reflectance characteristics useful to differentiate vegetation and litter from soil and rock signatures. We can also see consistent high negative weights around 800-1200 nm and 2000-2400 nm of the near infrared region. These will be useful in the identification of bare soil and rock signatures.

Figure 4 compares five methods of differentiating bare ground from rock outcrop given inset 1: inset 2 is a composite of selected bands from AVIRIS, inset 3 is the Normalized Difference Soil Index (NDSI), inset 4 is the ratio between Normalized Difference Vegetation Index and soil index (NDVI/NDSI), inset 5 is the HFBA result, and inset 6 is the standard linear Spectral Mixture Analysis (SMA) from Landsat TM. All the images are from the Capps Crossing training site of the central Sierra Nevada Mountains of California. The circles on insets 2-6 are areas that have been mapped using GPS as granitic shield. Circles are GPS training sites.

Only HFBA accurately defines the rock area in the image and distinguishes it from the forest clear-cut just north and east of the GPS site, mainly because the HFBA classification were able to extract useful canopy reflectance characteristics in the near infrared region that improves the detection of bare soil, litter and rock signatures. The HFBA classification after coi et noise reduction corresponds well with field data and reconnaissance of some watersheds of the Cosumnes and American River Basins.

For more information about the Sierra Nevada Mountains see Costick, 1996.20 Costick has studied forty watersheds of the Cosumnes and American River Basins that includes both Camp and Cat Creeks, which have been the focus of intensive ecological reviews published by the Forest Service.20 In fact, Costick has compiled a complete resource inventory information of meaningful GIS and geo-registered remote sensing images that makes the Sierra Nevada a perfect region for training and validation of supervised learning techniques.
 

3.2. Spectral Redundancies

Our aim is two-fold. First, we want to improve the description of pixels between the boundaries of two different classes in a HFBA classification after coi et noise reduction. We assume these pixels have a strong mixture between endmembers of the two classes. For this purpose, we revisit the spectral unmixing problem from the point of view of approximation theory and present a non-linear (constrained) least squared unmixing technique that should have in all cases meaningful fractions: disregard if spectra is not contained within the convex hull defined by the library endmembers. The endmembers that account for a pixel mixture are those selected from the training set of an HFBA application whose HFBA values were closest to the pixel HFBA value. Second, we consider the constrained unmixing problem under the light of tradeoffs of spectral resolution with more number of spectral band intervals versus spatial resolution and measurement accuracy. Coherent information is found around neighbor spatial or spectral locations in any hyperspectral image, resulting in redundant information in both spatial and spectral domains. Given an application, the spectral redundancy should be exploited accordingly.
 

3.2.1. Non-linear Spectral Mixture Analysis (Non-linear SMA)

The linear mixing model explained in Section 1 is generally incomplete in several ways. We cite two. Rarely, one can account for all the endmembers; thus, a strict unity sum constraint overestimates the endmember fractions.2 The system of equations are in essence ill-conditioned; thus, the model suffers from conceptual and numerical problems that include negative fractions returned by the standard linear procedure.21 This leads naturally to the constrained least squared approximation: minimize || Rb - S mi=1 fi Eb,i ||2 for m endmembers spectra, Ei at band b subject to the following constraints:
 
      fi > 1 and fk < 1 for i=1 ..., m
(8)
We used Powell's nonlinear constrained optimization22 implemented in Matlab (Optimization Toolbox)23 to find the fractions fi subject to the constraints given in Equation 8.

Both techniques were tested by unmixing a mixed spectra and a pure spectra in terms of 10 endmembers chosen from those spectra whose HFBA values were nearest to the HFBA value of the pixel being unmixed. Both methods worked well unmixing the pure spectra: a fraction close to one hundred was found for the endmember that represented it. While consistent and meaningful fractions were obtained for the unmixing of the mixed spectra using the non-linear approach, negative values were obtained using standard linear SMA (see Table 1). In fact, the non-linear approach had the advantage of automatically guaranteeing positive fractions and avoiding overestimations, properties that the standard linear approach lacks.

3.2.2. MODIS simulations

We unmixed simulated MODIS* spectra from AVIRIS data and applied these fractions to unmix the original AVIRIS spectra. Endmembers were chosen from those spectra whose HFBA values were nearest to the HFBA value of the pixel being unmixed. As a consequence the endmembers were proper for the unmixture using MODIS spectra (Figure 5). We also evaluated the performance of MODIS HFBA vectors estimating bio-chemical properties for which MODIS bands are appropriate.

The results suggest that the approximation is more sensitive to the selection of the endmembers, than to the lack of spectral resolution in vegetation studies for which MODIS bands are appropriately located. Motivated by this result, we generated an HFBA MODIS vector to estimate water content using laboratory data and applied it to the Santa Monica image. Figure 6 presents a comparison between the general properties and performance of HFBA water vectors generated using AVIRIS and MODIS data. The MODIS HFBA vector remarkably resembles the AVIRIS HFBA vector around the center of the 20 MODIS bands in the Visible Near InfraRed (VNIR) and Short Wave InfraRed (SWIR) regions. Their performance was comparable.

Figure 7 shows water content prediction for the Santa Monica Image using the MODIS HFBA vector. In this case, MODIS was appropriate for the application, since the locations of its bands are suitable for estimations of water content. The result was validated through a comparison to many other reliable remote sensing methods predicting water content, see Ustin et al., 1998 for a complete description of this study.24

4. Conclusions

A new robust approach for the detection and classification of materials was developed and tested. Spectral and spatial interactions directly associated to ground units are uncoupled using wavelet tools and a Singular Value Decomposition (SVD) based technique, the so called hierarchical foreground background analysis (HFBA).

The power of the HFBA technique is based on the attractive properties of the SVD transform in information packing and avoidance of overfitting problems by minimizing extraneous noise in the analysis. The technique was trained over laboratory data and applied to AVIRIS images. It is clear from the above experiments that the proposed approach is promising.

As a conclusion, we consider that a combination of HFBA and wavelets has significant potential and certainly deserves further investigation. There are many aspects of the discrimination among materials that still need investigation. In particular, we presented a non-linear mixture technique that improves HFBA-wavelet classifications, especially between class boundaries where strong mixtures were presented. The non-linear approach had the advantage of automatically guaranteeing positive fractions and avoiding overestimations, properties that the standard linear approach lacks. The non-linear SMA exhibited almost the same performance in all cases: mixed and pure spectra. The results suggest that the approximation is more sensitive to the selection of the endmembers rather than the lack of spectral resolution in vegetation studies for which MODIS bands are appropriately located. We also explored spectral redundancies comparing the performance of AVIRIS and simulated MODIS information on an applications suitable for both sensors. Comparable performance was obtained. This indicates that there is a lot more to be understood about spatial/spectral tradeoffs to help assess the future of satellite-based land sensing in the next decade.

References


*MODIS: Earth Observing System (EOS), AM and PM carrying Moderate Resolution Imaging Spectrometer (MODIS). Two satellites are planned to be launched in late 1998 (AM) and early 2000 (PM). We will consider only the MODIS specifications in the VNIR and SWIR regions for comparison purposes with AVIRIS. It includes 2 bands centered at 659 and 865 nm with 250 meters resolution for vegetation and cloud boundary studies, 5 bands at 500 meters spatial resolution (470, 555, 1240, 1640, 2130 nm) for land and cloud cover applications and 9 bands at 1000 meters resolution mainly for chlorophyll, ocean and atmospheric applications located at 415, 443, 490, 531, 555, 667, 681, 750, and 865 nm. Observe that the SWIR region is not very well represented by this instrument resulting in a strong limitation for soil applications.
 
1998, Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis