Multivariate Statistical Classification of Soil Spectra

Alicia Palacios Orueta and Susan L. Ustin
Department of Land, Air, and Water Resources
University of California, Davis, CA 95616
 
Submitted to:
Remote Sensing of Environment
November 1995
 
 
 
Address for Correspondence:
Alicia Palacios Orueta
Department of Land, Air, and Water Resources
University of California
Davis, CA 95616
phone: (916) 752-5092
fax: (916) 752-5262
email: alicia@vache.ucdavis.edu

Abstract

The purpose of this work is to evaluate whether AVIRIS (Advanced Visible/Infrared Imaging Spectrometer) bands can be used to discriminate between soils having similar properties, as well as to compare AVIRIS spectra with those from laboratory measurements. Multivariate analysis techniques show that two soils belonging to the same series and a third soil belonging to a different but related series can be discriminated at a high level of accuracy using reflectance data from AVIRIS or from laboratory measurements. It is also shown that wavelengths important in discriminating soils were highly correlated between AVIRIS and laboratory data. The distribution of variance and weighting functions also show consistent patterns between these data sets.

Introduction

The monitoring of soils is a major challenge to soil scientists. Soils exhibit continuous variation in space and time. Both natural and anthropogenic processes, such as agriculture or grazing practices, can change soil properties in ways that produce cumulative environmental impacts. Studies of spatial and temporal changes are needed for identifying environmental problems and for developing mitigation strategies. Monitoring small changes is difficult because of the high spatial variability. It is likely that most soil changes will be small on decadal time scales relative to the large spatial heterogeneity existing in the pedisol even on a local scale. Nonetheless, the potential significance of biogeochemical feedbacks onto the climate system makes such measurements important. We have examined the application of high spectral resolution remote sensors as a means to provide spatial information on the properties and conditions of soil surfaces and to evaluate their potential to detect small differences in soil properties. The soils chosen for this study exhibit a range of variation comparable to potential decadal scale temporal changes that result from altered surface processes.

Spectrometry has been used for 25 years to differentiate highly divergent soil types.

Researchers have shown that soils having dissimilar properties can be discriminated using reflectance measurements. Condit (1972) classified 160 soil spectra into three types based on their spectral shapes, the first type corresponding to chernozem soil, the second to pedalfer, and the third to a red quartz and calcite sand. In an analysis of 485 soils, representing the major orders of the globe, Stoner and Baumgardner (1981) distinguished five distinct reflectance forms, based on curve shapes and absorption bands. These soils had different physical and chemical characteristics; however, spectral differences were mainly due to organic matter and iron oxide content. Other recent studies have supported these findings (Csillag et al., 1993; Henderson et al., 1992; Zhang et al., 1992).

Satellite multispectral data have also been used to discriminate between soil types. Agbu et al. (1990) used Systeme Probatoire d'Observation de la Terre (SPOT) data and found that some soil properties, such as organic matter content, particle size distribution, and color, could be used to predict satellite reflectance. Coleman et al. (1993) used the Thematic Mapper (TM) to differentiate surface soils with high accuracy and found significant correlations between radiance data and organic matter, iron content, and particle size distribution.

The development of imaging spectrometry makes it possible to obtain observations at a much higher spectral resolution. De Jong (1992) investigated the use of imaging spectroscopy for mapping erosion hazard in Mediterranean areas. High resolution (3 nm) spectral measurements of different soils were made, and correspondence analysis was used to compare soil spectra and physical data. Two types of soils with different vulnerability to erosion were identified. Lime, clay, iron, and organic matter content were the most important variables in discriminating these soils, and absorption features from iron and carbonate could discriminate between them. The amount of information that imaging sensors can provide is much greater than that of current satellites, and improved information retrieval methods are needed to fully utilize airborne/satellite sensors like NASA's Advanced Visible/Infrared Imaging Spectrometer (AVIRIS), available now, and the LEWIS HyperSpectral Imager (HSI) (De Long et al., 1995) and the Hyperspectral Digital Imagery Collection Experiment (HYDICE) (Basedow, 1995), which are expected to become operational in 1996.

A remaining challenge for imaging spectrometry is to discriminate among soils having similar chemical and physical properties such that their spectral properties can be used to monitor changes in soil quality. The purpose of this study is to evaluate the ability of such high resolution visible and reflected infrared sensors to discriminate among three soil phases that are similar in their physical-chemical properties. The three soils we chose to examine co-occur in an agricultural area and belong to two related soil series, Yolo, and Brentwood. Repeated tillage and fertilization of the land during the last century of intensive agriculture is expected to have further limited differences among these soils. They are of interest for this application because the soils mainly differ in organic matter and particle size distribution at a magnitude of variation that might be experienced on a short time scale if altered biogeochemical cycles resulted from factors such as climate forcing or human activities.

Study site

The study site comprises an agricultural area of approximately 2900 ha along the southern margin of Putah Creek, California. It is located in the fertile Sacramento Valley just east of the inner central Coast Range and east of the town of Winters, CA. The northeast coordinates are 4266072 (N) and 597297 (E). The primary agricultural commodities of the area include walnut, almond, plum, and peach orchards. Winter crops include small grains and alfalfa, and summer crops are tomatoes, safflower, and corn.

Soil Characteristics

One hundred thirty-eight surface soil samples representing three soil phases were collected from 46 sites. The soil phases are: Brentwood clay loam (class 1), Yolo loam (class 3), and Yolo silty clay loam (class 4). Topographic slopes within the study area range from 0 to 2%. These are deep, well-drained soils situated on a young alluvial fan of the Putah Creek watershed. Soils are alluvium derived from sedimentary and metamorphically altered Cretaceous marine sediments of the central Coast Range. These soils are generally high in silt content, although the amount differs among the three types. Each soil has a thick, dark-colored upper horizon. All soils are from the Inceptisol order; the Brentwood series is classified as a fine montmorillonitic thermic Typic Xerochrept, and Yolo as a fine silty mixed thermic Typic Xerochrept. Both Yolo loam and Yolo silty clay loam have a grayish-brown surface layer. Soil permeability is moderate. Brentwood soils have a grayish-brown clay loam surface layer and a grayish-brown clay loam subsoil with moderately slow permeability. Brentwood soils have a somewhat better developed profile, possibly because of their location at lower-elevation depositional sites. They have a well-developed B horizon, while Yolo does not. All of these soils have been significantly altered by farming practices (e.g., leveling and irrigation) during the past century. The A horizon has been well mixed; however, agriculture has not caused much additional change, since these soils have few morphological features other than the dark, thick A horizon. The main differences between the soil phases are in particle size distribution and organic matter content. Eighty percent of the study area is occupied by the Yolo series: 63% of the soils of this area belong to the Yolo silty clay loam soil unit, while 17% are Yolo loam.

Some farming practices such as addition of soluble phosphorus or total sulfur can affect soil properties but do not directly contribute to the definition of taxonomic units or to their reflectance characteristics. Irrigation patterns create some of the larger spatial/temporal differences in soil reflectance, although primarily affecting albedo. Management can affect reflectance characteristics by minimizing differences between soil classes while increasing within-class variability. Our goal is to discriminate between these closely related soil phases based on metastable soil properties, i.e., those less affected by tillage, but which can change over extended time periods and are related to processes like soil degradation or erosion.

Data and Methodology

Soil Sampling

Soil samples were collected from the top three cm of soil over a spatial grid spaced approximately 600 m apart. Altogether, 138 surface soil samples were collected from 46 sites. Three replicates were collected at each site to better represent its spatial variability. The sampling was done at the end of June and beginning of September 1993. At this time the field conditions were similar to those at the time of the AVIRIS flight (July 1992). A Trimble Pathfinder Basic Plus was used to identify the exact location (+/- 3 m after differential correction) where soils were collected. The samples were then analyzed to define the class to which each soil sample belongs. All soils were extracted from bare areas, located either in orchards or in dormant fields.
 

Sample Preparation

The sample preparation followed the procedure from Henderson et al. (1992). Water content was standardized by oven drying at 40 °C for 24 hours, a temperature chosen to avoid volatilizing organic matter and altering soil composition. The samples were immediately placed in a desiccator for at least 24 hours until the spectral measurements were made. To minimize anisotropic scattering of light by soil particles of variable size, soils were ground with a mortar and passed through a 0.25 mm sieve. It has been shown that different methods of sample preparation can cause some differences in the absolute intensity but that they do not alter the wavelength position of the spectral features (Hunt and Salisbury, 1970).
 

Spectroscopic Technique

The measurements were made with a Varian Cary 5E spectrophotometer. This is a visible and near-infrared instrument with a tungsten lamp. A 150 mm diameter Labsphere integrating sphere covered with Spectralon was used to measure total hemispherical reflectance. Spectralon is a highly Lambertian material characterized by diffuse reflectance of 99% between 200 nm and 2500 nm.

To facilitate comparison, the spectrometer was optimized to make measurements over AVIRIS wavelengths in the NIR. In the visible wavelengths, measurements were obtained at 2 nm intervals, and visible bands were post-processed to simulate AVIRIS bands (nominally 9.6 nm/band). This was done by averaging the spectrometer bands to AVIRIS wavelengths using a linear weighting function. The value of this function ranged between 0 and 1 in the wavelength interval over each AVIRIS band. The linear function corresponds to a triangular bandpass with a full width at half maximum specified by the spectral characteristics file supplied with the AVIRIS image. In this way, higher weights were assigned to the Cary bands closer to the center of the AVIRIS bands.
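The triangular weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function names, the 2 nm wavelength grid, the reflectance ramp, and the band center of 550 nm are all hypothetical, and real band centers and widths would come from the AVIRIS spectral characteristics file.

```python
def triangular_weights(wavelengths, center, fwhm):
    """Weights of a triangular bandpass: 1 at the band center,
    0.5 at center +/- fwhm / 2, and 0 at center +/- fwhm and beyond."""
    return [max(0.0, 1.0 - abs(w - center) / fwhm) for w in wavelengths]


def resample_to_band(wavelengths, reflectance, center, fwhm):
    """Weighted mean of fine-resolution reflectance over one broad band."""
    weights = triangular_weights(wavelengths, center, fwhm)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no laboratory samples fall inside the bandpass")
    return sum(w * r for w, r in zip(weights, reflectance)) / total


# Hypothetical 2 nm laboratory grid (500 to 600 nm) and a made-up
# linear reflectance ramp; these are not measurements from the study.
lab_wavelengths = [500.0 + 2.0 * i for i in range(51)]
lab_reflectance = [0.10 + 0.001 * (w - 500.0) for w in lab_wavelengths]

# Assumed band center (550 nm) and FWHM (9.6 nm, the nominal AVIRIS value).
band_value = resample_to_band(lab_wavelengths, lab_reflectance, 550.0, 9.6)
```

Because the weights are symmetric about the band center, a locally linear spectrum is returned unchanged at the center wavelength; only curvature within the bandpass is smoothed.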

The instrument was calibrated using a 50% reflectance sample and the rare earths Erbium Oxide and Dysprosium Oxide. Each day a 100% Spectralon reflectance standard was used to normalize for possible changes in the spectrometer. The dry soils were set in sample holders at a thickness of 1.5 cm and were covered with Herasil 1, a fused silica glass cover, to prevent the electrostatic dispersion of the soil into the integrating sphere. The glass has a refractive index of 1.4585. The Cary 5E has four detectors measuring over the 400-2500 nm interval. In the spectral region between 850 nm and 900 nm, the photomultiplier tube and the infrared detector are insensitive; therefore, this region was eliminated from further analysis.
 

Available Map Data

All the input map data sets were registered to ground control points selected from the USGS 7.5-minute topographic quadrangle (Winters, CA). Soil information was obtained from the Soil Conservation Service, Soil Survey for Solano County, completed in 1977. The map scale was 1:20,000.
 

Image Data

AVIRIS imagery used in this study was acquired on July 31, 1992. The AVIRIS sensor acquires 224 spectral bands with a spectral resolution of 10 nm, between 400 and 2500 nm. Its spatial resolution is 20 m. The image scene is about 10 km by 8 km. The AVIRIS image used was not radiometrically calibrated to radiance values. Apparent surface reflectance was retrieved using a radiative-transfer based atmospheric model (MODTRAN II) that accounts for spatial variation of the atmospheric conditions (Roberts et al., 1995). For this scene, spatial variation in atmospheric water vapor was low and at the level of instrument noise.
 

Geographic Information System Procedures

All spatial co-registration was done in the Geographical Resource Analysis Support System (GRASS 4.1) developed by the U.S. Army Corps of Engineers Construction Engineering Research Laboratory (CERL). The topographic quadrangle was imported into GRASS and rectified to Universal Transverse Mercator (UTM) units. Two AVIRIS scenes and the soil map were georegistered to the topographic map using ground control points distributed around the area. Associated attribute data were entered in the GIS database. The extraction of the AVIRIS pixel spectra was done using GRASS and the Spectral Image Processing System (SIPS, U. Colorado).
 

Data sets used in the analysis

The laboratory data were analyzed using 224 bands averaged to simulate the response function of AVIRIS (390 nm to 2498 nm). Out of the 224 bands, eleven bands were eliminated in the region between 850 nm and 900 nm. The analysis model consisted of 138 soil samples, 1 dependent variable (soil class), and 214 independent variables (spectral bands).

The second data set consists of 83 individual pixel spectra selected from the AVIRIS scene. Spectra were extracted from bare fields. The extracted pixels were from areas belonging to the same soil units as the soil samples. This data set has the same variables as the soil samples, except where absorption by atmospheric water occurs. The intervals 1342.72 - 1452.17 nm and 1799.29 - 1953.26 nm were not used. The following single wavelengths were also eliminated because they were noisy: 390.23 nm, 400.02 nm, 680.91 nm, 1273.00 nm, 1282.55 nm, 1282.95 nm, 2489.11 nm, and 2498.96 nm. The total number of bands used in this data set was 193.

Statistical Analyses

The objectives of these analyses were first, to evaluate the ability of AVIRIS to separate soils with similar chemical and physical properties, and second, to analyze the correlation between the bands that best discriminate soils in both AVIRIS and laboratory data sets. One problem that arises when analyzing high spectral resolution data is that some variables may be highly correlated. When the number of important spectral features is smaller than the number of bands, a number of bands can be eliminated without significant loss of information. We examined these issues in our analysis.

The statistical analysis was divided into two parts: first, reducing the number of variables, and second, evaluating the discriminating power of the remaining variables and the relationships between the old and the new variables. The correlation between the distributions of AVIRIS and laboratory data was also analyzed. The Statistical Analysis System (SAS) was used for all analyses.

To reduce the dimensionality of the data, two methods were used: Principal Component Analysis (PCA) and Stepwise Discriminant Analysis (SDA). PCA transforms the variables to a new set that describes the variability with fewer variables. The result is that the data set is transformed from p correlated spectral bands to m uncorrelated variables, where m < p, by linear combination of the original variables. SDA reduced the number of variables to those important in discriminating between soil types (it was assumed that the set of bands selected has a multivariate normal distribution). This method selects variables so as to minimize the within-class variance and maximize the between-class variance, and it identifies the specific bands that are the best class predictors by stepwise selection. At each step, the variable that contributes least to the discriminatory power of the model, as measured by the Wilks' Lambda test, is removed; inclusion is based on the F-statistic derived from a sequential analysis of covariance. In the initial step, bands were partitioned into subsets of every fifth band, with the SDA performed sequentially for five sets of 46 bands. This analysis method was chosen to ensure that no important information was lost because of an a priori elimination of bands. This division showed that SDA is a robust model, since selected bands were adjacent or nearly adjacent in each of the five band sets. The significant bands for each of the five sets were compiled and yielded a total of forty-six bands which showed the most significant discriminating power. SDA was also applied to the AVIRIS data to determine the most significant bands and to eliminate redundancy. Similar bands were eliminated in both data sets.
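The band partitioning and the Wilks' Lambda criterion can be illustrated with a simplified sketch. It is not the SAS stepwise procedure used in the study: it only ranks single bands by the univariate Wilks' Lambda (within-class sum of squares divided by total sum of squares), which corresponds to the first entry step of a forward selection, and it omits the removal steps and the partial F-tests. All function names and the toy spectra are hypothetical.

```python
def wilks_lambda_1d(values, labels):
    """Univariate Wilks' Lambda: within-class sum of squares divided by
    total sum of squares. Values near 0 mean well-separated classes."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for cls in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == cls]
        mean = sum(members) / len(members)
        ss_within += sum((v - mean) ** 2 for v in members)
    return ss_within / ss_total


def best_band_per_subset(spectra, labels, n_subsets=5):
    """Split band indices into interleaved subsets (every n-th band) and
    return, for each subset, the band index with the smallest Lambda."""
    n_bands = len(spectra[0])
    winners = []
    for start in range(n_subsets):
        scores = {}
        for band in range(start, n_bands, n_subsets):
            column = [spectrum[band] for spectrum in spectra]
            scores[band] = wilks_lambda_1d(column, labels)
        winners.append(min(scores, key=scores.get))
    return winners


# Toy data (made up): 6 spectra, 10 bands, class labels 1, 3, 4.
# Only bands 3 and 4 carry a class-dependent offset; the rest is a
# deterministic pseudo-noise pattern unrelated to class.
labels = [1, 1, 3, 3, 4, 4]
offset = {1: 0.0, 3: 0.3, 4: 0.1}
spectra = []
for i, cls in enumerate(labels):
    row = []
    for b in range(10):
        value = 0.2 + 0.01 * ((2 * i + 3 * b) % 5)
        if b in (3, 4):
            value += offset[cls]
        row.append(value)
    spectra.append(row)

winners = best_band_per_subset(spectra, labels)
```

In this toy case the two informative bands fall in different interleaved subsets and each is picked within its own subset, which mirrors the observation that adjacent bands were selected across the five band sets.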

Discriminant Analysis (DA) was used to test the performance of the selected bands for discriminating between soil types. These forty-six variables were first transformed by PCA in order to decrease the number of variables that enter the Discriminant Analysis while retaining the amount of variability explained. The first fifteen PCs were used as input variables in DA. Canonical Analysis (CA) was performed on the DA to maximize the separability between soil classes. The error estimate was inferred by contrasting two types of errors: resubstitution and cross-validation estimates. Although resubstitution yields better results, the cross-validation estimate is less biased and therefore potentially more realistic. When the number of samples is small, the resubstitution error is minimized, because it is easier to fit the discriminant functions. For this reason, although cross-validation error is greater, it was considered more reliable.
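The difference between the two error estimates can be shown with a small sketch. To keep it self-contained, a nearest-centroid rule stands in for the discriminant functions (an assumption made here for brevity; it is not the classifier fitted in SAS), and the one-band samples are invented. Cross-validation is implemented as leave-one-out, which is also an assumption about the form of cross-validation.

```python
def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit(samples, labels):
    """Return a dict mapping each class label to its centroid."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in groups.items()}


def classify(x, centroids):
    """Assign x to the class with the nearest centroid (Euclidean)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def resubstitution_error(samples, labels):
    """Error rate when the model is tested on its own training data."""
    model = fit(samples, labels)
    wrong = sum(classify(x, model) != y for x, y in zip(samples, labels))
    return wrong / len(samples)


def leave_one_out_error(samples, labels):
    """Error rate when each sample is classified by a model fitted
    without that sample."""
    wrong = 0
    for i in range(len(samples)):
        model = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        wrong += classify(samples[i], model) != labels[i]
    return wrong / len(samples)


# Invented one-band data: two classes whose members nearly touch at 3.0.
samples = [[0.0], [1.0], [2.9], [3.1], [5.0], [6.0]]
labels = ["A", "A", "A", "B", "B", "B"]

resub = resubstitution_error(samples, labels)
loo = leave_one_out_error(samples, labels)
```

With these values every sample is classified correctly when it helps define its own centroid, while the two borderline samples are misclassified once they are held out, so the leave-one-out estimate exceeds the resubstitution estimate.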

After studying the performance of the selected bands in discriminating soil types, it was important to evaluate how well the bands identified from the laboratory spectra are correlated with the bands selected from AVIRIS data. Canonical Correlation Analysis (CC) concentrates high dimensional relationships between sets of variables into a few pairs of canonical variables. It represents the strongest possible relation between a linear combination of one set of variables and a linear combination of the other data set. To find the relationship between both data sets, canonical correlation was applied between the two sets of bands from the lab and AVIRIS data. The purpose was to find out how much of the variance from AVIRIS data could be explained by the lab analysis and whether AVIRIS bands could explain as much of the variance as was accounted for in lab data. For this analysis, three types of correlations were examined: the within-set correlation, which represents the correlation between the selected bands and the canonical variables of one data set; the correlation between the canonical variable of one data set and the selected bands of the other data set; and the correlation between the canonical variables of the two data sets.

Results and Discussion

The major purpose of this paper was to evaluate AVIRIS bands in terms of soil discrimination to better understand how this information can be extracted from images. We evaluated this problem by developing models from laboratory based soil spectrometry and by comparing these results with those from AVIRIS pixel spectra following the same analysis. Our goal was to determine whether both data sets follow the same or similar patterns in terms of sample distribution and spectral separation between soil classes. The parameters compared were distances between classes, distribution of centroids, total variance of the data sets, variance between classes, and correlation between bands that are identified as important in the discrimination process. Figures 1 and 2 show an example of the spectra of the three soil types for each case.

The laboratory spectra are characterized by distinct absorption bands at 1400 nm, 1900 nm, and 2200 nm due to the presence of -OH either in the soil mineral (at 1400 nm and 2200 nm) or in the water molecule, adsorbed or bound (at 1900 nm). A minor, less clear absorption band is also present at 2300 nm; this absorption is due to the combination of the -OH stretching mode with the Mg-O-H bending mode, and it is present in trioctahedral clays (Pieters and Englert, 1993). Another broad absorption band is present at 1100 nm due to transitions in Fe2+.

In the visible region there is an accentuated decrease of reflectance below 600 nm due to a charge transfer absorption of iron in the violet area of the spectrum. The change of slope in this area has been traditionally attributed to differences in organic matter and iron oxide content (Stoner and Baumgardner, 1981).

In the AVIRIS spectra it is not possible to observe the absorption features at 1400 nm and 1900 nm because of atmospheric water absorption. Nonetheless, the general shape of the AVIRIS curves is similar to that of the laboratory spectra, with a tendency to increase until 1900 nm. The positions of the major absorption bands are similar as well.

In AVIRIS data there is an absorption band at 960 nm, caused by atmospheric water, that does not appear in laboratory data. The shape of the reflectance curve below 600 nm is similar to the shape observed in laboratory data. The band at 2200 nm is not observable in AVIRIS curves, but the trend toward decreased reflectance at longer wavelengths and the band at 2300 nm are distinguishable. Atmospheric water absorbs in the 1200 nm region and may cause the data to be noisy. Also, two atmospheric CO2 absorptions between 2000 and 2100 nm are observed.

SDA was performed on AVIRIS and laboratory data to reduce the number of bands to those most useful in discriminating between the soil groups. PCA was subsequently performed on the selected bands and tested using Canonical Discriminant Analysis (CDA). Although the PCs summarize the total variance in the data, we were interested in evaluating how well they perform in explaining the variance between soil types. Since the chi-square test of homogeneity of the three covariance matrices was not significant, the within-group covariance matrix was used for the analysis in both data sets.

Because there are three soil types, two discriminant functions explain the variance between soil classes. In both data sets, the first discriminant function explained more variance than the second, indicating that two groups are more similar to each other than either is to the third (Table 1).

The multiple correlation between groups was evaluated for significance using the F-statistic. In the lab data the F-statistic was significant for both discriminant functions at p < 0.0001, showing that the group centroids differ significantly. AVIRIS showed the same pattern, with the F-statistic highly significant at p < 0.0001.

The index of concentration expresses how much of the variance among centroids is explained by each of the discriminant functions. In the laboratory data the index of concentration shows that 62% of the total variance among the three centroids is accounted for by the first discriminant function. The second discriminant function accounts for 38% of the variance of the data set. The value of the index of concentration indicates that the three centroids are spread along two dimensions, supporting the need for two discriminant functions. From the values of the squared canonical correlation it can be inferred that 55% of the total variance is associated with the first discriminant function, and 44% is associated with the second discriminant function. The canonical correlation places the concentration statistics in context. The index of concentration indicates that the centroids of the classes are well differentiated, while the value of the canonical correlation indicates that most of the variation in scores occurs between soil classes. In the case of AVIRIS data, 66% of the variance among the three centroids is associated with the first discriminant function, while only 34% is associated with the second discriminant function. In this case the squared canonical correlation showed that 63% of the total variance is explained by the first discriminant function, and 46% by the second discriminant function. These results support the conclusion that the same patterns are present in both data sets.
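The relationship between the two statistics can be checked with a short calculation. The sketch assumes that the index of concentration is the share of each discriminant eigenvalue in the eigenvalue sum, and that each eigenvalue is linked to its squared canonical correlation by r2 = lambda / (1 + lambda); both are standard in canonical discriminant analysis, but the text does not state them, so they are assumptions here. The function names are hypothetical, and the inputs are the rounded percentages reported above.

```python
def eigenvalue_from_sq_corr(r2):
    """Discriminant eigenvalue implied by a squared canonical correlation:
    r2 = lam / (1 + lam)  =>  lam = r2 / (1 - r2)."""
    return r2 / (1.0 - r2)


def concentration(sq_corrs):
    """Share of each discriminant function in the sum of eigenvalues
    (assumed definition of the index of concentration)."""
    eigenvalues = [eigenvalue_from_sq_corr(r2) for r2 in sq_corrs]
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


# Squared canonical correlations as reported in the text (rounded values).
lab_shares = concentration([0.55, 0.44])
aviris_shares = concentration([0.63, 0.46])
```

Under these assumptions the first function receives a share of roughly 0.61 for the laboratory data and roughly 0.67 for AVIRIS, which agrees with the reported 62% and 66% to within the rounding of the published correlations.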

Figures 3 and 4 show plots of the canonical variables calculated in CDA. The three classes are well separated in the lab spectra, with the Yolo loam soil located farthest away from the other soils (Figure 3). This pattern of distribution also occurs in AVIRIS data (Figure 4), although there, Brentwood clay loam and Yolo silty clay loam are less well separated than in lab data. Nonetheless, the distributions are consistent in both data sets with Yolo silty clay loam and Brentwood clay loam located closer to each other and farther from Yolo loam.

To test whether Yolo loam is the best differentiated soil type, we computed the Mahalanobis distances between soil classes (Table 2). Mahalanobis distances are based on the centroids of the classes as well as on their covariance matrices. Since the covariance matrices of the soil classes are different, the distances between soil types are not symmetric. To compare the distances (d) we examined the inequalities between classes, d(1|4) < d(1|3) and d(4|1) < d(4|3), where d(i|j) is the Mahalanobis distance between class i and class j, defined as:
 

The values of the between-class distances are shown in Table 2. Both data sets exhibit the same distance patterns between classes. In both cases, Mahalanobis distances between classes 1 and 4 and between 4 and 1 are less than those between 1 and 3 and between 4 and 3. These results are consistent with the earlier results, which showed that one soil class is better discriminated than the other two; in all cases Yolo loam is the soil type having the greatest separation distance when compared to the other soil types.
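The asymmetry of class-conditional distances can be illustrated with a minimal sketch. It assumes, as one common convention, that d(i|j) is the squared distance between the centroids of classes i and j scaled by the covariance matrix of class j; the convention actually used in the analysis is not confirmed by the text. The function names and the two-band toy classes are hypothetical.

```python
def mean_vector(rows):
    """Centroid of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_2d(rows):
    """Sample covariance matrix (n - 1 denominator) for two-band data."""
    m = mean_vector(rows)
    n = len(rows)
    sxx = sum((r[0] - m[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - m[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def sq_distance(class_i, class_j):
    """d(i|j): squared centroid separation scaled by the covariance
    matrix of class j (assumed convention)."""
    mi, mj = mean_vector(class_i), mean_vector(class_j)
    (a, b), (c, d) = covariance_2d(class_j)
    det = a * d - b * c
    dx, dy = mi[0] - mj[0], mi[1] - mj[1]
    # diff' * inverse(cov_j) * diff, with the 2x2 inverse written out
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


# Toy classes (made up): one compact, one widely scattered.
tight = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wide = [[10.0, 0.0], [14.0, 0.0], [10.0, 4.0], [14.0, 4.0]]

d_tight_given_wide = sq_distance(tight, wide)
d_wide_given_tight = sq_distance(wide, tight)
```

The same pair of centroids yields a much larger distance when it is scaled by the compact class than by the scattered one, which is why d(i|j) and d(j|i) have to be compared separately.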

For lab data, the cross-validation error was 16% while the resubstitution error was 5%. The percentage error for the AVIRIS data set was 19% for the cross-validation estimates. The resubstitution error for these data was 0%. Although the resubstitution error is most commonly reported, we consider it important to include both. When sample sizes are small it is easier to fit the discriminant functions, so the resubstitution error may be unrealistically low. Because of this we consider cross-validation to be less biased; however, both types of error are expected to be relatively high. Variation in these alluvial soils is nearly continuous between soil phases; therefore, many samples can have mixed properties that are characteristic of two or more soil classes, and as a consequence their spectra may show intermediate features. In both data sets the percentages of misclassification errors are different for the three soil classes. The error is highest between classes 1 and 4. Classification errors between Brentwood clay loam (1) and Yolo silty clay loam (4) may be due to the similar particle size distribution of these soil phases. Classification errors are smaller between Brentwood clay loam and Yolo loam (3), where greater differences in particle size distribution are found.

Another way to examine the relationships between data sets is to examine the shape of the PCA eigenvectors and the correlation between the PCs from AVIRIS and laboratory data. Although the PCs summarize the total variance of the data sets, we applied SDA to evaluate which PCs were more significant in explaining the between-class variance and to study the relationship between the PCs of the two data sets. PC1 of the AVIRIS data was the most significant in explaining variance between soil types, while in the lab data PC3 and PC5 showed the greatest discriminating power. PC3 is opposite in shape to PC5; that is, the same wavelength regions contribute to the identification of these soils but with different signs. In AVIRIS data, the highest partial R2 value corresponded to PC1, while in lab data it corresponded to PC3. Table 3 shows the results of these analyses for the PCs.

The first PC explained most of the variance in both data sets; however, it was not the one with the highest explanatory power between the groups in laboratory data. The weighting function of PC3 from lab data has the same general shape as the one corresponding to AVIRIS PC3 but with opposite sign. This shows that similar wavelengths are important in explaining the between-groups variance for both data sets.

Some of the bands selected in the two data sets were the same: 547.59 nm, 567.38 nm, 676.57 nm, 764.01 nm, 1138.84 nm, 1953.26 nm, 1196.36 nm, 1244.26 nm, 2003.19 nm, and 2092.92 nm. These bands were not selected in isolation; in all cases nearby or adjacent bands were also selected. This pattern supports the view that there are specific regions of the spectrum that contain important features for the classification. The selected bands are shown in Figures 5, 6, and 7, marked with separate symbols for AVIRIS and lab data.

There are some important spectral regions in the discrimination process; nearby bands appear in both data sets, although they are slightly shifted. The first area corresponds to the wavelengths around 600 nm, where a high number of bands were selected. This is an inflection point of the curve; the bands chosen are close to the change in slope of the curve across the visible, a pattern that has been shown to be related to the organic content of the soil (Stoner and Baumgardner, 1981). Among these, bands 567.38 nm and 676.57 nm were selected in both data sets. Krishnan et al. (1980) showed that 564.4 nm and 623.6 nm were optimal wavelengths for predicting organic matter. Thus, this pattern may be attributed to differences in organic matter content among these soils. Of the common bands between the data sets, there is also a band selected at 764 nm that is important in explaining the total variance. An absorption band located close to 700 nm has been reported to be related to electronic transitions of ferric iron (Irons et al., 1989). Although we do not have data about iron oxide content, our results support the importance of these bands for explaining the soil variance. Another group of bands was selected in lab data around 1100 nm (Pieters and Englert, 1993), and an AVIRIS band at this wavelength was also selected. These bands could be related to iron oxide in the ferrous form. A band at 1200 nm can be related to the water absorption band, although in AVIRIS this band region is noisy. In laboratory data, additional bands were highly significant around 1900 nm, possibly related to water adsorbed in montmorillonite clay. The different amounts of montmorillonite clay in the soils explain the contribution of these bands to discriminating among them. Nonetheless, these bands could not be used in AVIRIS data because of the atmospheric absorption of water. Some bands were selected between 2100 nm and 2200 nm. Absorption features at 2100-2200 nm relate to the combination of -OH stretching and Al-O-H bending modes, which are characteristic of dioctahedral clays. The inclusion of these bands also seems to be related to the different clay contents in these soils. In AVIRIS data there was a group of bands around 2300 nm that seem to be important; this selection can be related to the presence of Mg-O-H. However, since these bands were not selected in lab data, they may also represent the presence of some dry vegetation in the AVIRIS pixels. These bands have been related to absorption bands from cellulose and lignin (Elvidge, 1990).

After removing the bands that were selected in both data sets, canonical correlation analysis was run to compare the two groups of remaining bands. The purpose was to determine how much of the variance in the AVIRIS data could be explained by the bands selected in the laboratory analysis, and whether AVIRIS bands could explain as much of the variance as was explained in the lab data. The correlations between the canonical variables from the lab and from the AVIRIS data were always higher than 80%. Within-set correlations were also higher than 80%. In the lab spectra there were 15 canonical correlations that were significant at p < 0.0001, but only the first canonical variate showed high within-set correlations; it explained 91% of the variance in the original bands. This canonical variate also showed 90% correlation with the original bands selected from AVIRIS. In terms of the total laboratory variance, 92% was explained by the canonical variate calculated from the AVIRIS data.

For the AVIRIS data set, both the first and second canonical variables were considered important, since they showed high within-set correlations with all the original bands. The first canonical variable could explain 64% of the variance in the lab data, and the second could explain 32% of this variance. The presence of two significant canonical variates in the AVIRIS data may have been due to additional sources of variance that were absent from the laboratory soil spectra, such as atmospheric conditions and mixed pixels that included plant material.

This analysis showed that 89% of the variance present in AVIRIS bands could be explained by laboratory bands.
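As a rough illustration of the canonical correlation step, the sketch below computes canonical correlations between two sets of variables using NumPy. It is not the SAS procedure used in this study. The data are synthetic and hypothetical: both sets are generated from one shared latent factor plus noise, standing in for lab and AVIRIS bands. The function name `canonical_correlations` and all numeric settings are assumptions made for the example.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y.

    Uses the QR/SVD approach: after centering, the singular values of
    Qx.T @ Qy are the canonical correlations. Assumes X and Y have
    full column rank and more rows than columns.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against tiny numerical overshoot beyond [0, 1]
    return np.clip(s, 0.0, 1.0)


# Synthetic, hypothetical data: one shared latent factor drives both sets.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 1))
lab_bands = latent @ np.array([[1.0, 0.8, -0.6, 0.5]]) + 0.1 * rng.normal(size=(n, 4))
aviris_bands = latent @ np.array([[0.9, -0.7, 0.4]]) + 0.1 * rng.normal(size=(n, 3))

r = canonical_correlations(lab_bands, aviris_bands)
print("canonical correlations:", np.round(r, 3))
```

With a single shared factor, only the first canonical correlation is expected to be large, which is analogous to the finding that one canonical variate dominated in the lab data.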

A subset of the soils was analyzed to determine organic matter content and particle size distribution (Table 4). Thirty-seven samples were analyzed. The results support the SCS descriptions: Yolo loam has the highest sand content, while Brentwood clay loam shows the opposite trend, although these differences are not very large. Brentwood clay loam has the highest organic matter content, which also agrees with the SCS descriptions. Brentwood clay loam and Yolo silty clay loam show the most similar particle size distributions. Figure 8 shows a plot of organic matter content against sand content. Brentwood clay loam and Yolo loam are well separated, but Yolo silty clay loam is intermediate between the other two classes. These differences may explain the physical basis for the discriminant functions. The first discriminant function separates soil types based on particle size distribution, while the second separates them based on organic matter content. Yolo loam, which has the coarsest particle size, separates from the other soils on the first discriminant function, and Brentwood clay loam separates from Yolo silty clay loam on the second discriminant function because of its high organic matter content. It seems that particle size distribution produces more variability than organic matter content, since a higher number of errors occurs between classes with similar particle size distributions.

Conclusions

The analyses reported here showed that the three fairly similar soil phases could be differentiated based on their spectral properties. In both AVIRIS and laboratory measurements, soil groups differed significantly. High correlations were found between laboratory spectra of field-collected soils and airborne imaging spectrometry (AVIRIS) data. Although the error estimates are relatively high, we demonstrated that the statistical relationships in AVIRIS and laboratory data are markedly similar and that they follow the same patterns in terms of significant wavelengths, distributions, and distances between classes. Optimum spectral discrimination required only a few bands. A high proportion (92%) of the variance in the lab spectral data could be explained by AVIRIS bands, showing that this sensor could be used to identify soil variability at a level comparable to laboratory analyses.

We found it very useful to apply PCA after SDA because the variability due to differences between classes could be condensed into a few variables. This method could be used to derive soil surface maps by using the discriminant functions to classify pixels into different soil types; however, because soils vary continuously, other methods such as Spectral Mixture Analysis would be more appropriate. In this study, multivariate statistical methods were useful for demonstrating the relationships between AVIRIS and laboratory data.
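The idea of classifying pixels with discriminant functions can be sketched as follows. This is a minimal linear discriminant example in NumPy, assuming equal within-class covariance. It is not the procedure or the software used in this study. The class means, noise level, image size, and function names (`fit_linear_discriminants`, `classify`) are hypothetical, and the reflectance values are synthetic.

```python
import numpy as np


def fit_linear_discriminants(X, y):
    """Fit linear discriminant functions with a pooled covariance matrix."""
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    pooled = np.zeros((p, p))
    for c, m in zip(classes, means):
        d = X[y == c] - m
        pooled += d.T @ d
    pooled /= n - len(classes)
    priors = np.array([np.mean(y == c) for c in classes])
    # Row k of W is inv(pooled) @ mean_k
    W = np.linalg.solve(pooled, means.T).T
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return classes, W, b


def classify(X, classes, W, b):
    """Assign each spectrum to the class with the highest discriminant score."""
    scores = X @ W.T + b
    return classes[np.argmax(scores, axis=1)]


# Synthetic, hypothetical reflectance for three soil classes in four bands.
rng = np.random.default_rng(0)
class_means = np.array([
    [0.20, 0.25, 0.30, 0.28],
    [0.15, 0.20, 0.26, 0.25],
    [0.25, 0.30, 0.33, 0.30],
])
train_y = np.repeat([0, 1, 2], 40)
train_X = class_means[train_y] + rng.normal(scale=0.01, size=(120, 4))

# A small synthetic "image" (rows, cols, bands) with known labels.
labels = rng.integers(0, 3, size=(10, 12))
image = class_means[labels] + rng.normal(scale=0.01, size=(10, 12, 4))

classes, W, b = fit_linear_discriminants(train_X, train_y)
soil_map = classify(image.reshape(-1, 4), classes, W, b).reshape(10, 12)
accuracy = np.mean(soil_map == labels)
print("map shape:", soil_map.shape, "accuracy:", accuracy)
```

A hard classification like this assigns one class per pixel, which is why a method that allows mixtures may suit continuously varying soils better.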

It must be recognized that although soil maps are drawn with distinct boundaries, there are actually gradual transitions in many soil properties over the landscape. It is also important to remember that soils do not occur in isolation; there is almost always more than one soil type within the described SCS unit. Some of the classification errors we found may be due to this fact. Since spatial relationships are lost when working with spectra extracted from the image, the soil differences may be more striking when the image itself is analyzed. Another potentially important factor not considered in this study is that the soil samples were not collected coincident with the AVIRIS flight. Although soil properties do not change rapidly, cultural practices in this case do change some properties (e.g., the amount of surface litter or the water content of the soil), and we did not have this kind of information for the date when AVIRIS flew. Nonetheless, the results of this research support the idea that it is feasible to use AVIRIS images to discriminate between similar types of soils, and therefore that such images could be used efficiently to monitor changes in soil surface properties due to land use or other processes. Such measurements may be relevant for mapping soils with a satellite imaging sensor like the LEWIS HSI satellite to be launched by NASA in 1996.
 

Acknowledgments:

This work was supported by an Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA) graduate fellowship, by NASA EOS grants NAS5-31714 and NAS5-31359 and SIR-C grant #958445-NAS7-918, and by the Digital Equipment Corporation, which provided DEC Alpha 3000 computers through the Sequoia 2000 grant, Cooperative Research Agreement #1243. We wish to thank Dr. Michael Singer for the use of his lab for processing soil samples, the DANR laboratory where the analyses were done, Quinn Hart for assistance in image processing, and Jorge Pinzón and Neil Willits (UCD Statistics Lab) for statistical advice. We thank Dr. Dar Roberts for calibrating the AVIRIS image to surface reflectance.

References

Agbu, P.A., Feherenbacker, D.F., and Jansen, I.J. (1990), Soil property relationships with SPOT satellite digital data in East Central Illinois, Soil Sci. Soc. Am. J. 54:807-812.

Baumgardner, M.F., Silva, L. F., Biehl, L.L., and Stoner, E.R. (1985), Reflectance properties of soils, Advances in Agronomy 38:2-39.

Basedow, R.W. (1995), HYDICE system: Implementation and performance, Proc. SPIE 2480:258-267.

Coleman, T.L., Agbu, P.A., and Montgomery, O.L. (1993), Spectral differentiation of surface soils and soil properties: is it possible from space platforms?, Soil Sci. 155:283-293.

Condit, H.R. (1970), The spectral reflectance of American soils, Photogramm. Eng. 36:955-966.

Csillag, F., Pasztor, L., and Biehl, L.L. (1993), Spectral band selection for the characterization of salinity status of soils, Remote Sens. Environ. 43:231-242.

De Jong, S.M. (1992), The analysis of spectroscopical data to map soil types and soil crusts of Mediterranean eroded soils, Soil Technology 5:199-211.

De Long, R.K., Romesser, T.E., Marmo, J., and Folkman, M.A. (1995), Airborne and satellite imaging spectrometer development at TRW, Proc. SPIE 2480:287-294.

Efron, B. (1975), The efficiency of logistic regression compared to normal discriminant function analysis, Journal of the American Statistical Association 70:892-898.

Elvidge, C.D. (1990), Visible and near-infrared reflectance characteristics of dry plant materials, Int. J. Remote Sens. 11:1775-1795.

Henderson, T.L., Baumgardner, M.F., Franzmeier, D.P., Stott, D.E., and Coster, D.C. (1992), High dimensional reflectance analysis of soil organic matter, Soil Sci. Soc. Am. J. 56:856-852.

Hosmer, D., and Lemeshow, S. (1989), Applied logistic regression, Wiley, New York.

Hunt, G.R., and Salisbury, J.W. (1970), Visible and near-infrared spectra of minerals and rocks: I. Silicate minerals, Modern Geology 1:283-300.

Irons, J.R., Weismiller, R.A., and Petersen (1989), Soil reflectance, in Theory and Application of Optical Remote Sensing, Wiley Interscience, pp. 67-105.

Krishnan, P., Alexander, J.D., Butler, B.J., and Hummel, J.W. (1980), Reflectance technique for predicting soil organic matter, Soil Sci. Soc. Am. J. 44:1282-1285.

Lee, K., Lee, G.B., and Tyler, E.J. (1988), Determination of soil characteristics from Thematic Mapper data of a cropped organic inorganic soil landscape, Soil Sci. Soc. Am. J. 52:1100-1104.

Lewis, D.T., Seevers, P.M., and Drew, J.V. (1975), Use of satellite imagery to delineate soil associations in the Sand Hills region of Nebraska, Soil Sci. Soc. Am. Proc. 39:330-335.

May, G.A., and Petersen, G.V. (1975), Spectral signature selection for mapping unvegetated soils, Remote Sens. Environ. 4: 211-220.

Pieters, C.M., and Englert, P.A.J. (1993), Remote Geochemical Analysis: Elemental and Mineralogical Composition in Topics in Remote Sensing, Cambridge University Press. New York.

Roberts, D.A., Green, R.O., and Adams, J.B. (1995), Temporal and spatial pattern in vegetative and atmospheric properties using AVIRIS, Remote Sens. Environ. (submitted).

Rao, C. R., (1973). Linear Statistical Inference and its Application, Second edition, Wiley, New York.

SAS user's guide, (1988), version 6.03, SAS Institute Inc, Cary, NC.

Seubert, C.E., Baumgardner, M.F., Weismiller, R.A., and Kirschner, R.A. (1979), Mapping and estimating areal extent of severely eroded soils of selected sites in northern Indiana, Machine Proc. Remote Sens. Data Symp. IEEE, pp. 234-238.

Smith, M.O., Ustin, S.L., Adams, J.B., and Gillespie, A.R. (1990), Vegetation in deserts: I. A regional measurement of abundance from multispectral images, Remote Sens. Environ. 31:1-26.

Stoner, E.R., Baumgardner, M.F., Biehl, L.L., and Robinson, B.F. (1980), Atlas of soil reflectance properties, Agri. Exp. Stat., Purdue Univ., West Lafayette, Indiana, Research bulletin, pp. 1-75.

Su, H., Ransom, M.D., and Kanemasu, E.T. (1989), Detecting soil information on a native prairie using Landsat TM and SPOT satellite data, Soil Sci. Soc. Am. J. 53:1479-1483.

Su, H., Kanemasu, E.T., and Ransom, M.D. (1990), Separability of soils in a tallgrass prairie using SPOT and DEM data, Remote Sens. Environ. 33: 157-163.

Thompson, D.R., Haas, R.H., and Milford, M.H. (1981), Evaluation of Landsat Multispectral Scanner data for mapping vegetated soil landscapes, Soil Sci. Soc. Am. J. 45:91-95.

Thompson, D.R., and Henderson, K.E. (1984), Detecting soils under cultural vegetation using digital Landsat Thematic Mapper Data, Soil Sci. Soc. Am. J. 48:1316-1319.

Weismiller, R.A., Persinger, I.D., and Montgomery, O.L. (1977), Soil inventory from digital analysis of satellite scanner and topographic data, Soil Sci. Soc. Am. J. 41:1166-1170.

Wright, G.C., and Birnie, R.V. (1986), Detection of surface soil variations using high resolution satellites data: results from the U.K. SPOT-simulation investigation, Int. J. Remote Sens. 4: 757-766.

Zhang, R., Warrick, A.W., and Myers, D.E. (1992), Improvement of the prediction of soil particle size fractions using spectral properties, Geoderma 52:223-234.

1998, Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis