Investigation of Leaf Biochemistry by Hierarchical Foreground/Background Analysis

Jorge E. Pinzon1, S.L. Ustin2, C. M. Castaneda2, and M. O. Smith3
1Dept. of Applied Mathematics, University of California, Davis, CA, 95616
2Dept. of Land, Air, and Water Resources, University of California, Davis, CA, 95616
3Dept. of Geological Sciences, University of Washington, Seattle, WA, 94805
IEEE Transactions on Geoscience and Remote Sensing 36:1-15
 
Author for Correspondence:
Jorge E. Pinzon
Department of Applied Mathematics
University of California
Davis, CA 95616
Phone:  (530) 752-5092
FAX:  (530) 752-5262
email:  jepinzon@ucdavis.edu

Abstract

A hierarchical procedure was developed for the quantitative estimation of foliar chemistry from remotely sensed reflectance spectra. We based our analysis on a new methodology, termed Hierarchical Foreground and Background Analysis (HFBA), that sequentially derives a series of weighting vectors. These vectors simultaneously extract important discriminant features from the spectral information, in this case leaf anatomy and chemical concentration at different levels of detection. In this study we focused on detecting carbon, cellulose, and nitrogen concentrations and water content. The goal of the derived vectors is two-fold: 1) to provide a robust detection and classification system for constituent materials, and 2) to provide an efficient information packing system that minimizes extraneous, undesired interference, such as noise, in the analysis. Two primary data sets were examined: a fresh leaf data set, LOPEx (from the Joint Research Center (JRC), Ispra, Italy), and a dry leaf data set, Blackhawk Island (University of New Hampshire (UNH)). We tested the robustness of the derived vectors with four other data sets. These were fresh leaf data from Jasper Ridge (chemistry from UNH, spectra from the University of California, Davis) and the Santa Monica Mountains (from the University of California, Davis), and dry leaf data from two ACCP sites (UNH): Howland, Maine, and Harvard Forest. The results support the robustness of the HFBA system. They demonstrate an advantage in classification accuracy (first level) and in predicting biochemical composition (subsequent levels) over classical forms of analysis that ignore the effects of non-linear variation contributing to reflectance at different (sub-pixel and spectral) scales. HFBA primarily addresses the spectral scaling issue.

I. INTRODUCTION

The quantitative estimation of the biochemical concentrations of plant canopies from remote spectral measurements has been recognized as a challenge in recent research [1]. It remains a goal for understanding terrestrial ecosystem function. Lignin, proteins, nitrogen, cellulose, starch, chlorophyll, and water are among the biochemical constituents of interest. The major physiological processes acting in terrestrial ecosystems, such as photosynthesis, transpiration, and respiration, can be related to these constituents [2, 3]. For example, Parton et al. [4], Pastor et al. [5], Melillo et al. [6], and Schimel [7] have argued that the inverse relationship between foliar lignin concentration and annual nitrogen mineralization in the soil is a key input to forest decomposition models, because of its regulatory role in soil biochemical cycles. Wessman et al. [8] showed the potential of hyperspectral remote sensing for assessing some of these ecosystem constituents.

Previous research has attempted to categorize vegetation by predicting these constituents and the physical characteristics of canopies from spectral measurements [9, 10, 11, 1]. Near Infrared Spectrometry (NIRS) protocols have been developed for determining the biochemical composition of dried, ground agricultural forage using stepwise multiple linear regression (SMLR) [12, 13]. These methods have been extended to remote sensing studies for the detection of biochemistry [14, 15, 8, 16, 17, 18, 19]. However, inconsistencies in predictions among these studies have raised several concerns.

The main criticism is that these methods assume linear relationships between leaf biochemistry and leaf reflectance. This ignores the fact that spectral identification and detectability depend strongly on context, that is, on the anatomy and chemical composition of the leaf and on the conditions under which the measurements are made. At the image level, spectral contrast varies with the image context and affects detectability [20]. An SMLR might be tuned to reduce undesired, interfering signatures and give good local predictions, yet still perform poorly for general purposes. Grossman et al. [21] and Curran [9, 10] have reported that SMLR is subject to several types of error. From investigations of SMLRs that had good r² statistics but poor predictions, they showed that the technique is sensitive to numerical errors. They also reported that the wavebands selected by SMLR vary with the data sets or conditions. In some cases, other biochemical constituents that partially covary with the chemistry of interest have confounded interpretations, because they absorb in the same wavebands [1]. Also, if the range of chemical concentrations exceeds the SMLR calibration range, the predictions will be in error [1].
In still other cases, the concentrations of the biochemicals are small (e.g., nitrogen or starch) or show little variation among samples (e.g., total carbon content) [22]. Furthermore, Grossman et al. [21] and Jacquemoud et al. [22] observed that the relationships depended strongly on how reflectance was expressed. They also depended on whether the chemistry was expressed on a weight (g/g) or area (g/m²) basis.

A plant leaf or canopy is composed of a range of biochemical constituents that are similar in composition, owing to a shared primary metabolism, but that vary in their proportions. The spectra of leaves or canopies represent that mixture. Targets have usually been identified, and (spectrally) mixed pixels classified, through various multivariate linear comparisons.

First, it is customary to use principal component analysis (PCA) to describe the channel-to-channel variance in multispectral data. However, with hundreds of channels (e.g., from hyperspectral sensors), PCA may suffer numerical problems, because the high dimensionality can produce singular PCA systems [23]. PCA results are also difficult to interpret, because the orthogonal axes of statistical variance have no consistent and simple equivalence to field and laboratory observations. It is desirable, instead, to classify images within the conventional frame of reference of field and laboratory observations, using methods that avoid intrinsic singularity problems. In this respect, spectral mixture analysis (SMA) has become a well-established procedure for analyzing imaging spectrometry data [24, 25, 26, 27, 28].

SMA is a structured and integrated framework that simultaneously addresses the mixed-pixel problem, calibration, and variations in lighting geometry and displays the results in terms of proportions of endmembers that can be related easily to standard ecological observational units (e.g., cover). The general form of the SMA equation for each band is expressed as:
 
$$R_b = \sum_{em} F_{em}\, R_{em,b} + E_b \qquad (1)$$
where R_b is the radiance at band b, F_em is the fraction coefficient that weights the radiance R_em,b of endmember em at band b, and E_b is an error term accounting for the unmodeled radiance in band b. Endmembers are chosen to represent the spectrally distinct materials that form the convex hull of the spectral volume. This approach works best when a few spectral types, in various mixtures, can account for most of the variance in an image data set. This does not mean, however, that any specific material can be identified.

SMA works less well when the spectral features of interest are minor components of the total variance. Indeed, at least for this application, SMA has the disadvantage of linearly approximating the natural (non-linear) complexity of the materials represented by the mixture of endmembers. This produces a non-unique mixing model for identifying and quantifying materials that occur at the sub-pixel scale [20]. Moreover, mixtures of materials may mask the absorption bands needed for the unique identification of particular biochemicals. In summary, the technique is relatively insensitive to subtle absorption features. It also produces significant quantification errors because of endmember variability arising from linear and nonlinear mixtures (e.g., from scattering and lighting geometry) within a pixel. Therefore, minor sources of spectral variation (e.g., those that discriminate stressed from unstressed vegetation, or variations in canopy chemistry) can hardly be detected by SMA [29]. Aber et al. [30] explicitly evaluated the use of SMA and endmember spectra to estimate the biochemical composition of foliage samples from northeastern U.S. forest species. They concluded that this methodology was inappropriate for obtaining quantitative biochemical estimates. This conclusion was reaffirmed in the ACCP report [1]. What is needed is a methodology that minimizes this undesired variation by projecting it onto a plane on which it collapses into a common point, normal to the plane of view.
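Equation 1 can be inverted for the fractions F_em by least squares. The sketch below shows this; it is not the SMA implementation used in the cited studies. It omits the sum-to-one and non-negativity constraints that many SMA implementations add, and the endmember spectra are hypothetical values chosen only for illustration.

```python
import numpy as np

def unmix(pixel, endmembers):
    """Unconstrained least-squares estimate of endmember fractions (Eq. 1).

    pixel:      array (Nb,), radiance R_b at each band b
    endmembers: array (Nb, N), column em holds R_em,b
    Returns the fractions F_em and the per-band residual E_b.
    """
    fractions, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
    residual = pixel - endmembers @ fractions
    return fractions, residual

# Two hypothetical endmembers over four bands, mixed 70/30 without noise.
E = np.array([[0.10, 0.50],
              [0.20, 0.40],
              [0.60, 0.30],
              [0.55, 0.20]])
pixel = E @ np.array([0.7, 0.3])
F, err = unmix(pixel, E)
```

Because the synthetic pixel is an exact linear mixture, the recovered fractions match the mixing proportions and the residual E_b is essentially zero. Real leaf spectra rarely behave this way, which is the limitation discussed above.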

Boardman [31] used a geometric approach, based on the convex hull of the spectra projected into the mixing space, to find a solution that minimized spectral variation for some features while accentuating others. His technique is still an SMA approach that automatically derives the number of endmembers and estimates their pure spectral composition [31]. However, it is suboptimal in the presence of multiple mixing. More recently, Harsanyi and Chang [32] developed a mixture technique that rejects undesired interference by performing an orthogonal subspace projection (OSP). This technique simultaneously reduces data volume and emphasizes the presence of a signature of interest. Bolster et al. [33], seeking the same goals, instead used first-difference partial least squares (PLS) regression, which is based on a singular value decomposition (SVD) of the whole spectral data set. The SVD reduces noise-related interference, which is common in first-difference analysis, and reduces the analysis to a smaller set of independent variables. Both OSP and PLS perform well in detecting low material abundances in a particular scenario, by incorporating the variability of material abundance into the more important independent variables (factors). However, they do not extend to other scenarios. Existing approaches thus share a common problem: a lack of robustness. In this respect, we lack systematic means of quantifying vegetation from spectral measurements (see ACCP [1] and Peterson and Hubbard [11] for references).

Smith et al. [29] sought a directed search methodology that achieves the desired (analytic) robustness property. To this end, they proposed a revised SMA technique that they termed Foreground/Background Analysis (FBA). Harsanyi's approach shares the orthogonal subspace projection property and a similar rationale with FBA. In FBA, spectral measurements are divided into two groups, foreground and background spectra. Together these form a selected subset of spectra that emphasizes the presence of a signature of interest. Intermediate mixtures between foreground and background are not included in either group. In this way, FBA vectors should be sensitive to minor sources of foreground spectral variation and insensitive to background spectral variation. The goal of FBA is to project spectral variation onto the most relevant axis of variance. This axis maximizes the spectral differences between foreground and background while minimizing spectral variation within each group. The FBA approach defines a weighting vector w = (w1, w2, ..., wNb), with a component wb for each channel b = 1, ..., Nb. It is chosen so that all foreground spectral vectors, Rf = (Rf,1, Rf,2, ..., Rf,Nb), are projected to 1, while background spectral vectors, Rb, are projected to 0. This property is defined by the FBA system of equations:
 
$$\mathbf{R}_f \cdot \mathbf{w} + T = 1, \qquad \mathbf{R}_b \cdot \mathbf{w} + T = 0 \qquad (2)$$
where T is a translation that is typically required to optimize the FBA system. As stated, FBA is in essence another linear classifier of the spectra that can be applied to identify low and high material abundances. Pinzon et al. [34, 35] modified the FBA linear system to project a subset of spectra onto a relevant axis of continuous chemical variation. In this case, the system of equations is given by
 
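$$\mathbf{R}_i \cdot \mathbf{w} + T = c_i, \qquad i = 1, \ldots, n \qquad (3)$$
where R_i is the spectrum of training sample i and c_i is its measured chemical concentration.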
The FBA system to be solved is composed only of the training (calibration) samples, which reflect the distribution of the feature to be detected in the whole data set. When singular value decomposition is used to solve the FBA system, the PLS approach of Bolster et al. and FBA are equivalent. Pinzon et al. [35] found that the method yields good predictions and good r² statistics; however, it was not robust in this form. In particular, the formal analysis breaks down when we use samples with different types of tissue. It also breaks down with concentrations beyond the range of variation in the training data set (sensitivity to calibration). This sensitivity depends on the method used. Moreover, it varies with each biochemical and with the ranges of reliable values, which typically differ for species from functionally different ecosystems. Water and pigment concentrations are measured in a straightforward way. Nitrogen, carbon, and cellulose, by contrast, are determined by indirect methods that increase the level of uncertainty [30, 16, 36, 37]. Furthermore, chemical concentrations do not vary independently. For example, Jacquemoud et al. [22] found for the JRC fresh leaf data set (FL) that nitrogen concentrations were significantly positively correlated with water content and negatively correlated with cellulose concentration.
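The two-group FBA system of Equation 2 can be written as one linear system. Each row is a spectrum, the right-hand side is 1 (foreground) or 0 (background), and an extra column of ones carries the translation T. The sketch below solves that system with NumPy's ordinary least squares. It is not the authors' solver, and the synthetic "foreground" and "background" spectra are hypothetical.

```python
import numpy as np

def fba_vector(foreground, background):
    """Least-squares solution of the two-group FBA system (Eq. 2).

    foreground: array (n_f, Nb) of spectra whose projection should be 1
    background: array (n_g, Nb) of spectra whose projection should be 0
    Returns the weighting vector w (length Nb) and the translation T.
    """
    spectra = np.vstack([foreground, background])
    targets = np.concatenate([np.ones(len(foreground)), np.zeros(len(background))])
    # A column of ones lets the translation T be solved together with w.
    system = np.hstack([spectra, np.ones((spectra.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(system, targets, rcond=None)
    return solution[:-1], solution[-1]

# Hypothetical spectra: 20 brighter "foreground" and 20 darker "background" samples, 8 bands.
rng = np.random.default_rng(0)
fg = rng.normal(0.6, 0.02, size=(20, 8))
bg = rng.normal(0.3, 0.02, size=(20, 8))
w, T = fba_vector(fg, bg)
fg_scores = fg @ w + T   # should cluster near 1
bg_scores = bg @ w + T   # should cluster near 0
```

Replacing the 0/1 targets with measured concentrations c_i turns the same construction into Equation 3.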

To address these concerns, we have examined an alternative multivariate hierarchical FBA procedure (HFBA). It sequentially derives a series of FBA vectors that simultaneously extract important general (anatomical) features and discriminate samples at different levels of chemical detection. In this case, the right-hand side of Equation 3 is changed to p_i, where p_i, i = 1, ..., n, represents either an anatomical feature or a quantized range of chemical concentration. Each FBA subsystem differs from Bolster's approach in its use of quantized chemistry ranges and of the reflectance itself (rather than its first difference) in the equation. This reduces calibration dependencies and extraneous noise interference. Solving each FBA subsystem with an SVD yields vectors that efficiently pack the spectral information highlighting the desired features. These vectors can potentially be extended to other sites.
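Replacing c_i by a quantized level p_i requires a rule that maps each measured concentration to a discrete range. The exact bin edges are not specified here. The sketch below assumes three ranges (1 = low, 2 = intermediate, 3 = high), with edges at the mean plus or minus k standard deviations. The value k = 0.5 and the nitrogen values are hypothetical.

```python
import statistics

def quantize_concentrations(values, k=0.5):
    """Map concentrations to quantized levels p_i: 1 = low, 2 = intermediate, 3 = high.

    Edges sit at mean +/- k standard deviations; k is a hypothetical choice.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low_edge, high_edge = mean - k * sd, mean + k * sd
    levels = []
    for v in values:
        if v < low_edge:
            levels.append(1)
        elif v > high_edge:
            levels.append(3)
        else:
            levels.append(2)
    return levels

nitrogen = [0.8, 1.1, 1.5, 1.9, 2.0, 2.4, 3.1, 3.6]   # hypothetical % dry weight
p = quantize_concentrations(nitrogen)                # -> [1, 1, 1, 2, 2, 2, 3, 3]
```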

In this paper, we apply this approach to the prediction of specific leaf chemical constituents: carbon, nitrogen, and cellulose concentrations and water content. In the following sections, we describe the data sets under study (Section II) and introduce the concepts underpinning the HFBA system (Section III). We then present the results of the experimental tests and their application to other sites (Section IV), and finally give some concluding remarks (Section V).

II. DATA SET AND METHODS

The study used two primary data sets to develop the methodology and four secondary data sets to test its robustness. Three of the data sets are measurements of dry leaf spectra and the other three are measurements of fresh leaf spectra.

The first primary data set was obtained from NASA's ACCP experiment, which consisted of paired foliar reflectance spectra and chemistry measured at three sites. These data contained 558 dry, ground foliar samples from tree species that are canopy dominants in mixed hardwood forests of the eastern United States. These species represent a generally convergent set of foliar adaptations and conditions. We consider the Blackhawk Island data (ACCP-BH) as the primary training set. This site is located in south-central Wisconsin (43°40' N, 89°45' W). It had 182 foliar samples from mixed deciduous and conifer species, primarily red and white oak, sugar maple, white pine, and basswood.

The data from the other two ACCP sites were used to test the robustness of the method. The Harvard Forest site (42°32' N, 72°11' W) provided 188 samples (ACCP-HF), primarily from red and sugar maple, red oak, spruce, birch, black cherry, hemlock, ash, American beech, and larch. The third site, Howland, Maine (ACCP-HO; 45°12' N, 68°44' W), provided 188 samples that include red, white, and sugar maple, hemlock, aspen, spruce, birch, ash, American beech, and larch. Biochemical analyses were performed at the University of New Hampshire. Descriptions of the data sets and measurement conditions can be found in Bolster et al. [33, 1].

The second primary data set used in this study was obtained from LOPEx (Leaf Optical Properties Experiment), conducted by the Joint Research Center (JRC), Ispra, Italy. The data set was collected in the vicinity of the center (45°58' N, 8°38.6' E) during the summer of 1993 and is described more fully in [37]. The data included 63 fresh leaf samples (LOPEx-JRC), both cultivated and native. They came from a diverse range of 37 species of herbaceous and woody dicots and 9 species of monocots. Species include maple, alder, birch, laurel, walnut, chestnut, corn, alfalfa, fig, beet, lettuce, tomato, iris, and bamboo. Biochemical analyses were performed at the Centre de Recherches Agronomiques, Libramont, Belgium, as described by Jacquemoud et al. (1994) [37].

A third, similar but smaller, secondary data set was obtained from the Jasper Ridge Biological Preserve (JRBP) at Stanford University, California (37°24' N, 122°13'30" W). It included 17 fresh leaf samples representing 14 native herbaceous and woody dicot species and one monocot. The samples were collected in the summer of 1992 and the spring of 1993 from several coastal California plant communities at JRBP. Species include maple, buckeye, toyon, and oaks. Biochemical analyses were performed by the University of New Hampshire following ACCP protocols, and spectral measurements by the University of California, Davis.

Lastly, a further secondary fresh leaf data set, obtained from the Santa Monica Mountains (SMM) in southern California (34°2' N, 118°30.5' W), was used to test model robustness and predictions [38]. Species at this site consisted of several chaparral shrubs and represent even more sclerophyllous foliar conditions than the JR data set. These data consist of 42 foliar samples from 8 species of common dicot shrubs, including manzanita, chamise, sage, laurel, and California lilac. Only water content, measured as the difference between fresh and dry leaf weights, is reported here.

The ACCP studies used dry, ground leaves, which were measured with an NIRS Model 6500 near-infrared spectrophotometer (NIRSystems Inc., Silver Spring, MD) covering the wavelength range 400-2500 nm. Nitrogen, cellulose, and total carbon were analyzed following the methods described in [16, 36, 39]. LOPEx reflectance spectra were measured on individual fresh leaves for the whole data set and on 48 dry, ground leaf samples. These measurements used a Perkin Elmer Lambda 19 spectrophotometer (Norwalk, CT) equipped with an integrating sphere. Spectral resolution ranged from 1-2 nm in the visible and NIR to 4-5 nm in the middle infrared (SWIR), over a wavelength range of 400-2500 nm.

Jasper Ridge spectra were obtained on fresh individual leaves measured with an NIRS Model 6500 (NIRSystems Inc., Silver Spring, MD). The NIRS system gives a 2 nm wavelength interval and a full width at half maximum (FWHM) slit width of 10 nm between 400 and 2490 nm. Measurement characteristics are described in more detail in [40]. The SMM data were measured on a Varian Cary 5E spectrophotometer (Sunnyvale, CA) with an integrating sphere (Labsphere, North Sutton, NH). Its wavelength resolution varies from less than 1 nm in the visible to about 4-5 nm at 2500 nm. Depending on their size and shape, two to five leaves were cut and combined along their edges to ensure that all of the light interacted with the leaves [41]. The spectra were convolved to 10 nm wavelength intervals, giving a total of 211 wavebands in the range 400-2500 nm. This resampling prepares the HFBA vectors for future application to Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) images.
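The band count follows from the sampling: (2500 - 400)/10 + 1 = 211 band centers. The band-response shape used for the convolution is not stated, so the sketch below assumes Gaussian responses with a 10 nm FWHM centered every 10 nm. The 1 nm input spectrum is hypothetical; a flat spectrum is used so the result is easy to check.

```python
import math

def band_centers(start=400, stop=2500, step=10):
    """Band centers from start to stop inclusive (400-2500 nm at 10 nm gives 211 bands)."""
    return list(range(start, stop + 1, step))

def convolve_to_bands(wavelengths, reflectance, centers, fwhm=10.0):
    """Resample a fine-resolution spectrum with Gaussian band-response functions.

    The Gaussian shape and the 10 nm FWHM are assumptions for illustration.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))   # FWHM -> standard deviation
    out = []
    for c in centers:
        weights = [math.exp(-0.5 * ((wl - c) / sigma) ** 2) for wl in wavelengths]
        total = sum(weights)
        # Weighted average of reflectance under this band's response.
        out.append(sum(wt * r for wt, r in zip(weights, reflectance)) / total)
    return out

fine_wl = list(range(400, 2501))        # hypothetical 1 nm sampling
flat = [0.4] * len(fine_wl)             # a flat 40% reflectance spectrum
centers = band_centers()
resampled = convolve_to_bands(fine_wl, flat, centers)
```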

Tables 1 and 2 summarize the chemical concentrations of fresh and dry, ground leaves, respectively, for both the training and testing data sets. We provide the mean, standard deviation, and range of the measured values, expressed on a percent dry weight basis (g/g). The same statistics are presented for water content.

Carbon concentration is the most homogeneous across sites. However, the standard deviation of the JRC carbon data is almost 1.5 times that of the ACCP and JR data sets. The higher mean and standard deviation of nitrogen concentration in the JRC data set are due to the large number of cultivated plants, which typically have values near or greater than 3%. The ranges of cellulose and nitrogen values differ noticeably among data sets. However, the ranges of nitrogen concentration in the calibration data sets (JRC and BH sites) essentially encompass those in the validation data sets (JR and HF-HO sites). This is not the case for cellulose concentration: the JR dicot samples have significantly higher values than the JRC samples. This is possibly because of the larger number of cultivated plants in JRC, since cellulose and nitrogen were found to be significantly negatively correlated [22]. Cellulose values were also lower in HF-HO than in BH. This was mainly because of the high number of conifers found in HO, especially hemlock, which are absent from the BH data set.

We exploit the species-specific properties of the leaf spectra to find HFBA vectors (defined below) that discriminate the spectra into different groups and ranges of chemical variation. Chemical values are quantized into discrete ranges (low, intermediate, and high concentrations) according to their variance. This reduces the sensitivity to calibration that all regression techniques have. We have also normalized each spectrum to reduce dependencies on the conditions under which the measurements were made. The normalization divides each spectrum by its Euclidean norm, so that the normalized reflectance at channel i is given by

$$\hat{R}_i = \frac{R_i}{\sqrt{\sum_{j=1}^{N_b} R_j^{2}}}, \qquad i = 1, \ldots, N_b$$

This normalization transforms each spectrum to a common framework of unit Euclidean length.
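A minimal sketch of this normalization follows. Two spectra that differ only by an overall brightness factor, as might result from different illumination, become identical after normalization, while their spectral shape is kept. The reflectance values are hypothetical.

```python
import math

def normalize_spectrum(reflectance):
    """Divide a spectrum by its Euclidean norm so that it has unit length."""
    norm = math.sqrt(sum(r * r for r in reflectance))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [r / norm for r in reflectance]

bright = [0.30, 0.40, 0.50]   # hypothetical reflectances in three channels
dark = [0.15, 0.20, 0.25]     # same shape, half the brightness
unit_bright = normalize_spectrum(bright)
unit_dark = normalize_spectrum(dark)
```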

In the next section we analyze the properties of HFBA vectors.

III. THE ANALYSIS

The main purpose of HFBA is to highlight subtle absorption features that can be directly related to a particular desired property, e.g., biochemical variation. These absorption bands represent minor sources of spectral variation. They may be masked by stronger spectral features, such as species anatomical properties and water absorption bands. One way to identify such bands in the HFBA system is to cluster samples with similar strong spectral features and then work independently on each group.

At the first level of the HFBA system, we spectrally discriminate the samples into categories given by a species classification or by relevant chemical variation: low, intermediate, and high concentrations. Random selection of samples for the training subset usually gives good results; however, robustness is not guaranteed. We can emphasize the general features by selecting the training samples so as to preserve the distribution of the property of interest in the whole data set. In that way, any sample that does not meet the first level of classification can be considered an outlier and excluded from the subsequent levels, since it should lie beyond the expected range of biochemical variation.

At subsequent levels, we derive the FBA vectors that best relate leaf reflectance to a specific quantized range of chemical concentration (based on the variance in the chemistry data). We go down in scale sequentially, narrowing the range to obtain a closer approximation. This methodology allows us to refine the search for (subtle) spectral features that are related to biochemical differences in low, medium, and high quantized ranges. We can also group samples that have similar anatomical properties and chemical concentrations to evaluate the reliability of the FBA vectors. The hierarchical detection tree constructed in this way piecewise linearizes the spectral non-linearities observed in the overall chemical variation. At each step, the variance associated with each biochemical is taken as the level of detection required to make a valid interpretation. We quantized the chemical variation into a few discrete values (no more than five) at each level, for two basic reasons.
First, with this methodology the variance of each chemical constituent can be optimized in sequence by obtaining the FBA vectors w from Equation 3. We can also determine when the analysis cannot be extended to a further level. The ability to determine the noise floor is of paramount importance in applying this analysis to remotely sensed images. Second, for classification purposes, the quantized chemical concentrations contain discriminant features that can be related to the observed spectral variation. Therefore, by quantizing chemical concentrations, biochemical detection by HFBA becomes, in essence, a classification problem at each level. The task is to discriminate samples by assigning their projected spectra (via HFBA vectors) to discrete ranges of biochemical variation.
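The classification step at one node of the tree can be illustrated as follows. A sample's projected score (w · R + T) is assigned to the nearest quantized level. A score far from every level is flagged as an outlier and withheld from the subsequent levels. The distance-based rule and the tolerance value below are hypothetical illustrations, not the authors' stated criterion.

```python
def assign_range(score, levels, tolerance):
    """Assign a projected score (w . R + T) to the nearest quantized level.

    Returns None when the score is farther than `tolerance` from every level,
    marking the sample as an outlier at this node (hypothetical rule).
    """
    nearest = min(levels, key=lambda level: abs(score - level))
    if abs(score - nearest) > tolerance:
        return None
    return nearest

levels = [1, 2, 3]                      # low, intermediate, high
a = assign_range(2.2, levels, 0.5)      # 2
b = assign_range(4.1, levels, 0.5)      # None: rejected as an outlier
```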

At each level, the HFBA equation (Equation 3) is solved using a singular value decomposition (SVD) of the reflectance matrix Rfb. The SVD is attractive for questions involving the behavior of the reflectance matrix Rfb itself, or of its pseudo-inverse, because it packs the spectral information into a few relevant axes of variation [42]. The power of the HFBA method becomes apparent as we catalogue more precisely how well the SVD packs information and avoids overfitting. For robustness, this means that we are training HFBA vectors on general but discriminating spectral properties. For more details about the SVD, see [42].
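The sketch below shows how a truncated SVD gives a solution of Equation 3 that keeps only the k strongest axes of variation. It forms the pseudo-inverse from the first k singular triplets only. The synthetic spectra (three latent factors plus small noise) and the random quantized levels are hypothetical, and the choice k = 4 is arbitrary. With all factors retained, the result coincides with the ordinary least-squares solution.

```python
import numpy as np

def solve_truncated_svd(system, rhs, k):
    """Solve system @ x ~= rhs using only the k largest singular values.

    system: (n_samples, n_unknowns) matrix, e.g. training spectra plus a
            column of ones for the translation T
    rhs:    (n_samples,) right-hand side, e.g. quantized levels p_i
    k:      number of retained factors
    """
    U, s, Vt = np.linalg.svd(system, full_matrices=False)
    # Truncated pseudo-inverse: x = V_k diag(1/s_k) U_k^T rhs
    coeffs = (U[:, :k].T @ rhs) / s[:k]
    return Vt[:k].T @ coeffs

# Hypothetical data: 30 spectra built from 3 latent factors plus small noise, 12 bands.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 12)) + 0.01 * rng.normal(size=(30, 12))
system = np.hstack([spectra, np.ones((30, 1))])
levels = rng.integers(1, 4, size=30).astype(float)     # quantized ranges 1, 2, 3
x_k = solve_truncated_svd(system, levels, 4)            # keep 4 factors
x_full = solve_truncated_svd(system, levels, system.shape[1])
```

Discarding the small singular values avoids dividing by near-zero numbers. This is why the truncated solution has a smaller norm and is less prone to fitting noise.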

Determining how many abstract factors are needed to retain important information while avoiding overfitting is a key step in any factor-based technique. The trick is to keep only the factors that contain analytical information and to discard those that contain redundancies and noise. Keeping too many factors creates a dangerous tendency to overfit the data and adds undesired noise to the discriminating vectors. Keeping too few generates a poor set of discriminating vectors. Several indicator functions are available to help identify the optimum number of factors (rank or dimensionality). We use Malinowski's methods [43] to estimate the number of factors and to determine the rank of the spectral system, although we use singular values instead of eigenvalues in our definitions.

These methods must be used with care. First, such indicator functions do not apply to all linear problems; they are limited to problems in which the errors are neither systematic nor erratic. In our case, the spectral measurements contain relatively uniform errors throughout. Second, the dimensionality cannot be carried over unchanged to every node of the hierarchical tree (level of detection). At each node, we define a system of equations using the training set to define and highlight the desired properties. Therefore, Malinowski's indicators can be applied at each level of the analysis to determine the noise level. The solution includes only the singular vectors with the highest singular values that reproduce the original data satisfactorily (within experimental error). In this way, the HFBA vector efficiently focuses the relevant spectral information that fits the variation of the desired property (species, chemical discrimination, or other) at that level. The hierarchical procedure using this classification tree may account for the effects of non-linear chemistry dependencies on geometry and on anatomy.
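As an illustration, the sketch below implements Malinowski's indicator function in its standard form. For a data matrix with r rows and c columns (r ≥ c), the real error is RE(n) = sqrt(Σ_{j>n} λ_j / (r(c − n))) and IND(n) = RE(n)/(c − n)². The number of factors is the n that minimizes IND. The text does not specify exactly how singular values replace eigenvalues, so the sketch assumes λ_j = s_j², where s_j are the singular values. The synthetic rank-3 training matrix is hypothetical.

```python
import numpy as np

def malinowski_ind(data):
    """Malinowski's indicator function IND(n) for n = 1 .. c-1 factors.

    Eigenvalues are taken as squared singular values, lambda_j = s_j**2 (assumption).
    Returns the n that minimizes IND together with all IND values.
    """
    r, c = data.shape
    if r < c:                      # the formula assumes rows >= columns
        data = data.T
        r, c = c, r
    eigenvalues = np.linalg.svd(data, compute_uv=False) ** 2
    ind = []
    for n in range(1, c):
        # Real error RE(n) from the discarded (error) eigenvalues
        real_error = np.sqrt(eigenvalues[n:].sum() / (r * (c - n)))
        ind.append(real_error / (c - n) ** 2)
    best_n = int(np.argmin(ind)) + 1
    return best_n, ind

# Hypothetical training matrix: 60 spectra, 20 bands, 3 underlying factors plus noise.
rng = np.random.default_rng(2)
data = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(60, 20))
n_factors, ind_values = malinowski_ind(data)
```

Because a separate training system is defined at each node, a function like this would be evaluated once per node rather than once for the whole tree.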
By the SVD, we also maximize the distinguishing characteristics of each site and relate directly the most common and relevant ranges of chemical concentrations to the spectral information.

In summary, the HFBA methodology focuses on general but discriminating characteristics of the spectra that allow us to relate them to foliar chemical variation at different levels of detection. Our principal assumption is that we cannot detect all possible ranges of canopy or foliar biochemistry in one step. The chemical variation must be broken into different levels of detection because of the complex and nonlinear dependencies in the composition of a mixture. We do not see a mixture as a simple linear operator, but rather as a material composed of mixtures at different scales. The HFBA methodology is based on this concept of mixture: detecting accurate discriminating characteristics at each level allows us to reduce the range of variability at the next step. Poor discrimination at one step limits the level of detection achievable with the HFBA methodology. That level is bounded by the instrumental uncertainty, by nonlinear dependencies on geometry, and by the conditions under which the experiment was conducted.

For the fresh leaf data set, we have two distinctive features to use at the first level. One is the discrimination between dicot and monocot leaf samples based on their anatomical differences. The other is a quantized water content, because the strong absorption features of water could mask minor sources of spectral variation. To represent the dicot-monocot features, we selected 15 dicots and 5 monocots from the JRC data set, which has 48 dicots and 15 monocots in total. We also discriminated fresh leaf samples according to their water content, quantized into two main values (low and high). For this, we trained the HFBA system at this level using 21 samples (17 low, 4 high) from the JRC data set, which has 53 low and 10 high water content samples in total. The number of factors used for each classification was 8 for dicot-monocot and 7 for water content quantization.

We considered the spectral discrimination of conifers from deciduous species as the first step of spectral biochemical detection in the ACCP samples, because of their anatomical and biochemical differences. The differences in their spectral properties were characterized using 13 conifer samples (from white and red pine) and 47 deciduous leaf samples taken from the BH data set. That data set contains 40 conifer and 142 deciduous samples. In total, the ACCP data set had 202 conifer and 356 deciduous samples. The number of factors derived from Malinowski's approach was 8; here, the SVD achieved a high level of information packing (compression). It is worth noting that hemlock and spruce were present in HO and HF but not in BH.

After the primary classification step, we applied a second HFBA level that trained two vectors on the samples classified at the first level. As before, samples in the training set were selected to reflect the distribution of quantized chemical concentrations in the classified subset. Tables 3 and 4 present the distribution of quantized chemical values over the five ranges used to train the two HFBA vectors. Each column shows the number of samples from each site and in the training set for each range. The dimension, shown in the top corner cell of each sub-table, is the number of factors retained after each SVD. Note that the distribution of the full chemical data set is retained in the training set. In general, we used about one third of the data set to train the vectors.

The out-of-range cellulose concentrations of the JR and HO-HF samples mentioned above were accounted for in the training of the cellulose HFBA vectors. To do so, we included 2 samples from JR and 40 samples from HO-HF (28 from HO, 12 from HF). Otherwise, the high (JR) or low (HO-HF) cellulose values would be projected erroneously into intermediate ranges, R4 for JR and R2 for HO-HF. Nevertheless, the main features across the low, medium, and high ranges are still captured.
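A training subset that preserves the distribution of quantized ranges can be drawn by stratified sampling. The sketch below draws about one third of the samples in each range, always at least one per range. It uses Python's standard random module; the range labels and the seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_training_set(levels, fraction=1/3, seed=0):
    """Pick about `fraction` of the samples from each quantized range.

    levels: list of range labels, one per sample
    Returns sorted sample indices whose range distribution mirrors the full set.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for idx, level in enumerate(levels):
        by_level[level].append(idx)
    chosen = []
    for indices in by_level.values():
        n_pick = max(1, round(fraction * len(indices)))   # keep every range represented
        chosen.extend(rng.sample(indices, n_pick))
    return sorted(chosen)

levels = [1] * 12 + [2] * 30 + [3] * 18   # hypothetical quantized ranges for 60 samples
train = stratified_training_set(levels)   # 4 + 10 + 6 = 20 samples
```

Out-of-range samples, such as the JR and HO-HF cellulose cases above, would have to be added to the training indices deliberately. Proportional sampling alone would not select them.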

IV. RESULTS

Overall, the spectral features extracted by the HFBA vectors at the first level proved to be good discriminators. Given this classification, it was possible to improve chemical detection by focusing on spectral variation in the wavebands with the highest HFBA coefficients. These wavebands contain most of the chemical variation and should relate to the absorption features of each constituent. In the comparison between HFBA and PLS, HFBA yielded a consistent improvement in fitting each constituent distribution for the whole data set (training and testing).
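Identifying the wavebands with the highest coefficients amounts to ranking the bands of a weighting vector. The sketch assumes that "highest" means largest absolute value (sign ignored); the helper name and the default of five bands are hypothetical.

```python
def top_wavebands(wavelengths_nm, coefficients, n=5):
    """Return the n wavelengths whose weighting coefficients have the largest magnitude."""
    ranked = sorted(
        zip(wavelengths_nm, coefficients),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return [wl for wl, _ in ranked[:n]]
```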
 

A. Species classification: first level

Figure 1 shows the average spectra of conifer and deciduous samples, with their standard deviations, and how the HFBA classification vector weights these spectral characteristics. Conifer spectra show a more rapid increase in reflectance in the 700-800 nm region, whereas deciduous spectra show a more extended increase through 1000 nm. The HFBA vector detects this difference and weights it accordingly for classification. Another discriminant feature detected by the HFBA vector is that deciduous reflectance is 10% higher than conifer reflectance between the main water absorption wavebands (at 1400 and 1900 nm) and 3% higher between 2300 and 2500 nm.

Table 5 presents the classification results using these features. Ninety-nine percent of the deciduous samples were correctly identified; however, 39 of the conifers (20%) were misclassified. Of these, 9 were larch samples (a deciduous conifer) and 21 were hemlock samples with low cellulose concentration. Considering only the BH results, all samples were correctly classified. The HFBA vector thus captures the spectral features that distinguish conifer from deciduous samples.
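Per-class rates like those in Table 5 follow directly from the true and predicted labels. The helper below is a hypothetical sketch, not the authors' code.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true class that were assigned to that class."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}
```

For example, 39 misclassified conifers out of the 202 in the ACCP data set give a conifer accuracy of 163/202, about 81%.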

Similarly, Figure 2 shows the properties of the monocot-dicot and low-high water content classification vectors applied to 120 fresh leaf samples. Monocot and dicot samples are distinguished by their spectral features in the visible region, where monocots are brighter (Figure 2a). The highest HFBA coefficient is found around the red edge, which has been used as an estimator of chlorophyll content [44]. Indeed, the JRC monocot samples had higher mean chlorophyll a (39.3 µg/cm²) and chlorophyll b (11.78 µg/cm²) than the JRC dicot samples (35.4 µg/cm² and 11.5 µg/cm², respectively) [37]. Low and high water contents are spectrally discriminated by the main water absorption features at 1400 nm and 1900 nm, and by the way these features interact in the blue region of the visible (around 400 nm), where spectra of low water content have lower reflectance than spectra of high water content.

Table 6 presents the results of the monocot-dicot and low-high water content classifications. Ninety-five percent of the monocot and dicot samples and eighty-eight percent of the low-high water content samples were correctly classified. In the low-high water content classification, we focused the HFBA vector on identifying samples in range R1, given in Table 3. Samples in range R2 of the same table were considered to have intermediate water content, and 16 of them were correctly separated from the R1 samples. However, the HFBA vector projected the remaining 11 R2 samples into the upper part of the R1 range, misclassifying them. Around 1400 nm, the spectral properties of these 11 samples were very close to those of R1 samples, e.g., higher reflectance (Figure 2b).
 

B. Second level: prediction

After classification, we correlated the spectral measurements with concentration (or, for water, content) within each group. The minor absorption features visible in the 400-2400 nm spectral region are relatively weak and broad. They appear to result from harmonic overtones and combinations of electron transitions of the stronger absorptions in chlorophyll and water.
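The two-level procedure can be sketched as: apply the first-level classification vector, then apply only the second-level vector trained for the predicted group. The sketch is hypothetical: it assumes each vector is a linear weight vector plus an offset, that a first-level score above 0.5 assigns group 1, and that the second-level projection is read directly as the chemical value rather than mapped to a quantized range; `project` and `hierarchical_predict` are illustrative names.

```python
def project(vector, spectrum):
    """Linear projection of a spectrum onto a (weights, offset) vector."""
    weights, offset = vector
    return sum(w * x for w, x in zip(weights, spectrum)) + offset


def hierarchical_predict(spectrum, classifier, group_vectors, threshold=0.5):
    """Hypothetical two-level prediction.

    classifier    : (weights, offset) of the first-level vector
    group_vectors : {0: (weights, offset), 1: (weights, offset)}, one
                    second-level vector per first-level group
    Returns (group, predicted_value).
    """
    # Level 1: assign the spectrum to a group (e.g. conifer vs deciduous).
    group = 1 if project(classifier, spectrum) > threshold else 0
    # Level 2: use only the vector trained on samples of that group.
    return group, project(group_vectors[group], spectrum)
```

The design point is that each second-level vector only ever sees spectra from its own group, so anatomical differences between groups do not have to be absorbed by the chemistry vector.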

Figure 3 presents the two vectors used after classification to predict each biochemical in the ACCP data set. Vectors on the left-hand side correlate quantized chemical concentrations for samples classified as deciduous, and vectors on the right-hand side do so for conifer samples. Carbon vectors show similar features in deciduous and conifer samples, with major peaks at about 700 nm and 1920 nm that are most pronounced in the conifer vectors. Vectors for cellulose show some differences: the deciduous vector (left-hand side) has high coefficients starting at the red edge (about 700 nm) and at about 2120 nm, an O-H bending band feature (see similar results in [9]). The conifer vectors, on the other hand, have broad chlorophyll features, another indication of a higher correlation between pigments and cellulose concentration in conifers. The conifer-cellulose vector also has a broader range of high-coefficient wavebands between 1700 and 1900 nm, which includes the 1780-1820 nm region that Curran [9] reported as strong absorption wavebands for cellulose. The nitrogen vectors peak at different wavebands: the deciduous vector peaks start at about 1400 nm (a water absorption band), whereas the conifer vector peaks start near the red edge. The wavelengths needed for spectral nitrogen prediction therefore depend on the species. A common secondary positive peak at about 1900 nm indicates a positive correlation between water and nitrogen, which probably dampens the weak nitrogen absorption in fresh leaf data.

Similarly, Figure 4 presents the two vectors for each biochemical in each classified group of fresh leaf samples (monocot-dicot vectors for carbon and cellulose, and low-high vectors for nitrogen and water). Overall, wavebands around the red edge and the primary water absorption bands (1400 nm, 1900 nm) have the highest coefficients. However, in the monocot-cellulose vector (second row, left side) the red edge contributes the most, jointly with the near-infrared region, which is strongly influenced by cellular structure [45]. This vector has only two other regions with high coefficients: about 1100 nm, and between 1300 and 1600 nm, with the highest peak at 1380 nm, where C-H stretching vibrations occur [11].

Although the nitrogen vectors are very similar in shape to the water vectors, their coefficients have only moderate values at wavebands between 970 nm and 1200 nm (other water absorption bands); instead, the red edge region is highlighted. The carbon vectors, as in the ACCP samples, are qualitatively similar: the same wavebands are highlighted in both groups, indicating that carbon features are consistent between monocots and dicots. As expected, the water vectors have high coefficients at the major water absorption bands, 1400 nm and 1900 nm, especially the vector applied to the high water content group. The other water absorption bands around 970 nm and 1200 nm are detected only by the vector trained with high water content samples. The highest coefficient in the water vectors is found at 1900 nm, and it is much higher in the high water content vector.
 

C. Statistical results

Figures 5 and 6 present the distributions of predicted and measured nitrogen data, with regression lines, for HFBA and PLS applied to the ACCP and FL data sets. Overall, the sequential application of HFBA vectors approximated the distribution of nitrogen concentration in the ACCP and FL data sets better than PLS, and improved r² from 0.63 to 0.69 for ACCP data and from 0.13 to 0.71 for FL data. Further, the predicted distribution for HFBA (dashed lines, panel a) is closer to the measured distribution (solid lines) than that for PLS (panel b).
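The r² values compared here come from regressions of predicted on measured values. Assuming an ordinary simple linear regression, r² equals the squared Pearson correlation between the two, as in this sketch (the function name is hypothetical).

```python
def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values,
    i.e. the r^2 of the least-squares line fitted through the scatter plot."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy * sxy / (sxx * syy)
```

Note that r² is insensitive to a constant offset or scale in the predictions, which is why a shifted prediction (as observed for PLS) can still be diagnosed only by also comparing means and ranges.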

The expected dampening of the weak nitrogen absorptions in the fresh leaf data by the strong water absorptions is the main cause of the poor approximation obtained with PLS. HFBA worked better because we grouped samples by water content, which reduces this effect.

Figure 7 shows the same statistics for the application of HFBA and PLS vectors to the 120 available fresh leaf samples (63 JRC, 17 JR, 40 SMM). A clear shift is noticeable in the PLS results, mainly for the samples of the testing data set (JR and SMM). Three main ranges are identified: low (less than 2.2 g/dm²), intermediate (2.2-3.6), and high (greater than 3.6). HFBA detects these ranges better, mainly by identifying the low water content samples, which are more numerous (see histograms in Figure 7). Again, the nitrogen distribution predicted by HFBA is closer to the measured values than that predicted by PLS.

Tables 7 and 8 present statistical results of HFBA and PLS predictions of carbon, cellulose, and nitrogen concentrations for the ACCP and FL data sets, and of water content for the FL data set. To allow the distributions to be reconstructed, both tables provide five columns giving the number of samples projected into the ranges defined in Tables 3 and 4 (ACCP and FL, respectively). For a quick statistical comparison, the mean, standard deviation, minimum, maximum, and r² of the prediction are also given for each chemical. PLS predictions show a clear shift toward the ranges of the training set. This tendency is reduced with HFBA. In this respect, HFBA vectors capture more general, site-independent characteristics than PLS.
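Counting the samples projected into each range, as in the five columns of Tables 7 and 8, amounts to binning the predicted values by the range edges. In this minimal sketch the edges are inputs, and the boundary convention (a value equal to an edge falls into the upper range) is an assumption; the function name is hypothetical.

```python
from bisect import bisect_right


def range_counts(values, edges):
    """Count how many values fall in each range defined by sorted inner edges.

    With edges [e1, ..., e4] there are five ranges:
    R1: v < e1, R2: e1 <= v < e2, ..., R5: v >= e4.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right gives the index of the first edge strictly greater than v.
        counts[bisect_right(edges, v)] += 1
    return counts
```

With the three nitrogen ranges quoted for Figure 7, `range_counts(predicted, [2.2, 3.6])` returns the low, intermediate, and high counts; passing four edges gives the five-column layout of the tables.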

Overall, HFBA gave better results than PLS. For the whole ACCP data set, r² improved from 0.31 to 0.49 for carbon, from 0.28 to 0.51 for cellulose, and from 0.63 to 0.69 for nitrogen. HFBA also better approximates the distribution (ranges), and therefore the mean, minimum, and maximum, of each data set considered separately. Although the r² for cellulose indicates poor predictions, a closer look at the ranges shows a very good approximation of the distribution of the original data (compare with Table 3).

In the FL data set, HFBA also gives better approximations than PLS. The coefficients of determination, r², improved from 0.15 to 0.28 for carbon, from 0.01 to 0.52 for cellulose, from 0.13 to 0.71 for nitrogen, and from 0.43 to 0.75 for water. Although carbon has a poor r² for the JRC data (0.22), the distributions of predicted and original data are very close, with R4 as the most populated range. In general, r² reflects how well the prediction fits the ranges with the highest number of samples. For carbon, the prediction is spread across the whole range of values, which results in a poor r². Better results were obtained for cellulose, mainly because the monocot vector heavily weights the near-infrared region. The best results were obtained for nitrogen and water. As mentioned in Section II, the original data show a high positive correlation between water and nitrogen. This relationship was captured by the nitrogen HFBA vectors (Figure 4), but not by the PLS vector. A clear shift is noticeable in the PLS nitrogen predictions for JR, and it is even clearer in the PLS water content predictions. In effect, a shift in the water quantization ranges forces a corresponding shift in nitrogen for high water content samples. HFBA controls this relationship better and better preserves the original ranges.

V. DISCUSSION AND CONCLUSION

A new robust approach for the detection and classification of constituent materials was developed and tested. The technique applies a modified FBA technique iteratively and hierarchically to detect each chemical at different levels of accuracy by quantizing the chemical concentration. The power of HFBA rests on the attractive properties of the SVD for information packing and for avoiding overfitting by minimizing extraneous noise in the analysis. The technique was trained and applied in two different scenarios, fresh and dry leaf data sets, and tested with four additional data sets (two for the dry leaf results and two for the fresh leaf results). The experiments above indicate that the proposed approach is promising: they demonstrate its sensitivity and its improvement in chemical detection over other techniques, in particular PLS.

Through the iterative hierarchical procedure, we force the system to account for important non-linear dependencies directly related to spectral scaling. In that respect, one strength of the proposed method is that it groups together samples whose similar anatomical properties are manifested spectrally. However, if the distribution of these properties is continuous, samples near the boundaries of the discriminant regions could be misclassified, weakening the usefulness of the classification step. In particular, because the spatial variation of vegetation is high, selecting a training set that explains the mixing present at different spatial scales is critical. This selection appears to be a key factor in the good performance of HFBA on sub-pixel scaling issues in this application, although HFBA is not designed to deal directly with these spatial issues. Image analysis methods better suited to spatial scaling problems exist, such as wavelet transforms. A wavelet decomposition would give a better representation of the spatial distribution of the data at different scales, and especially a better description of the properties of samples near discriminant boundaries. These points require further investigation to identify the relationship between spatial and spectral scales. In conclusion, we consider that combining HFBA with wavelets or other spatial scaling transforms has significant potential and deserves further study.
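For readers unfamiliar with the multiscale representation suggested above, a one-dimensional Haar decomposition (applied, for example, to a transect of pixel values) splits a signal into a coarse approximation plus detail coefficients at successively coarser scales. This is an illustrative sketch only: the choice of the Haar wavelet, the orthonormal 1/√2 scaling, and the function name are assumptions, and nothing here was part of the HFBA analysis.

```python
import math


def haar_decompose(signal, levels):
    """Multi-level 1-D Haar wavelet decomposition (orthonormal scaling).

    Returns (approximation, [detail_level1, detail_level2, ...]).
    The signal length must be divisible by 2**levels.
    """
    if len(signal) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences of neighbouring pairs capture variation at this scale.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        # Scaled sums form the coarser approximation for the next level.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details
```

Because the transform is orthonormal, the total energy (sum of squares) of the signal is preserved across the approximation and detail coefficients.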
 

ACKNOWLEDGMENTS

We are grateful to COLFUTURO, NASA Canopy Chemistry Program grant (NAS5-31714), EOS grant (NAS5-31359), NASA SIR-C program grant (NAS7-918), and NASA grant (NAGW-4626-I) for funding this work. We also thank the members of the Center for Spatial Technologies and Remote Sensing (CSTARS) at the University of California, Davis, for their assistance in the collection and reduction of the data sets presented in this paper.

REFERENCES

[1] NASA, “Accelerated Canopy Chemistry Program Final Report,” final report, NASA-EOS-IWG, October 19, 1994.

[2] S. L. Ustin, C. A. Wessman, B. Curtiss, E. Kasischke, J. B. Way, and V. C. Vanderbilt, “Opportunities for using the EOS imaging spectrometers and synthetic aperture radar in ecological models,” Ecology, vol. 72, pp. 1934-1945, 1991.

[3] G. Asrar and J. Dozier, EOS: Science Strategy for the Earth Observing System 1994. Woodbury: American Institute of Physics, 1994.

[4] W. J. Parton, J. W. B. Stewart, and C. V. Cole, “Dynamics of C, N, P, and S in grassland soils: a model,” Biogeochemistry, vol. 5, pp. 109-132, 1988.

[5] J. Pastor, J. D. Aber, C. A. McClaugherty, and J. Melillo, “Aboveground production and N and P cycling along a nitrogen mineralization gradient on Blackhawk Island, Wisconsin,” Ecology, vol. 65, pp. 256-268, 1984.

[6] J. M. Melillo, J. D. Aber, and J. F. Muratore, “Nitrogen and lignin control of hardwood leaf litter decomposition dynamics,” Ecology, vol. 63, pp. 621-626, 1982.

[7] D. S. Schimel, “Terrestrial biochemical cycles-global estimates with remote sensing,” Remote Sensing of Environment, vol. 51, no. 1, pp. 49-56, 1995.

[8] C. A. Wessman, J. D. Aber, D. L. Peterson, and J. M. Melillo, “Remote sensing of canopy chemistry and Nitrogen cycling in temperate forest ecosystems,” Nature, vol. 335, pp. 154-156, 1988.

[9] P. J. Curran, “Remote sensing of foliar chemistry,” Remote Sensing of Environment, vol. 30, pp. 271-278, 1989.

[10] P. J. Curran and J. A. Kupiec, “Imaging spectrometry: A new tool for ecology,” in Advances in Environmental Remote Sensing (F. M. Danson and S. E. Plummer, eds.), pp. 71-88, Wiley and Sons, 1995.

[11] D. L. Peterson and S. Hubbard, “Scientific issues and potential remote-sensing requirements for plant biochemical content,” Journal of Imaging Science and Technology, vol. 36, pp. 446-456, September/October 1992.

[12] P. C. Williams and K. H. Norris, Near-Infrared Technology in the Agricultural and Food Industries. St. Paul, MN: American Association of Cereal Chemists, Inc., 1987.

[13] G. C. Marten, J. S. Shenk, and F. E. Barton, “Near infrared reflectance spectroscopy (NIRS): Analysis of forage quality,” U.S. Dept. Agric. Handbook, vol. 643, pp. 1-96, 1989.

[14] D. H. Card, D. L. Peterson, P. A. Matson, and J. D. Aber, “Prediction of leaf chemistry by the use of visible and near infrared reflectance spectroscopy,” Remote Sensing of Environment, vol. 26, pp. 123-147, 1988.

[15] D. L. Peterson, J. D. Aber, P. A. Matson, D. H. Card, N. Swanberg, C. Wessman, and M. Spanner, “Remote sensing of forest canopy and leaf biochemical contents,” Remote Sensing of Environment, vol. 24, pp. 85-108, 1988.

[16] T. McLellan, J. D. Aber, M. E. Martin, J. E. Melillo, K. J. Nadelhoffer, and B. Dewey, “Comparison of wet chemistry and near infrared reflectance measurements of carbon fraction chemistry and nitrogen concentration of forest foliage,” Can. J. For. Res., vol. 21, pp. 1689-1693, 1991.

[17] P. J. Curran, J. L. Dungan, B. A. Macler, S. E. Plummer, and D. L. Peterson, “Reflectance spectroscopy of fresh whole leaves for the estimation of chemical concentration,” Remote Sensing of Environment, vol. 39, pp. 153-166, February 1992.

[18] L. F. Johnson and C. R. Billow, “Spectroscopic estimation of total Nitrogen concentration in Douglas fir foliage,” Int. J. Remote Sens., in press, 1995.

[19] B. J. Yoder and R. E. Pettigrew-Crosby, “Predicting nitrogen and chlorophyll from reflectance spectra (400-2500 nm) at leaf and canopy scales,” Remote Sensing of Environment, vol. 53, pp. 199-211, 1995.

[20] D. E. Sabol, J. B. Adams, and M. O. Smith, “Quantitative subpixel spectral detection of targets in multispectral images,” Journal of Geophysical Research, vol. 97, no. E2, pp. 2659-2672, 1992.

[21] Y. L. Grossman, S. L. Ustin, S. Jacquemoud, and E. W. Sanderson, “Critique of stepwise multiple linear regression for extraction of leaf biochemistry information from leaf reflectance data,” Remote Sensing of Environment, vol. 56, no. 3, pp. 182-193, 1996.

[22] S. Jacquemoud, J. Verdebout, G. Schmuck, G. Andreoli, and B. Hosgood, “Investigation of leaf biochemistry by statistics,” submitted to Remote Sensing of Environment, 1995.

[23] J. C. Price, “Information content of Iris spectra,” Journal of Geophysical Research, vol. 80, no. 15, pp. 1930-1936, 1975.

[24] S. L. Ustin, M. O. Smith, and J. B. Adams, “Remote sensing of ecological processes: A strategy for developing and testing ecological models using spectral mixture analysis,” in Scaling Physiological Processes: Leaf to Globe (J. R. Ehleringer and C. B. Field, eds.), (San Diego), pp. 339-357, Academic Press, 1993.

[25] S. L. Ustin, Q. J. Hart, G. Scheer, and L. Duan, “Estimating dry grass biomass residues using AVIRIS image analysis,” in IGARSS 94: Proceedings International Geosciences Remote Sensing Symposium, vol. 2, pp. 1211-1212, 1994.

[26] D. A. Roberts, J. B. Adams, and M. O. Smith, “Predicted distribution of visible and near infrared radiant flux above and below a transmittant leaf,” Remote Sensing of Environment, vol. 34, pp. 1-17, 1990.

[27] D. E. Sabol, J. B. Adams, and M. O. Smith, “Predicting the spectral detectability of surface materials using spectral mixture analysis,” in Proceedings of the IEEE International Geoscience Remote Sensing Symposium 1990, vol. 2, pp. 967-970, 1990.

[28] J. A. Gamon, C. B. Field, D. A. Roberts, S. L. Ustin, and R. Valentini, “Functional patterns in an annual grassland during an AVIRIS overflight,” Remote Sensing of Environment, vol. 44, no. 2, pp. 239-253, 1993.

[29] M. O. Smith, D. A. Roberts, J. Hill, W. Mehl, B. Hosgood, J. Verdebout, G. Schmuck, C. Koechler, and J. Adams, “A new approach to quantifying abundancies of materials in multispectral images,” in IGARSS 94: Proceedings International Geosciences Remote Sensing Symposium, vol. 4, pp. 2372-2374, 1994.

[30] J. D. Aber, K. L. Bolster, S. Newman, M. Soulia, and M. E. Martin, “Testing the utility of end-member analysis in the measurement of Carbon fraction and Nitrogen content of forest foliage,” submitted manuscript, 1994.

[31] J. W. Boardman, “Geometric mixture analysis of imaging spectrometry data,” in IGARSS 94: Proceedings International Geosciences Remote Sensing Symposium, vol. 4, pp. 2369-2371, 1994.

[32] J. C. Harsanyi and C. I. Chang, “Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779-785, 1994.

[33] K. L. Bolster, M. E. Martin, and J. D. Aber, “Determination of Carbon fraction and Nitrogen concentration in tree foliage by near infrared reflectance: a comparison of statistical methods,” Can. J. For. Res., vol. 26, pp. 590-600, 1996.

[34] J. E. Pinzon, S. L. Ustin, Q. L. Hart, S. Jacquemoud, and M. O. Smith, “Comparison of multivariate statistical techniques for estimating vegetation parameter,” in Spectral Analysis Workshop: The Use of Vegetation as an Indicator of Environmental Contamination, Reno, Nevada, Nov 9-10, 1994.

[35] J. E. Pinzon, S. L. Ustin, Q. L. Hart, S. Jacquemoud, and M. O. Smith, “Using foreground/background analysis to determine leaf and canopy chemistry,” in Proc. 5th Annual JPL Airborne Earth Science Workshop: AVIRIS Workshop (R. O. Green, ed.), Jan 23-27, 1995, vol. 95-1, pp. 129-132.

[36] T. McLellan, M. E. Martin, J. D. Aber, J. E. Melillo, and K. J. Nadelhoffer, “Determination of nitrogen, lignin, and cellulose content of decomposing leaf material by near infrared reflectance spectroscopy,” Can. J. For. Res., vol. 21, pp. 1684-1688, 1991.

[37] B. Hosgood, S. Jacquemoud, G. Andreoli, J. Verdebout, G. Pedrini, and G. Schmuck, “Leaf optical properties experiment 93 (LOPEx93),” Tech. Rep. EUR-16095-EN, European Commission, Joint Research Centre, Institute for Remote Sensing Applications, Ispra, Italy, 1994.

[38] S. L. Ustin, G. Scheer, C. M. Castaneda, S. Jacquemoud, J. E. Pinzon, A. Palacios, D. A. Roberts, and R. O. Green, “Estimating canopy water content of chaparral shrubs using optical methods,” Remote Sensing of Environment, vol. 65, pp. 280-291, 1998.

[39] S. D. Newman, M. E. Soulia, and J. D. Aber, “Proximate carbon fraction and nitrogen analyses for the accelerated canopy chemistry program: Methods and quality control,” cited in ACCP94, unpublished manuscript, 1995.

[40] Y. L. Grossman, S. L. Ustin, and E. W. Sanderson, “Relationships between leaf chemistry and reflectance for plant species from Jasper Ridge Biological Preserve, California,” in IGARSS 94: Proceedings International Geosciences Remote Sensing Symposium, vol. 4, pp. 2357-2359, 1994.

[41] CSTARS, “Center for Spatial Technology and Remote Sensing.” Public www page, 1995. http://cstars.ucdavis.edu

[42] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, Maryland: Johns Hopkins University Press, 1989.

[43] E. R. Malinowski, “Determination of the number of factors and the experimental error in a data matrix,” Analytical Chemistry, vol. 49, pp. 612-617, April 1977.

[44] P. J. Curran, W. R. Windham, and H. L. Gholz, “Exploring the relationship between reflectance red edge and chlorophyll content in slash pine leaves,” Tree Physiology, vol. 15, pp. 203-206, March 1995.

[45] C. A. Wessman, “Evaluation of canopy biochemistry,” in Remote Sensing of Biosphere Functioning (R. J. Hobbs and H. A. Mooney, eds.), (New York), pp. 135-156, Springer Verlag, 1990.

1998, Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis