ASPRS/ACSM (1994), copyright ASPRS/ACSM
Development of advanced techniques for improving remote sensing image classification accuracy is essential for deriving reliable land cover information for both cultural and natural resource applications. A particularly promising technical approach for enhanced classification is the integration of multispectral, multitemporal satellite remote sensing data and multisource ancillary data. An advantage of using multisource spatial data is that additional features and spatial attributes can be incorporated in the classification. Ideally, each of the data sources will contribute unique information to the classification process. A substantial difficulty for comprehensive analysis of multisource spatial data, however, arises from incompatibilities in measurement scales and feature distributions among the various data types. Therefore, a distribution-free and measurement scale-free classification technique is desirable for processing multisource spatial data. Artificial neural networks are well suited to this type of application. Artificial neural network techniques were developed for processing two dates of Landsat TM data, two channels of illumination data, and a measure of image texture for the purpose of deriving more accurate land use and land cover information. Classification results derived from the neural network approach overall were nearly ten percent more accurate than those derived previously using a conventional maximum likelihood approach.
In human visual image interpretation, the criteria used for classification can be broadly defined by the tone or color, size, shape, shadow, pattern, texture, and spatial relationships of the ground targets. An interpreter's knowledge, experience, and familiarity with a study area also contribute to the classification process. The powerful capabilities for knowledge acquisition, recall, synthesis, and problem solving of the human brain have inspired scientists from different disciplines to attempt to model its operations. Based on the biological theory of the human brain, artificial neural networks are models that attempt to parallel and simulate the functionality and decision-making processes of the human brain. In general, a neural network is referred to as a mathematical model of theorized mind and brain activity (Simpson, 1990). Neural network features corresponding to the synapses, neurons, and axons of the brain are input weights, processing elements, and output paths. In an artificial neural network, the processing element (PE) is the analog to the human brain's biological neuron. A processing element has many input paths, analogous to the brain's dendrites, and the information transferred along these paths is combined by one of a variety of mathematical functions, most commonly simple summation. The result of these combined inputs is some level of internal activity (I) for the receiving processing element. The combined input contained within the processing element is modified by a transfer function (f) before being passed to other connected processing elements, whose input paths are usually weighted (W[ij]) by the perceived synaptic strength of neural connections. A transfer function is required to avoid saturation of a processing element, caused by extremely large positive or negative internal summations. Commonly, either a sigmoid or hyperbolic tangent function is applied. Both are smooth, monotonic transformations of a processing element's internal value. In the case of the sigmoid function, the output range is (0, +1). The hyperbolic tangent is a bipolar version of the sigmoid and produces scaled output over (-1, +1).
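The behavior of a single processing element described above can be illustrated with a minimal Python sketch. The function names, input values, and weights below are hypothetical and chosen only for illustration; they are not taken from the software used in this research.

```python
import math


def sigmoid(z):
    """Logistic transfer function; its output lies between 0 and +1."""
    return 1.0 / (1.0 + math.exp(-z))


def processing_element(inputs, weights, transfer=sigmoid):
    """Combine weighted inputs by simple summation, then apply a transfer function."""
    internal_activity = sum(w * x for w, x in zip(weights, inputs))  # the value I
    return transfer(internal_activity)


# Hypothetical three-input element with made-up weights.
inputs = [0.2, 0.7, 0.1]
weights = [0.5, -1.0, 2.0]

print(processing_element(inputs, weights))             # unipolar (sigmoid) output
print(processing_element(inputs, weights, math.tanh))  # bipolar (tanh) output
```

The two transfer functions are related by tanh(z) = 2 * sigmoid(2z) - 1, which is why the hyperbolic tangent can be regarded as a bipolar version of the sigmoid.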
A neural network consists of organized topological interconnections among the PEs, learning rules, and knowledge recall. The topological structure establishes the frame of the network, the learning paradigm trains the network by presenting example input data patterns and the corresponding desired output, and the recall applies the pattern recognition knowledge learned in the training step to process and, in this case, classify the raw data. The most popular forms of neural networks typically consist of three or more layers -- an input layer, an output layer, and one or more hidden layers. The input layer consists of one or more processing elements which present the training data, and the output layer consists of one or more processing elements which store the results of the network. In the case of remote sensing data classification, the inputs often represent the vector of brightness values for the multispectral data. Hence, for single-date Landsat data, there would be seven input nodes, each corresponding to a band of the Thematic Mapper sensor. The input patterns could consist also of ancillary data (e.g., multitemporal spectral patterns, image texture, elevation and its derivatives, etc.). Classifying multisource remote sensing and spatial data requires the ability to match large volumes of input pattern data simultaneously to generate categorical information as output. Since the learning and recall depend on the linear and nonlinear combination of data patterns instead of the statistical parameters of the input data, neural networks offer the opportunity to analyze spatial data with different origins and properties simultaneously, without a priori assumptions about the distribution for each data type. In fact, neural networks have the ability to learn those distributions, if they exist, in the input data. Therefore, a neural network can be trained by data from different sources.
The one, two, or perhaps more hidden layers consist of a number of processing elements which enable the translation of input data into output information, which, in the present context, is the land cover classification corresponding to an input pattern. Ideally, each data type will make a unique contribution to the discrimination of land cover class patterns, thereby enabling the neural network to learn the spectral, spatial, and temporal signature of each class.
Benediktsson et al. (1990) compared neural network and statistical approaches to multispectral data classification. They noted that conventional multivariate classification methods cannot be used in processing multisource spatial data because of their often different distribution properties and measurement scales. Heermann and Khazenie (1992) compared neural network techniques with more classical statistical programming techniques. Neural networks generally use a learning method to program the network to understand and solve problems by example. Heermann and Khazenie's study emphasized the analysis of larger data sets with a back-propagation technique, in which error is distributed throughout the network. They concluded that the back-propagation network could be easily modified to accommodate more features or to include spatial and temporal information. Bischof et al. (1992) included textural information in the neural network process and concluded that neural networks were able to integrate other sources of knowledge and use them in classification. Hepner et al. (1990) compared a neural network back-propagation classification approach with a conventional supervised maximum likelihood classification procedure using a minimum training set. The research results suggested that a neural network classification trained with a single training site per class was comparable to a conventional classification trained with four training sites per class. This result demonstrated that the neural network technique offered a potentially more robust approach to land cover classification than conventional image classification techniques.
Visual interpretation of remote sensing imagery involves the use of both spectral and spatial associations. Ritter and Hepner (1990) attempted to exploit both of these properties using a neural network approach. A back-propagation learning algorithm was used to train a neural network to recognize four land cover types: urban, water, forest, and grassland. A feed-forward mode was applied for the classification. The results showed that the neural network had the ability to distinguish small linear areas, which were apparent on the original TM image, but for which no ground truth had been provided as training data.
A common thread in the research cited here, in which artificial neural networks were used instead of statistical methods, is the recognition that neural network techniques are distribution-free and scale-free. Therefore, spatial data from different sources can be jointly analyzed. While conventional classifiers rely primarily on spectral characteristics of remote sensing data, neural networks are more amenable to using more of the spatial information contained in the images. Efficient techniques for combining and analyzing spatial data from different data sources with a neural network, however, still remain to be developed fully. Previous research also acknowledged that some of the theoretical and technical problems in the application of neural networks in remote sensing and GIS remain. It has been suggested that artificial neural network applications using data from diverse sources with different distributions, such as multispectral, multitemporal remote sensing data, and image segmentation from internal spatial structure or GIS underlying data layers, should be explored (Civco, 1993). To contribute further to the development of such artificial neural network approaches, the research described in this paper addressed three principal objectives:
1. To develop neural network models to handle multispectral, multitemporal remotely sensed data and spatial data from other sources
2. To build efficient topological neural network interconnections, establish learning rules, and train, evaluate, and refine the network
3. To apply the trained neural network to multisource remote sensing and derived spatial data to produce land use and land cover information
Two adjacent USGS 7.5-minute quadrangles in Connecticut -- Ellington and Broad Brook -- were selected as the study areas. The former was used in technique development and the latter for independent testing. These two quadrangles were chosen because of their diversity in land use and land cover, the availability of multiple Landsat TM scenes, and their use in previous image analysis and classification studies (Civco, 1991, 1993; Wang and Civco, 1992a, 1992b).
All seven bands of Landsat TM data for each of two dates (May 4, 1988, and August 30, 1990) served as the multitemporal, multispectral data sets. Multisource spatial data included USGS 30-meter digital elevation model (DEM) data, a first-order statistical measure of image texture, and illumination model data for each TM scene. The multitemporal remote sensing data were used to provide the spectral responses of land covers in different seasons; categories which possess similar spectral properties in one season but different ones in the other could thereby be separated more easily. Image texture was described by local variance in brightness levels in a 3 by 3 pixel neighborhood. This low-level measure is statistical rather than structural and was intended to be an initial indicator of the textural properties of different land use and land cover types. Additional research is being directed toward the development and integration of higher order structural models into classification processes. The illumination models portray the solar-terrain geometry (cosine angle) at the time of each TM scene acquisition. The illumination model data were envisioned to account for topographically-induced spectral variation and thereby enable spectral variants of individual classes to be recognized.
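The two derived channels described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the paper does not give the exact formulas, so the texture measure is assumed here to be the population variance of the nine window values (edge pixels are not handled), and the illumination value is assumed to be the standard cosine of the solar incidence angle computed from DEM-derived slope and aspect. All function and parameter names are hypothetical.

```python
import math


def local_variance(band, row, col):
    """Variance of brightness values in the 3 by 3 window centred on (row, col).

    Assumes (row, col) is not on the image border and uses the population
    variance (divide by 9); both choices are assumptions of this sketch.
    """
    window = [band[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    mean = sum(window) / 9.0
    return sum((v - mean) ** 2 for v in window) / 9.0


def illumination(slope_deg, aspect_deg, sun_zenith_deg, sun_azimuth_deg):
    """Cosine of the solar incidence angle on a surface of given slope and aspect."""
    slope = math.radians(slope_deg)
    zenith = math.radians(sun_zenith_deg)
    relative_azimuth = math.radians(sun_azimuth_deg - aspect_deg)
    return (math.cos(zenith) * math.cos(slope)
            + math.sin(zenith) * math.sin(slope) * math.cos(relative_azimuth))


# Made-up brightness values, not taken from the study imagery.
band4 = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(local_variance(band4, 1, 1))
print(illumination(slope_deg=0.0, aspect_deg=0.0,
                   sun_zenith_deg=30.0, sun_azimuth_deg=135.0))
```

On flat terrain the illumination value reduces to the cosine of the solar zenith angle, and a slope facing the sun directly at the same inclination as the zenith angle yields a value of 1.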
The ERDAS(tm) - IP/GIS system was used for preparing the multispectral, multitemporal, multisource spatial data and selecting the training and testing data for the neural network development, evaluation, and refinement. The NetBuilder(tm) and NeuralWorks Professional II Plus(tm) software shells were used for building the topologically structured neural network. NetBuilder was used primarily for discovering appropriate input data channel combinations and for revealing the proper neural network topological structure. The more flexible and comprehensive NeuralWorks Professional II Plus was used to build the final neural network and to conduct the final classification.
In this research, a three-layer back-propagation neural network with a sigmoid transfer function among PEs was developed. The back-propagation procedure minimizes the global error of the entire network, assuming that all PEs and the interconnections are potentially responsible for the classification errors generated. Error is propagated backward through the interconnections to the previous (hidden) layer and connection weights are adjusted accordingly. The information transfer among processing elements (neurons) is guided by:
$$x_j^{[s]} = f\left( \sum_i W_{ij}^{[s]} \, x_i^{[s-1]} \right) = f\left( I_j^{[s]} \right)$$

where: $x_j^{[s]}$ is the current output state of the jth neuron (PE) in layer s; $W_{ij}^{[s]}$ is the connection weight between the ith neuron in layer (s-1) and the jth neuron in layer s; $I_j^{[s]}$ is the weighted summation of input to the jth neuron in layer s; and f is the sigmoid function bounded by (0, +1) and defined as:

$$f(z) = \frac{1}{1 + e^{-z}}$$

A model PE in the back-propagation network is shown in Figure 1.
When given an input vector and the corresponding desired output, the input is propagated forward through the network to compute the output vector. The output vector is compared with the desired output, and the error is determined. The errors are then propagated back through the network from the output to input layer. The multiplication of the error by the derivative of the transfer function scales the error (NeuralWare, 1991). The result is that learning takes place by way of the overall error being minimized.
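The forward pass, error computation, and backward adjustment described above can be sketched for a network with one hidden layer. This is a generic textbook-style illustration, not the NeuralWorks implementation: the layer sizes, learning rate, weight initialization, and the omission of bias terms are all assumptions made to keep the sketch short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Propagate an input vector forward; return hidden and output states."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output


def train_step(x, target, w_hidden, w_out, rate=0.5):
    """One back-propagation update; returns the squared error before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output error scaled by the sigmoid derivative f'(I) = f(I) * (1 - f(I)).
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Error propagated back through the connection weights to the hidden layer.
    delta_hid = [h * (1.0 - h) * sum(delta_out[k] * w_out[k][j]
                                     for k in range(len(w_out)))
                 for j, h in enumerate(hidden)]
    # Adjust hidden-to-output weights, then input-to-hidden weights.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += rate * delta_out[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += rate * delta_hid[j] * x[i]
    return sum((t - o) ** 2 for t, o in zip(target, output))


random.seed(0)
n_in, n_hid, n_out = 3, 4, 2  # toy sizes chosen for illustration only
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]

x, target = [0.9, 0.1, 0.4], [1.0, 0.0]  # one made-up training pattern
errors = [train_step(x, target, w_hidden, w_out) for _ in range(200)]
print(errors[0], errors[-1])  # the error should shrink as training proceeds
```

In practice many patterns would be presented repeatedly, and training would be monitored against independent test data rather than a single pattern.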
In this research, the input layer contained 17 PEs, corresponding to 17 channels of data (14 from the two dates of Landsat TM data, one as a first-order image texture measure derived from Band 4 of the May Landsat TM image, and the final two from the May and August illumination models). The output layer contained 15 PEs corresponding to the 15 pre-defined land cover categories in the classification. In structuring the architecture of a neural network, one crucial and difficult-to-determine parameter is the number of PEs in the hidden layer(s) (Bischof et al., 1992). The hidden layer is responsible for internal representation of the data and the information transformation between input and output layers (i.e., the learning). If there are too few PEs in the hidden layer, the network may not contain sufficient degrees of freedom to form a representation (i.e., insufficient learning capacity). If too many PEs are defined, the network might become over-trained (i.e., classify training patterns well but lack the ability to generalize to other independent data) (Heermann and Khazenie, 1992). Therefore, a balanced design for the number of PEs in the hidden layer is important. The NetBuilder software shell back-propagation algorithm can automatically structure the hidden layer and prune unnecessary processing elements. This is accomplished by an initial over-allocation of PEs to the hidden layer and a subsequent examination and elimination of irrelevant elements (i.e., ones with little weighting). This approach was used in this research to elucidate the structure of the hidden layer by submitting all of the 17 channels of input data and the corresponding output class to the network and allowing it to self-organize. After pruning, twenty PEs remained in the hidden layer for transforming information from the input data patterns into the output land cover category types.
This topological architecture of 17 input, 20 hidden, and 15 output neurons was recreated in the NeuralWorks Professional II Plus software shell and is illustrated in Figure 2.
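As a rough illustration of this architecture, the following sketch assembles a 17-channel input vector for one pixel and counts the weighted connections in a 17-20-15 network. The channel order, the function names, and the assumption of full connectivity between adjacent layers (with no bias terms) are hypothetical; the paper specifies only the number of PEs in each layer.

```python
TM_BANDS = 7
LAYER_SIZES = (17, 20, 15)  # input, hidden, and output PEs reported in the text


def build_input_vector(tm_may, tm_august, texture_may_band4, illum_may, illum_august):
    """Stack the 17 input channels for one pixel in a fixed (assumed) order."""
    if len(tm_may) != TM_BANDS or len(tm_august) != TM_BANDS:
        raise ValueError("each TM date must supply seven band values")
    return list(tm_may) + list(tm_august) + [texture_may_band4, illum_may, illum_august]


def connection_count(sizes):
    """Number of weighted connections between fully connected adjacent layers."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


# Made-up pixel values for illustration only.
pixel = build_input_vector([60] * 7, [55] * 7, 12.5, 0.81, 0.74)
print(len(pixel))                     # 17
print(connection_count(LAYER_SIZES))  # 17 * 20 + 20 * 15 = 640
```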
Based on previous experience with land cover mapping in the two study area quadrangles, 15 land cover categories were defined for neural network classification. Table 1 lists the classes, which include some generally occurring types, such as water, deciduous and coniferous forest, and non-forested and forested wetlands, as well as classes recognizable because of their unique multitemporal reflectance properties, including tobacco, soil/corn, and nurseries.
Data for training the neural network and testing the trained network were acquired through interactive pixel sampling of the Landsat TM May and August data. All sources of input data were precisely registered so that the attributes from different data layers in the sample positions could be extracted. To avoid spatial autocorrelation and neighboring pixel influences, each of the sample pixels was selected individually. In total, 1780 sample pixels were selected to represent the 15 land cover categories. The selected sample pixels were evenly and randomly divided into two data sets -- one set for training and the other for testing the trained neural network.
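The even, random division of the sample pixels can be sketched as below. The paper does not say whether the split was stratified by land cover class, so this sketch assumes a simple shuffle of all samples; the function name and the seed are hypothetical.

```python
import random


def split_samples(samples, seed=0):
    """Shuffle the samples and divide them into two halves (training, testing)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


sample_ids = range(1780)  # stand-ins for the 1780 sampled pixels
training, testing = split_samples(sample_ids)
print(len(training), len(testing))  # 890 890
```

A stratified variant would apply the same shuffle-and-halve step within each of the 15 categories so that every class is represented in both sets.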
The NetBuilder software shell was used to test different data combinations and for prototyping the architecture to be used in the final network configuration. The neural network classification accuracies achieved with the training and the test data sets are presented in Table 2. The results suggest that improved classification generally can be achieved by incorporating additional features into the decision-making process. This was especially true when the multitemporal Landsat TM data were used rather than either of the two dates alone. The use of the illumination models seemed generally to account for the spectral variations within classes introduced by topographic slope and aspect, but the low-level index of texture seemed to contribute little if any knowledge beneficial to the classification process.
After the network was trained with the NeuralWorks Professional II Plus neural shell, all of the 17 channels of the multispectral, multitemporal, and multisource spatial data for the entire quadrangle(s) were presented to the network. The knowledge acquired in the training stage was recalled to calculate the weights for the output categories for each pixel. When the multichannel input data were processed, each of the PEs in the output layer received a calculated weight. Based on a simple and rather intuitive decision rule, the category corresponding to the processing element which received the greatest weight during the network classification process was assigned as the pixel's land cover category label. Typically, the processing element (land cover category) which possesses significantly distinct spectral features in the input Landsat TM data and is highly correlated with the spatial features in the other input data channels will receive the greatest weight. However, with some pixels, because of the similarity in spectral or spatial features between two or among three or more categories, different output neurons (i.e., different classes) might receive similar weights. For example, for one pixel, the deciduous forest category (output neuron) received a weight of 0.568, and the pasture category received a weight of 0.567. Based on the decision rule used, the deciduous forest category, corresponding to the higher weight of 0.568, would be assigned to that pixel. However, in practice, it is difficult to judge the real (significant) difference between values that are so close. Only slightly dissimilar values may, in fact, depict a mixed pixel. In this project, those weights close to the maximum weight were defined as fuzzy weights, and the corresponding categories were defined as fuzzy categories. The distance between the output weights and the maximum weight, which determines whether the weights should be identified as proximal, was defined as the fuzzy tolerance. The following rules were employed to handle this fuzzy situation:
With this protocol and using a 0.005 fuzzy tolerance, 2343 pixels among the total 170280 (approximately 1.4% of a quadrangle) were identified, re-processed, and classified.
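The maximum-weight decision rule and the identification of fuzzy categories can be sketched as follows. The rules used to re-process fuzzy pixels are not reproduced in this text, so the sketch only flags candidate pixels and does not attempt to resolve them. The function name is hypothetical, and treating the tolerance as inclusive is an assumption.

```python
def classify_pixel(output_weights, tolerance=0.005):
    """Return (index of the winning class, indices of fuzzy classes).

    A class is flagged as fuzzy when its output weight lies within
    `tolerance` of the maximum weight (inclusive, by assumption).
    """
    best = max(range(len(output_weights)), key=output_weights.__getitem__)
    top = output_weights[best]
    fuzzy = [k for k, w in enumerate(output_weights)
             if k != best and top - w <= tolerance]
    return best, fuzzy


# Example from the text: deciduous forest 0.568 versus pasture 0.567.
# The third weight is made up to complete the illustration.
weights = [0.568, 0.567, 0.100]
print(classify_pixel(weights))  # (0, [1])
```

A pixel with an empty fuzzy list keeps the label of its winning class; a pixel with one or more fuzzy classes would be passed to whatever re-processing rules are adopted.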
Standard accuracy assessment was performed to evaluate the classification results. Five hundred pixels were randomly selected from the test quadrangle as the reference pixels. The ground truth of those five hundred pixels was derived from large-scale aerial photographs, 1:24,000 topographic maps, 1:24,000 wetland maps, field observations, and the investigators' familiarity with the local land cover types. The neural network classifications and the corresponding ground truth were compared pixel-by-pixel.
The omission-commission matrix and classification accuracies are listed in Table 3. An overall accuracy of approximately 87.6% was achieved with the neural network classifier, whereas only approximately 80% was obtained with previous classifications of the same area using single date TM data and a traditional statistically-based technique (Wang and Civco, 1992a, 1992b).
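The accuracy measures derived from such a matrix can be sketched as below. The matrix values are made up for illustration and are not those of Table 3; the convention that rows hold the classified labels and columns hold the reference labels is an assumption, as are the function names.

```python
def overall_accuracy(matrix):
    """Fraction of reference pixels that fall on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix):
    """Return (producer's, user's) accuracies per class.

    Rows are classified labels and columns are reference labels (assumed).
    Producer's accuracy reflects omission errors; user's accuracy reflects
    commission errors.
    """
    n = len(matrix)
    row_totals = [sum(matrix[r]) for r in range(n)]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    users = [matrix[i][i] / row_totals[i] if row_totals[i] else float("nan")
             for i in range(n)]
    producers = [matrix[i][i] / col_totals[i] if col_totals[i] else float("nan")
                 for i in range(n)]
    return producers, users


# Made-up three-class matrix; NOT the values reported in Table 3.
example = [[45, 3, 2],
           [4, 40, 1],
           [1, 2, 52]]
print(overall_accuracy(example))
print(per_class_accuracy(example))
```

For reference, an overall accuracy of 87.6% on 500 reference pixels corresponds to 438 pixels on the diagonal of the matrix.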
An advantage of using artificial neural networks is that they possess the ability to learn the internal information pattern among the multisource data and can recall the knowledge acquired in the learning stage to conduct the classification. The input patterns will be processed by the trained network and output neurons will be weighted according to the knowledge contained within the network. Classification can be performed by using a simple, intuitive approach of selecting the class corresponding to the output neuron with the greatest value. If, however, there is no clear winner (i.e., there is no dominant message to support any single category, or the message for several categories is unclear), then either mixed classes can be identified or spatial post-processing can be applied to resolve output processing elements with similar values. The latter approach has been applied in this project, but that does not preclude the possibility that some of those pixels labelled as a specific category might in fact be mixed pixels.
Potential questions about the use of artificial neural networks have arisen, however. In this research, the neural network was treated as a black-box tool. All of the multisource spatial data were submitted to the neural network for processing and decision-making. Although an improved classification was achieved, the contribution of each individual data source and the information about intermediate states were neither observed nor fully discovered. While more information can be introduced by multiple data sources, noise, redundancy, and confusion may also be introduced. Data redundancy may influence the network training and the final classification. If a neural network can be modularized -- that is, if it can be structured into local experts -- then the information processing for each data source can be decomposed, and the contribution of each data set can be separately evaluated. The advantages of each data source can then be discovered and efficiently employed to improve the classification performance. In ongoing research we are developing such a modularized artificial neural network to accommodate multitemporal satellite multispectral data and multisource spatial data.
It is recognized that, compared with conventional statistically-based classification, the application of artificial neural networks to remote sensing data analysis is still in its relative infancy. It can be concluded, however, from the research reported here and elsewhere, that neural networks can handle multispectral, multitemporal, multisource spatial data more efficiently than parametric statistical methods, without prior considerations or assumptions about the different statistical distribution patterns and the measurement scales of those data. The neural network models tested that incorporated multidate Landsat TM and ancillary spatial information had accuracies approximately 9-12 percent greater than single date neural networks. Compared with the conventional statistically-based classification techniques, the artificial neural network technique implemented here was approximately 8 percent more accurate than a traditional, single date maximum likelihood approach.
The research upon which this paper is based was supported in part by the Storrs Agricultural Experiment Station (SAES) under the project "Improved land cover mapping through innovative computer-assisted processing of satellite digital remote sensing data -- Phase II." This paper has been submitted as SAES Scientific Contribution No. 1518.
1. Benediktsson, J., P. Swain, and O. Ersoy. 1990. Neural network approaches versus statistical methods in classification of multisource remote sensing data. IEEE Trans. on Geoscience and Remote Sensing, 28:540-552.
2. Bischof, H., W. Schneider, and A. Pinz. 1992. Multispectral classification of Landsat images using neural networks. IEEE Trans. on Geoscience and Remote Sensing, 30:482-490.
3. Civco, D.L. 1991. Landsat TM land use and land cover mapping using an artificial neural network. Proceedings of the 1991 Annual Meeting of the American Society for Photogrammetry and Remote Sensing, Baltimore, MD, Vol. 3, pp. 67-77.
4. Civco, D.L. 1993. Artificial neural networks for land-cover classification and mapping. Int. J. Geographical Information Systems, 7(2):173-186.
5. Heermann, P. and N. Khazenie. 1992. Classification of multispectral remote sensing data using a back-propagation neural network. IEEE Trans. on Geoscience and Remote Sensing, 30:81-88.
6. Hepner, G., T. Logan, N. Ritter, and N. Bryant. 1990. Artificial neural network classification using a minimum training set: comparison to conventional supervised classification. Photogrammetric Engineering and Remote Sensing, 56(4):469-473.
7. Hopfield, J. 1982. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79:2554-2558.
8. NeuralWare, Inc. 1991. Neural Computing, NeuralWorks Professional II/Plus and NeuralWorks Explorer, NeuralWare Inc., Pittsburgh, PA.
9. Ritter, R. and G. Hepner. 1990. Application of an artificial neural network to land-cover classification of thematic mapper imagery. Computers and Geosciences, 16:873-880.
10. Simpson, P.K. 1990. Artificial Neural Systems: Foundations, Paradigms, Applications, and Implementations, Pergamon Press Inc., New York. pp. 3-14.
11. Wang, Y. and D.L. Civco. 1992a. Post-classification of misclassified pixels by evidential reasoning: a GIS approach for improving classification accuracy of remote sensing data. In Proc. of ASPRS/ACSM/RT'92 Convention, Washington D.C., pp. 160-170.
12. Wang, Y. and D.L. Civco. 1992b. Spatial modeling-based post-classification of satellite remote sensing data for improved land cover mapping. In Proc. of ASPRS/ACSM/RT'92 Convention, Washington D.C., pp. 122-132.