ASPRS/ACSM (1994), copyright ASPRS/ACSM


EXPLORING THE SUITABILITY OF FUZZY SET THEORY IN IMAGE CLASSIFICATION: A COMPARATIVE STUDY APPLIED TO THE MAU FOREST AREA, KENYA

Charles Gichana Manyara
Department of Geography
Michigan State University
East Lansing, MI 48824-1115
James K. Lein
Department of Geography
Ohio University
Athens, OH 45701-2979

ABSTRACT

Fuzzy Set Theory has recently received considerable attention from the remote sensing community. Recognizing that traditional classifiers based on rigid, discrete classes contribute to noticeable thematic inaccuracy, the notion that a pixel can enjoy partial membership in a given informational class is an attractive alternative to the two-value logic implicit in most classification procedures. The purpose of this paper is to compare the informational value of thematic maps produced via traditional unsupervised classification and a fuzzy classification technique utilizing a modified Fuzzy C-Means algorithm. This comparative analysis was conducted using a Landsat MSS image covering sections of the Mau Forest in Kenya. Forest degradation is a growing problem worldwide, and in Kenya the Mau has experienced significant deforestation since the early 1970s. Therefore, a methodology that can provide a better characterization of forest cover and describe a wider range of intermediate conditions of this land cover class will enable implementation of improved monitoring programs capable of detecting subtle variations in surface reflectance properties which may be indicative of land degradation effects. Results of this comparative study revealed that the fuzzy set approach produced a more detailed and precise classification of forest cover, suggesting that fuzziness can effectively extend the usefulness of map products developed from remote sensing imagery.

INTRODUCTION

Image classification implies the imposition of nominal scale informational categories onto the spectral response patterns captured by a multi-spectral data set. The nature of the classification system selected to relate spectral properties of a pixel to a specific informational class, such as a land cover type, is a significant aspect of this problem, as is the methodology used to perform classification and the specific classifier employed to direct the assignment of pixels to classes. Questions concerning the spectral, radiometric, and spatial resolution of the sensor platform involved, the presence and generation of mixed pixels, and the computational efficiency of the algorithm selected to perform classification are factors that commonly frustrate the classification process, contributing error and uncertainty to the results. While these limitations are well known to the remote sensing community and have encouraged a wealth of research to improve these methodological and technical shortcomings, an often overlooked issue relates to the meaning and linguistic clarity of the nominal categories comprising the classification system and the inexactness introduced as a function of language.

When focus shifts to the system of classification in use and the meanings assigned to the various categories that make up its structure, the classification problem expands to include consideration of how well the informational categories fit not only the image but the physical and cultural context the image reflects. In this context, the classification problem simplifies to the question of how representative the linguistic variables forming the system are with respect to the nature of the land surface defined by the scene.

Recently Fuzzy Set Theory has been applied to a range of issues related to the classification of multispectral imagery (Fisher and Pathirana, 1990; Key and Barry, 1989; Pedrycz, 1990; Wang, 1992; Robinson and Thongs, 1986). Fuzziness, as defined in these and other studies, suggests that a given pixel, owing to its spectral reflectance properties, may be placed into more than one informational/spectral class. Thus, the dichotomy of pure versus mixed pixels must be relaxed to recognize the presence of ascending or descending degrees of purity in a given class. These levels of purity are of interest since they may explain more than simply a mixed spectral response pattern: they may reflect variations in intensity within a given class that may be indicative of some underlying process acting on the feature.

The purpose of this study was to examine the informational value of thematic maps produced via traditional unsupervised classification and a classification procedure employing a modified Fuzzy C-Means clustering technique. At issue in this study was the question of what exactly is being shown on a land cover map, and might it be possible to represent more than "crisp" informational categories when the phenomena of interest may not necessarily conform well to crisp delineations. This comparative analysis was conducted using a Landsat MSS subset covering the northern sections of the Western and Eastern Mau Forests of Kenya. The goal of the research presented in this paper is to establish a baseline for classification that can be employed in monitoring global environmental change. Therefore, by comparing the treatment of forest classification, a method may be suggested that is capable of providing more useful information relative to the nature of this land cover type.

FUZZINESS AND FOREST CLASSIFICATION

The problem of forest degradation in Kenya is a growing concern. With a high population concentration on the highlands, a 3.5% annual population growth rate, and a predominantly agriculturally based economy, Kenya's forests are under heavy pressure from competing agricultural uses and fuelwood needs (Allaway and Cox, 1989). The importance of accurate forest cover monitoring lies in the fact that Kenya's forests are found on highland areas characterized by rugged topography and heavy rainfall. Soil erosion and downstream sedimentation are likely consequences if present rates of forest loss continue unchecked (Figure 1).

Forest cover, however, is a complex and imprecise linguistic variable. The concept of forest carries with it botanic as well as cultural factors in its definition. In a remote sensing context forest lands have been defined according to Anderson et al. (1976) as areas that have a tree-crown areal density of 10 percent or more, are stocked with trees capable of producing timber or other wood products, and exert an influence on the climate or water regime. As this definition continues, Anderson et al. note that the boundary between forest and other categories of land cover may be difficult to delineate precisely. The definition given above, coupled with the difficulty associated with accurately delimiting forest boundaries, suggests that the cover type forest carries a high degree of ambiguity and vagueness which influences its spatial characterization.

Classifying forest cover, therefore, requires a method capable of using imprecise concepts where a precise boundary between membership and non-membership in a class may be impossible or impractical. In these instances it may be beneficial to treat changing membership in a given informational class as gradual rather than abrupt. Here, as is the case with forest cover, classification based on "crisp" boundaries and membership defined by a comparatively simple two-value logic is insufficient when called upon to relate complex land cover features or dynamic surface processes.

Employing the Theory of Fuzzy Sets introduced by Zadeh (1965), the class forest translates into a series of measures (x) such that:

  FOREST = { (x, U[Forest](x)) },

where U[Forest](x) defines the grade of membership of brightness value x in the class. Participation in the class Forest for any pixel x can range from 0, defining perfect non-membership, to 1, describing perfect membership. The grading of x in the class Forest is based on the concept of possibilities, where the value of U[Forest](x) is interpreted as the degree of compatibility of the predicate associated with the set Forest and pixel x. A fuzzy set, therefore, permits partial membership of its elements in more than one set (class).
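The contrast between two-value logic and a graded membership can be illustrated with a minimal Python sketch. The function names, the linear ramp, and the breakpoints (10 and 60 percent crown density) are hypothetical choices made only for illustration; in the study itself the membership grades are derived by the Fuzzy C-Means algorithm, not by a fixed ramp.

```python
def crisp_membership(value, threshold):
    """Two-value logic: a pixel is either in the class (1.0) or not (0.0)."""
    return 1.0 if value >= threshold else 0.0


def graded_membership(value, low, high):
    """Hypothetical linear grade: 0 at or below `low`, 1 at or above `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


# Illustrative crown-density values (percent); the breakpoints are assumptions.
for density in (5, 10, 35, 60, 90):
    print(density,
          crisp_membership(density, threshold=10),
          graded_membership(density, low=10, high=60))
```

Under the crisp rule a pixel with 10 percent crown density and one with 90 percent receive the same value, while the graded rule keeps them apart.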

IMAGE CLUSTERING AND FUZZY SETS

In a forest environment very different conditions can exist at the surface and still define the same nominal category. It is possible, for example, for a given pixel to be more forest than another simply because a greater percentage of trees fall inside it. Yet the mixed pixel, or pixel with a smaller percentage of occupying trees, still defines some characteristic of forest cover. Distinguishing these subtle variations is essential to the representative measurement and monitoring of forest change, but proves frustrating when spectral classes must be assigned a rigid informational designation.

Image clustering can be defined as the identification of natural groups within a multispectral data set. The algorithm that performs clustering functions to partition a set of objects (pixels) into relatively homogenous subsets based on inter-object similarities with little or no overlap (Kachigan, 1982). In general, clustering methods can be categorized by principle (objective function, graph theoretical, hierarchical) or by model type (deterministic, statistical, heuristic, fuzzy) (Hathaway et al., 1988; Cannon et al., 1986). Regardless of algorithm, no clustering criterion or measure of similarity is universally superior in application. This leaves the choice of algorithm a partially subjective decision and always open to question.

In this study two approaches to the clustering problem were used: a traditional cluster analysis using a histogram peak method to derive the initial cluster centers (Eastman, 1992), and the Fuzzy C-Means method. The latter method derives fuzzy clusters describing an object's level of participation in the various natural groups identified. Since different clustering algorithms will produce different results on a given data set, the contrast between the hard cluster solution and the soft or fuzzy clusters can be expected to produce very different realizations of the nature of forest cover. This study seeks to explore which realization provides more useful and meaningful information.

DATA ACQUISITION AND METHODOLOGY

A January 1973 Landsat MSS image detailing the Northern Mau forest region was subset, geometrically corrected and registered to a UTM grid. A 1973 image was selected since it could provide a reference baseline against which changes in forest cover could be measured. From this image a series of 4 smaller study sites depicting different forest environments within the Mau were extracted for analysis. Analysis proceeded in two phases. Phase one required conducting an unsupervised classification using the IDRISI Cluster module (Eastman, 1992). Clustering was performed using a broad generalization level, dropping the least significant clusters (< 1% total area). A three-cluster solution was obtained for each study site, and each solution was then reclassed to expose only forest and non-forest categories. Phase two involved passing each study site to the Fuzzy C-Means algorithm (FCM). To implement the FCM program additional parameters were required to guide the partitioning process. These parameters included selection of a distance measure and a weighting exponent (Table 1). The FCM algorithm was run for each study site, producing a predetermined ten-cluster solution. Determining the number of valid clusters in each data set was accomplished by comparing the partition coefficient (F) against the partition entropy (H) (Table 2). For every weighting exponent tried, F was maximized (and H minimized) at the three-cluster solution. This study therefore adopted a three-cluster solution for further analysis.
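The partitioning step and the two validity measures can be sketched in Python as follows. This is the textbook form of Fuzzy C-Means with a Euclidean distance, together with the usual definitions of the partition coefficient (mean of squared memberships) and partition entropy (mean of -u ln u); it is not the modified FCM program used in the study. The function names, default parameters, random initialization, and the three brighter pixels in the demonstration are all assumptions.

```python
import math
import random


def fcm(pixels, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook Fuzzy C-Means on a list of equal-length numeric vectors.

    Returns (centers, u); u[k][i] is the membership of pixel k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(pixels), len(pixels[0])
    # Random initial memberships; each pixel's row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Cluster centers: means of the pixels weighted by u**m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * pixels[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Memberships: inverse squared distance to the power 1/(m-1), normalised.
        new_u = []
        for k in range(n):
            d2 = [sum((pixels[k][d] - centers[i][d]) ** 2 for d in range(dim))
                  for i in range(c)]
            hits = [i for i in range(c) if d2[i] == 0.0]
            if hits:  # pixel coincides exactly with one or more centers
                row = [1.0 / len(hits) if i in hits else 0.0 for i in range(c)]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)
        change = max(abs(new_u[k][i] - u[k][i])
                     for k in range(n) for i in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u


def partition_coefficient(u):
    """F: mean of squared memberships; 1/c for a fully fuzzy partition, 1 for a hard one."""
    return sum(v * v for row in u for v in row) / len(u)


def partition_entropy(u):
    """H: mean of -u*ln(u); 0 for a hard partition, ln(c) for a fully fuzzy one."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0.0) / len(u)


# Five rows as listed in Table 1 plus three invented brighter pixels (assumption).
demo = [[21, 12, 22], [19, 12, 18], [19, 12, 18], [19, 12, 21], [19, 13, 21],
        [60, 40, 10], [61, 41, 11], [59, 39, 9]]
for c in (2, 3):
    _, u = fcm(demo, c)
    print(c, round(partition_coefficient(u), 3), round(partition_entropy(u), 3))
```

Repeating the loop over a range of cluster counts and weighting exponents, and looking for the count at which F is largest and H smallest, mirrors the selection rule described above.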

              NWSUB SCENE DATA
              (3F6.2)
              1 2.000 3 2102500
               21.00 12.00 22.00
               19.00 12.00 18.00
               19.00 12.00 18.00
               19.00 12.00 21.00
               19.00 13.00 21.00

  Table 1. FCM Data File - first five lines
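A data file of this shape can be read with a few lines of Python. The sketch below assumes that the first three lines (title, Fortran format string, parameter line) form a header and that the remaining lines hold one pixel per line with one brightness value per band; the meaning of the individual header fields is not interpreted here, and the function name is hypothetical.

```python
def read_fcm_rows(lines, header_lines=3):
    """Return the pixel rows of an FCM data file as lists of floats.

    Assumes `header_lines` leading lines are header text and that values on
    each data line are separated by blanks (consistent with the 3F6.2 layout
    shown in Table 1 as long as no field fills all six columns).
    """
    rows = []
    for line in lines[header_lines:]:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(f) for f in fields])
    return rows


sample = """NWSUB SCENE DATA
(3F6.2)
1 2.000 3 2102500
 21.00 12.00 22.00
 19.00 12.00 18.00
 19.00 12.00 18.00
 19.00 12.00 21.00
 19.00 13.00 21.00""".splitlines()

rows = read_fcm_rows(sample)
print(len(rows), rows[0])
```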

With the selection of the optimal clusters completed, it was then necessary to determine which of the three clusters defined forest cover. Using ancillary data in the form of aerial photographs for one of the study sites, it was determined that cluster two represented the forest class. The membership values for pixels in Cluster 2 were converted into IDRISI image files for presentation and comparative analysis.

Cross-classification comparisons were performed using simple cross-tabulations of the "hard" versus "fuzzy" solutions for each study site. Cross-tabulation permitted pixel by pixel comparisons of class assignments and the overall level of agreement or association between the two classification methods, as well as the degree of participation of a pixel in the class Forest (Table 3).
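A pixel-by-pixel cross-tabulation of this kind can be sketched in Python as below. The four membership bins, the 0.5 cut used to call a fuzzy pixel "forest" when measuring agreement, the function names, and the sample values are assumptions made for illustration; they are not the class intervals of Table 3.

```python
from collections import Counter


def membership_bin(u, n_bins=4):
    """Label the percent interval that a membership value in [0, 1] falls into."""
    idx = min(int(u * n_bins), n_bins - 1)  # u == 1.0 goes in the top bin
    return f"{idx * 100 // n_bins}-{(idx + 1) * 100 // n_bins}%"


def cross_tabulate(hard_labels, forest_memberships, n_bins=4):
    """Count pixels for each (hard class, fuzzy membership bin) pair."""
    return Counter((h, membership_bin(u, n_bins))
                   for h, u in zip(hard_labels, forest_memberships))


def agreement(hard_labels, forest_memberships, cut=0.5):
    """Share of pixels where the hard class matches membership >= cut."""
    same = sum((h == "forest") == (u >= cut)
               for h, u in zip(hard_labels, forest_memberships))
    return same / len(hard_labels)


# Hypothetical five-pixel example.
hard = ["forest", "forest", "non-forest", "non-forest", "forest"]
fuzzy = [0.9, 0.6, 0.1, 0.55, 0.0]
table = cross_tabulate(hard, fuzzy)
for key in sorted(table):
    print(key, table[key])
print("agreement:", agreement(hard, fuzzy))
```

In this invented example the last pixel is labelled forest by the hard solution although its forest membership is zero, which is the kind of disagreement such a table is meant to expose.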

DISCUSSION OF RESULTS

Figures 2 and 3 show the spatial expression of forest cover using the "hard" and "fuzzy" classification approaches. Forest clusters were easily identified by their low values in bands 1 and 2 and correspondingly high values in band 4 as well as by the location of pixels with high memberships.

While pixels with the lowest and highest percentages of forest may in general be properly classified by the traditional method of cluster analysis, there is considerable generalization in the pixels identified as forest or non-forest when the percentage of forest cover for each scene is examined. For instance, the inclusion of zero percent forest pixels in the hard cluster solution represents a serious overgeneralization. This confusion is further extended when mixed pixels of intermediate percent membership are considered.

The interpretation of mixed pixels and their assignment to a unique cover class can be attributed in part to the method of pixel allocation. The nearest neighbor method, allocating pixels to classes based on diagonal distance from the most frequent digital number, often defines pixels that do not stand out as a unique spectral response pattern. Pixels representing ground resolution cells (GRC) in situations such as charcoal burning sites, or exposed rock surfaces, which are common in the study area, should be described as partly forest. The inclusion or exclusion of such pixels in land use/land cover classes is a major source of error degrading the quality of forest inventories. As human activities continue to stress the Mau forests, precise information of this type is vital.

The results of the FCM approach proved the obvious, that pixels representing GRC often belong to more than one land cover class. The condition of mixed pixels has profound importance in this study where, of the total 2,500 pixels per site, only 249 were pure in Site I, 219 in Site II, 29 in Site III, and 87 in Site IV. The remaining pixels defined forest cover in varying proportions or in various states. While in some instances a mixed pixel could not be assigned to a class, the results generally showed that the complexity of forest cover is such that "hard" classifications contribute to an information loss that effectively misrepresents the dynamics of the environment in which that forest cover is found. With respect to the Mau forests and potentially similar tropical environments, the main achievement of this study rests in the identification of pixels whose membership can be assigned to classes intermediate of pure forest yet whose sub-pixel structure represents an aspect of forest cover that would otherwise be omitted from classification.
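The pure-pixel counts reported for the four sites can be turned into shares of each 2,500-pixel site with a short Python calculation. The counts are taken from the text; the variable and function names are illustrative only.

```python
PIXELS_PER_SITE = 2500

# Pure-pixel counts per site as reported in the text.
pure_counts = {"Site I": 249, "Site II": 219, "Site III": 29, "Site IV": 87}


def pure_share(pure, total=PIXELS_PER_SITE):
    """Fraction of a site's pixels that are pure."""
    return pure / total


for site, pure in pure_counts.items():
    mixed = PIXELS_PER_SITE - pure
    print(f"{site}: {pure} pure, {mixed} mixed, {100 * pure_share(pure):.2f}% pure")
```

On these figures fewer than one pixel in ten is pure at any site, so a hard classification forces a single label onto more than 90 percent of each scene.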

SUMMARY AND CONCLUSION

This study explored the applicability of a fuzzy set theory approach to land use/land cover classification of a forested environment from satellite data, representing a shift from conventional classical set theory. The latter has often led to inaccurate and imprecise data analysis, especially where sensed surfaces are heterogeneous in nature. Priority was placed on the interpretation of fuzziness in satellite data and the application of fuzzy logic rules in digital image analysis. The results demonstrated that fuzziness is measurable and meaningful.

The study explored the fuzzy membership value of a pixel along a continuum and the simultaneous occurrence of land cover types while maintaining accuracy. Accuracy was maintained at two levels: through the parametric checks within the algorithm and through the algorithm's capability to perform a pixel-by-pixel analysis of the imagery. Though in broad terms accuracy is a product of a number of assumptions, including internal homogeneity, short-range variation and general environmental complexity, the nature of spatial feature distribution remains consistent, as shown by the assessment of the performance of the fuzzy classification rules.

Although fuzzy rules are of limited interest in problems where true membership is hard, they provide an alternative in addressing the problem of the internal purity of mixed pixels, a weakness that has long been assumed away in the formulation of classes. Most conventional classification techniques have long assumed that there is no change within classes up to the boundary, when actually there may be greater change within a class than at its boundary. This is true especially in the wet tropical environment, where bio-diversity is so intense that what is referred to as a forest may in fact have less tree biomass relative to other cover types, while at the same time surfaces that have few trees may be generalized as 'tree-less'. These are the cases of misclassification observed when the two algorithms were compared. The spatial variation of phenomena has not been given special regard in this respect. The work reported here emphasizes that Boolean logic presents information categories in a generalized and simplified format which is inadequate for realistic applications that include inexact information, such as that found in class mixtures and intermediate conditions as they occur in their natural setting captured in satellite images. It has shown that the representation of a forest environment as forest blocks with distinct boundary margins beyond which no forest may exist is unrealistic. The fuzzy clustering of the forest environment has shown that the reality of a forest or plant association is a gradual gradation in vegetation occurrence in terms of size, number, age, stress type, etc., over space.

This is the true picture in an area like the Mau where, besides the forest trees, there are also trees on the farms, and where forest intensity varies with altitude and with human interference, even within the forest interiors. The assumption of a pixel's full membership in one class without regard to the relative strength of its membership in other classes has been discarded so as to address adequately both the cases of crisp boundaries and the naturally fuzzy boundaries of such plant communities.

REFERENCES

Allaway, J. and Cox, P.M.J. 1989. "Forests and Competing Land Uses in Kenya." Journal of Environmental Management, 13, 2, p.171-187.

Bezdek, J.C., Ehrlich, R., and Full, W. 1984. "FCM: The Fuzzy c-Means Clustering Algorithm." Computers and Geosciences, 10, 2-3, p.191-203.

Cannon, R.L., Dave, J.V., and Bezdek, J.C. 1986. "Efficient Implementation of the Fuzzy c-Means Clustering Algorithms." IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8, 2, p.248-255.

Eastman, J.R. 1992. IDRISI: A Grid-Based Geographic Analysis System. Version 4.0. Worcester, Massachusetts: Clark University, Graduate School of Geography.

Fisher, P.F. and Pathirana, S. 1990. "The Evaluation of Fuzzy Membership of Land Cover Classes in the Suburban Zone." Remote Sensing of Environment, 34, p.121-132.

Hathaway, R.J., Kim, T., and Bezdek, J.C. 1988. "Optimality Tests for Fixed Points of the Fuzzy c-Means Algorithm." Pattern Recognition, 21, 6, p.651-663.

Kachigan, S.K. 1982. Multivariate Statistical Analysis. New York: Radius Press.

Key, J.R. and Barry, R.G. 1989. "Cloud Classification from Satellite Data Using a Fuzzy Sets Algorithm: A Polar Example." International Journal of Remote Sensing, 10, 12, p.1823-1842.

Pedrycz, W. 1990. "Fuzzy Sets in Pattern Recognition: Methodology and Methods." Pattern Recognition, 23, 1/2, p.121-146.

Robinson, V.B. and Thongs, D. 1986. "Fuzzy Set Theory Applied to the Mixed Pixel Problem of Multi-spectral Land Cover Databases." GIS in Government, 2. Edited by Opitz, B.K. Washington D.C.: A. Deepak Publication.

Wang, Fangju. 1992. "Improving Remote Sensing Image Analysis Through Fuzzy Information Representation." Photogrammetric Engineering and Remote Sensing, 56, 8, p.1163-1169.

Zadeh, L.A. 1965. "Fuzzy Sets." Information and Control, 8, p.338-353.
