EGIS (1994), copyright EGIS Foundation.


MODELLING AND VISUALIZING UNCERTAINTIES IN MULTI-DATA-BASED SPATIAL ANALYSIS

Shi Wenzhong*, M. Ehlers** and K. Tempfli*
*International Institute For Aerospace Survey and Earth Sciences (ITC)
P.B.242, P.O.Box 6, 7500 AA Enschede, The Netherlands
Fax: +31 53 874335; E-mail: shi@itc.nl, tempfli@itc.nl
**University of Osnabrueck
P.O. Box 1553, D-49364 Vechta, F.R. Germany
Fax: +49 4441 15445; E-mail: ehlers@dosuni1.bitnet

ABSTRACT

This paper describes three aspects of uncertainties in multi-data-based spatial analysis. First, positional uncertainty of an area object in a GIS is discussed in terms of the positional uncertainty of a line segment and of a boundary line feature. Second, thematic uncertainty of a classified remote sensing image is described by the maximum probability value of each pixel. Third, uncertainties after combining GIS and remote sensing data are described by the "S-band" model. Visualization of uncertainties for both source data and spatial analysis results is also addressed. We introduce 3D and colour techniques to visualize the spatial distribution of uncertainties.

INTRODUCTION

In spatial data analysis, we may have to deal with data of various types and origins, such as vector GIS data, raster remote sensing image data, statistical tables, etc. They can also be classified as positional data, thematic data and temporal data in either continuous or discrete form. In combining various data types for spatial analysis, we in fact also combine the uncertainties of the various data sources. The uncertainty of the result of spatial analysis is caused by the uncertainties of the multi-source data and of the spatial operations applied to them. Quantifying and visualizing uncertainty is expected to contribute to better decision making.

MODELLING UNCERTAINTIES

General Consideration

Uncertainties associated with spatial data may include positional uncertainty, thematic uncertainty, logical uncertainty and temporal uncertainty. The effects of these uncertainties can be formulated as:

Since the components of (1) can vary from one case to another, we cannot define a general function F which applies to all cases. Concrete forms for F can be defined only when considering specific practical problems, which is done in the next section.

A Practical Problem

Suppose, for instance, we want to inventory land cover of a certain area, e.g. a county. The boundary of the county has been digitized from a map and is available in a GIS. The land cover is obtained from a classified remote sensing image using maximum likelihood classification. The question is: What is the size of the area of every class in this county? We want to know not only the areas in numbers (e.g. acres or hectares) but also the uncertainties of these data. The procedure for a land resource inventory using remote sensing and GIS technologies is illustrated in Figure 1.

In this example, two types of spatial data are involved: GIS data and classified remotely sensed data. We will assume that the GIS data have only positional uncertainty and the classified remote sensing data have only thematic uncertainty. This thematic uncertainty originates from the classification process. Here we will neither discuss possible thematic uncertainty of the GIS data nor include positional uncertainty of the remote sensing data when registered to the GIS data layer. Geometric procedures for the rectification and registration of remote sensing imagery are well developed and will not be addressed again here (see, for example, Ehlers 1994). The "S-band" model (Shi and Ehlers, 1993) was developed to combine positional and thematic uncertainties and will be applied to the overlay operation shown in Figure 1.

Problem Formulation

Typically, a remote sensing image is classified using a maximum likelihood classifier which is based on Bayesian statistics (Richards, 1986). For a given pixel Z[T], the classifier computes

MODELLING POSITIONAL UNCERTAINTY OF AREA OBJECTS

The basic geometric element of a vector GIS is a "point". Two connected points compose a line segment. Line features are composed of line segments. An area object is determined by its boundary line feature. Thus, an area object is built up from (boundary) points to (boundary) line segments and then to a boundary line feature (polygon). By introducing the confidence region of a line segment and the probability distribution of a line segment (Shi and Tempfli, 1994), we can quantify the uncertainty of an area object caused by random errors of the boundary vertices.

Fuzzy Boundary and Interior Regions

If an area object is constructed by a boundary with N line segments, the positional uncertainty of the boundary vertices will affect a certain region around the boundary, which may be called a fuzzy boundary region, while the interior region is free of effects from positional uncertainty of the boundary vertices. The confidence region of a line segment is used to determine the fuzzy boundary region and -- in an inverse process -- the interior region of an area object. The confidence region of a line segment is the zone around this line segment that contains the true location of the line segment with a pre-defined level of confidence. The size of a confidence region is dependent on the variances of the two end points of the line segment and the confidence level. An example of a confidence region of a line segment is shown in Figure 2(a).

Shi (1994) gave the analytical derivation of the confidence region of a line segment for normally distributed end points. The confidence region of the whole boundary is the union of the confidence regions of the N line segments. The fuzzy boundary region of the area object is defined as the confidence region of the boundary line features, and the interior region of the area object is defined as the area which is obtained by subtracting the fuzzy boundary region from the total area of the object. The fuzzy boundary region of an area object is shown in Figure 2 (b).

Probability Distribution of Line Segments

The probability distribution of a line segment gives a description of how a measured line segment is distributed around the "true" location of the line segment. It is defined by the probability density function in the direction perpendicular to the line segment (a set of 1D normal distributions) and the two density functions at the end points (2D normal distributions) (Shi 1994). The probability distribution of a line segment is illustrated in Figure 3 (a) based on a simplified model (triangular density function instead of the Gaussian); darker densities show higher probabilities that a measured line segment will actually be located there.

We use this probability distribution to quantify the uncertainty of whether or not a point belongs to the object in question. Ideally, we would have to know the "true" location of the boundary line segments to determine the probability value of a point close to a given line segment. In a GIS, however, we usually have only one "measurement" of a line segment (and not several, which would enable us to obtain a better estimate of the true location of the line segment). Thus, we can assign probability values to "area object points" only with some degree of uncertainty.

Positional Uncertainty of a Boundary Line Feature

A line feature is defined here as a feature which is composed of more than one line segment. If a line feature constitutes the boundary of an area object, it is called a boundary line feature. In determining the positional uncertainty of a boundary we can use the probability distribution of line segments, but must solve the ambiguity in the region where two line segments join. This problem is solved by using a set intersection operation based on fuzzy set theory. An example of the uncertainty of a boundary line feature is shown in Figure 3 (b). Again, a darker tone means less uncertainty.

Positional Uncertainty of an Area Object

An area object is defined as an area enclosed by a boundary line feature. The positional uncertainty of an area object is determined by that of the boundary. The positional uncertainty affects only the fuzzy boundary region and not the interior region. The uncertainty of an area object is described by the probability that a point belongs to the area object, i.e., $P((x,y) \in O) \in [0,1]$. We can distinguish between points in the boundary region and points in the interior region of the object. When a point "moves" from the margin to the interior region of the area, the probability varies from 0 to 1. The probability value of a point in the boundary region is dependent on the probability distribution of the boundary line feature and is determined by the cumulative probability function perpendicular to the boundary. Figure 3 (c) gives an example of describing positional uncertainty of an area object. The grey values represent the probability that a point at this location belongs to the object. Darker values indicate higher probabilities.
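The idea of a cumulative probability function perpendicular to the boundary can be sketched in Python under strong simplifications. The sketch assumes a single constant standard deviation `sigma` for the whole boundary, a Gaussian error perpendicular to the nearest segment, and the nearest-segment distance as a stand-in where segments join; none of this reproduces the analytical confidence regions or the fuzzy-set intersection described above, and all function names are hypothetical.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment A-B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def inside_polygon(px, py, polygon):
    """Even-odd ray casting test against the measured boundary."""
    inside = False
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

def membership_probability(px, py, polygon, sigma):
    """Approximate P((x, y) in O) as a Gaussian CDF of the signed distance.

    ASSUMPTION: constant sigma and nearest-segment distance; this is an
    illustration, not the model derived in the paper.
    """
    n = len(polygon)
    d = min(
        point_segment_distance(px, py, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )
    signed = d if inside_polygon(px, py, polygon) else -d
    return 0.5 * (1.0 + math.erf(signed / (sigma * math.sqrt(2.0))))

# Hypothetical 10 x 10 square object with sigma = 1 map unit.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
p_centre = membership_probability(5.0, 5.0, square, 1.0)
p_edge = membership_probability(5.0, 0.0, square, 1.0)
p_outside = membership_probability(5.0, -5.0, square, 1.0)
```

With these assumptions a point on the measured boundary receives a probability of 0.5, and the value rises towards 1 in the interior and falls towards 0 outside, which mirrors the behaviour described for Figure 3 (c).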

Comparison with Epsilon Band Based "Point-in-Polygon" Description

Blakemore (1984) used an epsilon band model to describe the "point-in-polygon" problem, i.e., the uncertainty of an area object enclosed by a polygon. He distinguished five relationships between a point and the area object (see Figure 3 (d)). These are: definitely in (point 5), definitely out (point 1), possibly in (point 4), possibly out (point 2) and ambiguous (point 3). Such an epsilon band-based description can provide only five different qualitative relationships between an area object and a point. Using the probability distribution of line segments, we can describe the relationships by probability values varying within [0, 1]. This provides a quantitative indicator of uncertainty and, moreover, facilitates the combination with thematic uncertainty indicators.

Statistics Describing Positional Uncertainties of Area Objects

Figure 3 (c) illustrates the spatial distribution of the probability that a point belongs to an area object. We can also characterize the positional uncertainty of the area object by computing the frequency distribution of the probability values. To this end, we choose, for instance, 10 probability classes, each with a class width of 10%: 0 - 10%, >10 - 20%, ..., >90 - 100%. Table 1 shows the frequency distribution of the probability values inside the boundary of Figure 3 (c). Of 1638 pixels, 989 have a probability higher than 90% of belonging to the area object.
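The frequency distribution over ten probability classes can be sketched as follows. This is a minimal illustration, assuming the probabilities are available as a flat list of values in [0, 1]; the function names are hypothetical, and the sample values are invented for the example rather than taken from Table 1.

```python
import math

N_CLASSES = 10  # ten classes of width 10%, as described in the text

def probability_class(p, n_classes=N_CLASSES):
    """Return the index of the class that contains probability p.

    Class 0 covers 0 - 10% (both limits included); class k > 0 covers
    values above 10k% up to and including 10(k+1)%.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Round before taking the ceiling so that floating-point noise does
    # not push a value sitting on a class limit into the next class.
    scaled = round(p * n_classes, 9)
    return max(0, math.ceil(scaled) - 1)

def frequency_distribution(probabilities, n_classes=N_CLASSES):
    """Count how many probability values fall into each class."""
    counts = [0] * n_classes
    for p in probabilities:
        counts[probability_class(p, n_classes)] += 1
    return counts

# Invented sample values, for illustration only.
sample = [0.0, 0.05, 0.1, 0.15, 0.95, 1.0, 0.95]
sample_counts = frequency_distribution(sample)
```

The upper class limit is treated as inclusive so that a value of exactly 10% falls into the first class, matching the ">10 - 20%" notation for the second class.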

MODELLING THEMATIC UNCERTAINTIES OF A CLASSIFIED IMAGE

Thematic uncertainty in this context refers to the uncertainty of a classified remote sensing image. The reason for this uncertainty is that the classification is based on limited evidence. In our test remote sensing image, there are four classes: urban, water, forest and grass. In a maximum likelihood classification, a probability vector is generated for each pixel in the image. The pixel is then assigned to the class with the maximum probability. For example, a pixel with the probability vector

  $[P(x \in \text{urban}) = 0.33,\; P(x \in \text{forest}) = 0.31,\; P(x \in \text{grass}) = 0.31,\; P(x \in \text{water}) = 0.05]$

will be assigned to the class "urban". The other probability values will be ignored. However, we have only very weak evidence to classify this pixel as urban (the probability is only 33%). If the maximum probability value for each pixel is retained, the certainty of the classification result can be described. Consequently, we attach the probability value $P(x \in \text{urban}) = 0.33$ to the classification result, so the user can see that this classification is very uncertain. If the whole probability vector is attached, a user may further learn that the pixel may just as well be forest or grass (both with a probability of 31%).
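Retaining the maximum probability alongside the assigned class can be sketched in a few lines of Python. The sketch assumes the probability vector of a pixel is held in a dictionary keyed by class name; the function name is hypothetical, and the water value is taken as 0.05 so that the four probabilities of the example sum to one.

```python
def classify_with_certainty(prob_vector):
    """Return (assigned class, its probability) for one pixel.

    prob_vector maps class names to probabilities. The class with the
    maximum probability is assigned, and that probability is kept as
    the thematic certainty indicator instead of being discarded.
    """
    label = max(prob_vector, key=prob_vector.get)
    return label, prob_vector[label]

# Example pixel from the text (water assumed to be 0.05).
pixel = {"urban": 0.33, "forest": 0.31, "grass": 0.31, "water": 0.05}
assigned_class, certainty = classify_with_certainty(pixel)
```

Keeping the full dictionary instead of only the pair returned here corresponds to attaching the whole probability vector, which lets a user see that forest and grass are almost as likely as urban.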

Figure 4 (a) shows the maximum probability value of each pixel as a grey value. Darker grey values indicate higher probability values. Figure 4 (b) is the classified image based on the maximum probability value of each pixel. Different grey levels mean different classes. To compute the frequency distribution of probability values, we again use ten intervals: 0 - 10%, >10 - 20%, ..., >90 - 100%. Table 2 lists, in the column SUM, the number of pixels that were classified as urban, grass, etc.; it also shows the absolute frequencies per probability interval for every class, which describe the uncertainty of the classification.

Using a visualization technique such as Figure 4 (a) and a description such as Table 2, the user of the classified data knows not only the total area of each class, but also how confident he or she can be in this classification. This will benefit the users in their decision making. For example, 56 of the total of 117 pixels that are classified as urban have a probability of only >20 - 30%. Thus, we can see that the classification result "urban" is rather uncertain. On the other hand, 138 out of a total of 222 pixels classified as grass are in the probability class >90 - 100%, making this result a lot more certain.
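A per-class table of this kind (absolute frequencies per probability interval plus a SUM column) could be derived from the classification output roughly as follows. This is a sketch only: it assumes the classified image is available as a sequence of (class label, maximum probability) pairs, the function name is hypothetical, and the sample pixels are invented rather than taken from Table 2.

```python
import math
from collections import defaultdict

def tabulate_by_class(pixels, n_classes=10):
    """Build a per-class frequency table of maximum probabilities.

    pixels is an iterable of (class_label, max_probability) pairs.
    The result maps each label to its interval counts and to SUM,
    the total number of pixels assigned to that label.
    """
    table = defaultdict(lambda: [0] * n_classes)
    for label, p in pixels:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        # Interval k holds values above 10k% up to and including
        # 10(k+1)%; the first interval also includes 0.
        idx = max(0, math.ceil(round(p * n_classes, 9)) - 1)
        table[label][idx] += 1
    return {
        label: {"counts": counts, "SUM": sum(counts)}
        for label, counts in table.items()
    }

# Invented sample pixels, for illustration only.
sample_pixels = [("urban", 0.25), ("urban", 0.28), ("urban", 0.95), ("grass", 0.97)]
sample_table = tabulate_by_class(sample_pixels)
```

Multiplying SUM by the ground area of one pixel would give the area of each class, while the interval counts indicate how much of that area rests on weak evidence.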

Given the boundary layer with positional uncertainty indicators and the classified remote sensing image with thematic uncertainty indicators, an overlay operation can be used to solve the problem by using equation (2). Visualization of the result is treated in the next section. The statistical result for the combined uncertainty is given by Table 3.
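The overlay of the two uncertainty layers can be illustrated with a small Python sketch. Because the form of equation (2) is not shown here, the sketch assumes that positional and thematic uncertainties are independent, so that the combined value for a pixel is the product of the probability that the pixel lies within the area object and the probability that it belongs to its assigned class. This product rule is an assumption of the example, not a statement of the "S-band" model itself, and the function name and sample rasters are hypothetical.

```python
def combine_uncertainties(positional, thematic):
    """Pixel-wise product of two equally shaped probability rasters.

    positional[r][c]: probability that the pixel lies inside the area.
    thematic[r][c]:   maximum class probability of the pixel.
    ASSUMPTION: the two uncertainties are independent, so the joint
    probability is taken as their product.
    """
    if len(positional) != len(thematic):
        raise ValueError("rasters must have the same number of rows")
    combined = []
    for row_p, row_t in zip(positional, thematic):
        if len(row_p) != len(row_t):
            raise ValueError("rasters must have the same number of columns")
        combined.append([pp * pt for pp, pt in zip(row_p, row_t)])
    return combined

# Invented 2 x 2 rasters, for illustration only.
positional_layer = [[1.0, 0.5], [0.2, 0.0]]
thematic_layer = [[0.9, 0.8], [0.5, 0.33]]
combined_layer = combine_uncertainties(positional_layer, thematic_layer)
```

Under this assumption a combined value can never exceed either of its two inputs, which is in line with the shift towards lower probability classes reported for the combined result.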

Comparing Table 2 and Table 3, we can see that after the combination more pixels have a lower probability value. For example, at the interval >90 - 100%, the number of pixels for urban is reduced from 13 to 4, for grass from 138 to 57, for water from 7 to 0 and for forest from 452 to 240. On the other hand, the number of pixels with high uncertainty has increased. For example, at the interval >10 - 20%, the number of pixels for urban has increased from 0 to 24, for grass from 0 to 9, for water from 0 to 4 and for forest from 0 to 80. These numbers give a quantitative description of how uncertainty is increased by the combination of positional and thematic uncertainties.

VISUALIZATION OF UNCERTAINTIES USING 3D AND COLOUR TECHNIQUES

A commonly used method to visualize uncertainty is to use grey values. However, due to the limitation of the human eye in distinguishing grey values (normally fewer than 64 levels), grey-level coding may not work very well in some cases. A method which involves three-dimensional (3D) and colour representation techniques is therefore proposed for the visualization of uncertainty in this study. (Ironically, the colour figures cannot be printed in these proceedings, so they are replaced by grey value images.)

If we only want to visualize one feature -- uncertainty itself -- a three-dimensional representation technique can be readily used. Figure 6 represents the positional uncertainty of an area object. The plane coordinates represent the location of a pixel, and the third dimension (elevation) represents the uncertainty value of a pixel. The higher the elevation, the lower the uncertainty. From this figure, we can clearly see that uncertainty increases when a point moves from the inside to the outside area of an object.

If we want to represent not only uncertainty but also the corresponding thematic value, 3D and colour techniques can be used. Figure 7 represents the classification result of an image and the probability value for each pixel. The colours (here in grey levels) represent the class types. The elevation represents the probability value. The higher the elevation, the larger the probability value of that pixel.

Similarly, Figure 8 illustrates the uncertainty after combining positional and thematic uncertainties. Again, the elevation represents the probability that a pixel belongs to a class (e.g., urban) and lies within the area (county A). By comparing Figure 7 and Figure 8, we can easily see how the combination operation further degrades the results. We can see not only the class type assigned to each pixel, but also its associated probability. For example, two pixels located close to the centre of the area are classified as water, but their probabilities are very low. This indicates that we do not have high confidence that the pixels are water. This visualization result can be directly used by both data producers and data users.

Data producers should check those areas with low probability values, since these are the areas where incorrect classifications are likely to have occurred. On the other hand, a data user can give high weight to those areas with large probability values and low weight to those areas with small probability values in order to reduce the risk in decision making based on spatial context.

SUMMARY

In this paper, uncertainties in multi-data-based spatial analysis were discussed, focusing on positional and thematic uncertainties and their combination in integrating GIS and remote sensing data. Obviously the outlined "S-band" model is based on several assumptions to facilitate an analytical derivation. For application, we need to know only the covariance matrices of points in the GIS and the statistical properties (mean vector, covariance matrices and a priori probability) of each class for maximum likelihood classification. We can now proceed to study more complex combinations of positional and thematic uncertainties than the simple overlay operation treated here and to also investigate modelling logical inconsistency and temporal uncertainties.

REFERENCES

Blakemore, M., 1984, Generalization and Error in Spatial Data Bases. Cartographica 21, pp.131-139.

Ehlers, M., 1994, Rectification and Registration. In: Star, J. and J.E. Estes (Eds.), Integration of GIS and Remote Sensing. Cambridge University Press (in press).

Shi, W. and M. Ehlers, 1993, "S-band", A Model to Describe Uncertainty of an Object in an Integrated GIS/Remote Sensing Environment. In: Sadao Fujimura, Proceedings of IGARSS'93, Tokyo, Japan.

Shi, W. and K. Tempfli, 1994, Positional Uncertainty of Line Features in GIS (to be published in the proceedings of 1994 ASPRS/ACSM).

Shi, W., 1994, Modelling Positional and Thematic Uncertainties in Integration of Remote Sensing and Geographic Information Systems (in press).

Richards, J., 1986, Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag Berlin Heidelberg New York London Paris Tokyo.
