NETWORK ACCESS TO GEOGRAPHIC NAMES:
DEFENSE MAPPING AGENCY PROTOTYPE TO THE INFORMATION SUPER HIGHWAY

Laura A. Moore
M. Roy Matsumoto
Robert D. Holderfield
Defense Mapping Agency Aerospace Center
St. Louis, MO 63118-3399

ABSTRACT

The Defense Mapping Agency supports foreign geographic names policies and standardization within the U.S. Defense and Intelligence Community through the Geographic Names Processing System (GNPS). The GNPS holds and manipulates the Foreign Names Committee's Geographic Names Data Base, which is currently populated with 4.5 million non-U.S. geographic names. To meet the need for access to this names data, the GNPS has been expanded with a stand-alone UNIX-based SUN server to provide on-line query access through a MILNET link. This paper provides background on geographic names data and considers its potential for the information super highway community. The technical obstacles that challenge the accomplishment of this program are also discussed.

[End Page 60]


INTRODUCTION

"The United States Board on Geographic Names (BGN) is the official United States body created in 1890 to provide for uniform usage of geographic names throughout the [U.S.] Federal Government. Established in its present form by a Public Law enacted in 1947, BGN operates through several committees to standardize names of geographic features in the United States, foreign areas, Antarctica, and undersea areas. According to the law, the Board shares its responsibility with the Secretary of the Interior. Essential to the Board's function of standardizing names is the promulgation of names information. Traditionally, names of geographic features outside the United States have been published in a series of BGN gazetteers, and the needs of a wide range of users - both within the [U.S.] Federal Government and elsewhere - have been well served by these publications. The responsibility for producing and distributing the gazetteers [sic, geographic names data] has been assigned to the Defense Mapping Agency [DMA] as part of its overall mission of supporting foreign-area scientific studies for the [U.S.] Federal Government. The production of gazetteers is based on the work of linguists, geographers, and cartographers, who use a variety of source materials. Wherever possible, gazetteer production is carried out with the cooperation of the concerned country."

So states the preface of DMA Gazetteer publications available through the Government Printing Office and DMA. However, the need to provide wider access to U.S.-recognized geographic names has been spurred by the development of the Geographic Names Processing System (GNPS) and its link to the outside world, the GEOnet Names Server (GNS), which is available on Milnet to approved users. The Defense Mapping Agency Hydrographic Topographic Center (DMAHTC), located in Bethesda, Maryland, has, until recently, carried the responsibility of answering thousands of geographic names inquiries by accessing a database of 4.2 million 4" x 6" note cards. The Geographic Names Processing System has automated that data base and is expected to accommodate approximately 35 million names data records before it is considered appropriately populated for DMA mapping requirements. The production GNPS was developed by GDE Systems, Inc.

THE GNDB

The Geographic Names Data Base (GNDB), for which the GNPS was developed, comprises the geographic coordinates, historical references, grid references, and geographic name forms (native, conventional, variant, and short form) needed to accommodate 70 different languages portrayed (transliterated) in Roman characters, incorporating 63 different diacritic marks and special characters. Thus the GNDB

[End Page 61]


meets the legal requirements for portraying Board on Geographic Names approved names. All diacritic/letter combinations and special characters are also stored within the GNPS and appropriately displayed in its output. The extreme example of language romanization is Vietnamese, whose names require 122 characters: 54 letter/diacritic combinations in both capital and small letter cases, plus 14 special characters, all of which are used to define the correct pronunciation of each geographic, or feature, name. While many computers have programs available to print and display various languages, until now no software packages were available to accommodate such an expanse of languages concurrently in one database.

The GNPS diacritic requirements are met by an 8-bit word mapping database structure. This means a font of 8-bit words can be used to define 256 characters within the database, which allows for the retention of most of the normal characters already available on the QWERTY keyboard. In other words, such characters as those found in the Shift mode of the number row (e.g. ! @ # $, etc.) were not sacrificed to support the diacritic requirement. The GNPS design uses only Roman characters with a two-level softcopy keyboard. GDE Systems, Inc., with consultation from SYSTRAN, established language regions to accommodate the widest number of similar languages with common diacritic/letter combinations and special characters. The individual language softcopy keyboards were designed to meet the full range of diacritics and special characters prescribed by the specific country or language transliteration rules. Keep in mind that a given diacritic/letter combination does not necessarily represent the same sound or meaning in every country or language being transliterated. The next step in the evolution of the GNPS will be implementing a stable character set that can accommodate non-Roman writing systems like Arabic, Cyrillic and Chinese within a single data base.
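The 8-bit scheme described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the lower 128 codes keep their ordinary keyboard meanings and that the upper 128 codes are assigned per language region; the table contents and the names REGION_TABLE and decode_name are hypothetical and are not the actual GNPS assignments.

```python
# Hypothetical sketch of an 8-bit names character set. The code assignments
# below are invented for illustration; they are NOT the GNPS values.

ASCII_RANGE = range(0x00, 0x80)      # ordinary keyboard characters, kept as-is
EXTENDED_RANGE = range(0x80, 0x100)  # 128 codes left for diacritic combinations

# Illustrative assignments for one assumed language region.
REGION_TABLE = {
    0x80: "\u00e9",  # e with acute
    0x81: "\u00f4",  # o with circumflex
    0x82: "\u0111",  # d with stroke
    0x83: "\u01b0",  # u with horn
}


def decode_name(raw: bytes, table: dict) -> str:
    """Decode 8-bit name bytes: low half as ASCII, high half via the region table."""
    chars = []
    for code in raw:
        if code in ASCII_RANGE:
            chars.append(chr(code))
        else:
            # Unassigned extended codes are shown as the replacement character.
            chars.append(table.get(code, "\ufffd"))
    return "".join(chars)


total_codes = len(ASCII_RANGE) + len(EXTENDED_RANGE)  # 2**8 codes in all
vietnamese_needed = 54 * 2 + 14                       # counts given in the text
fits_in_extended_half = vietnamese_needed <= len(EXTENDED_RANGE)

print(total_codes, vietnamese_needed, fits_in_extended_half)
print(decode_name(b"Caf\x80 #1", REGION_TABLE))
```

Under these assumptions the 122 characters cited for Vietnamese fit within the 128 extended codes of a single region table, while the shifted number-row characters stay in the lower half.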

On occasion the rules of transliteration for a language will change. The GNDB, being also a historical reference, will maintain the previously used version of a name and reference it to the current approved name. This form of the name then becomes a variant. The GNDB is also sensitive to the fact that a particular feature may be named differently by independent countries or languages. A prime example of this is the Persian Gulf, which the Arab nations prefer to call Arab Gulf or Arabian Gulf and which the United Kingdom has decided to call The Gulf.
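The relationship between an approved name and its variants can be sketched as a small record structure. This is an assumed model for illustration only, not the GNDB schema; the class Feature, its methods, and the placeholder names "Oldname" and "Newname" are hypothetical. The Persian Gulf entries come from the example in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Feature:
    """One geographic feature: an approved name plus variant names (hypothetical model)."""
    approved_name: str
    variants: Dict[str, str] = field(default_factory=dict)  # variant name -> note

    def rename(self, new_approved: str, note: str) -> None:
        """Adopt a new approved name and retain the old one as a variant."""
        self.variants[self.approved_name] = note
        self.approved_name = new_approved

    def resolve(self, name: str) -> Optional[str]:
        """Return the approved name if `name` matches it or any variant."""
        wanted = name.casefold()
        if wanted == self.approved_name.casefold():
            return self.approved_name
        for variant in self.variants:
            if variant.casefold() == wanted:
                return self.approved_name
        return None


# A feature named differently by different countries (example from the text).
gulf = Feature(
    "Persian Gulf",
    {
        "Arab Gulf": "usage preferred by Arab nations",
        "Arabian Gulf": "usage preferred by Arab nations",
        "The Gulf": "United Kingdom usage",
    },
)

# A transliteration rule change: the superseded form becomes a variant.
town = Feature("Oldname")  # placeholder name, not a real entry
town.rename("Newname", "superseded transliteration")

print(gulf.resolve("the gulf"))
print(town.resolve("Oldname"))
```

In this sketch a query on any variant resolves to the current approved name, which is the behavior the paragraph above describes for historical and country-specific forms.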

THE DMA GEONET NAMES SERVER INITIATIVE

The 1990's have forced the defense industries to rethink objectives, and have forced DMA to consider its application to the non-Cold War business environment. To that end DMA follows the Presidential Directive to make its information available on the

[End Page 62]


"information super highway". To meet the expanding need for wider access to U.S. recognized geographic names, the geographic names data has been made accessible on the Milnet - the military version of the Internet. This is provided through a separate GNPS sub-system, the GEOnet Names Server or GNS.

The GNS accommodates user requests for information through the following path. Geographic names data is extracted from the GNDB and downloaded to an 8mm tape. In this form it is transferred to the GNS, where it is accessed by the Milnet/Internet customer via the Mosaic graphical user interface (GUI). The user's geographic names inquiry is translated by the GNS interface software and submitted to the GNS database. The GNS then performs a retrieval from the GNS database and passes the response back to the Mosaic GUI, where the Internet customer can read the names data response. Mosaic is a freeware GUI package, but it has been modified for a separate release through DMA to meet the special character and diacritic requirements of the Geographic Names Data Base set. As a consequence, the Internet user must obtain the GNS version of Mosaic in order to display names data with diacritics. The standard Mosaic GUI can be used to extract names data, but without special characters.
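The request path above can be summarized in a short sketch. Everything in it is an assumption made for illustration: the function names, the in-memory record list standing in for the GNDB, the approximate coordinates, and the use of Unicode decomposition to mimic a client that cannot show diacritics. It does not reproduce the GNS interface software.

```python
import unicodedata

# Stand-in for the master data base; records and coordinates are illustrative.
GNDB_RECORDS = [
    {"name": "H\u00e0 N\u1ed9i", "lat": 21.03, "lon": 105.85},
    {"name": "Z\u00fcrich", "lat": 47.37, "lon": 8.54},
]


def strip_diacritics(name: str) -> str:
    """Drop combining marks, as a client without diacritic support would show the name."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def extract_snapshot(records):
    """Step 1: copy records out of the master data base (the tape transfer)."""
    return [dict(r) for r in records]


def load_gns(snapshot):
    """Step 2: build the server-side lookup, keyed by a diacritic-free lowercase name."""
    return {strip_diacritics(r["name"]).lower(): r for r in snapshot}


def translate_query(form_text: str) -> str:
    """Step 3: turn the user's typed inquiry into a lookup key."""
    return strip_diacritics(form_text.strip()).lower()


def answer(gns_db, form_text: str, client_supports_diacritics: bool) -> str:
    """Steps 4 and 5: retrieve the record and format the response for the client."""
    record = gns_db.get(translate_query(form_text))
    if record is None:
        return "No match"
    shown = record["name"] if client_supports_diacritics else strip_diacritics(record["name"])
    return f"{shown} ({record['lat']}, {record['lon']})"


gns_db = load_gns(extract_snapshot(GNDB_RECORDS))
print(answer(gns_db, "ha noi", True))    # adapted client: diacritics kept
print(answer(gns_db, "Zurich", False))   # standard client: diacritics dropped
```

The last two calls mirror the distinction drawn in the text between the adapted Mosaic release and the standard one: the same record is retrieved in both cases, and only the displayed form of the name differs.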

DMA is developing the policies by which data may be obtained over the information super highway. The business-as-usual operations, which by heritage limit the customers who have access to DMA data, must be reassessed. The options under consideration are to make DMA an information source with a saleable product available to any and all networks; a public service in which only the costs of data transfer, or of data maintenance and overhead, are assessed to the customer; or a lesser-known entity that is contracted by the government to ration access.

THE GEOGRAPHIC NAMES CUSTOMER

The majority of names data customers have been users of DMA maps and charts. Consequently, distribution of DMA geographic names has been primarily in hardcopy form. Softcopy products generated by DMA, such as Digital Terrain Elevation Data (DTED), are provided to the customer largely without reference to feature names. However, newer products like the CADRG (Compressed ARC Digitized Raster Graphic) and ADRG CD-ROM products include names data by virtue of its portrayal on the original hardcopy chart that has been digitized and copied to the CD-ROM. Other digital products incorporate geographic names data within the data set.

The intelligence community has also been an active user of geographic names data through phone calls to the DMAHTC names office. However, access has largely been limited to the U.S. government and military.

[End Page 63]


Today's geographic names data customer community is expanding. The data is available via the Milnet in digital form, which makes the published gazetteer of 1990 now accessible as a queryable data base. The GNPS is marketed by GDE Systems, Inc. and is available as a system for data collection purposes. The private sector has also expressed interest in the names data set; software companies including Microsoft have procured the data from DMA.

The full range of applications for geographic names data bases is not clearly defined, though there are sure to be many uses by genealogists, the media, publishers, etc. well beyond the scope of map makers and the intelligence and academic communities. DMA will be collecting data on GNS users in order to understand how the customer community is applying the names data. The collected data will be used to determine future amendments to the names data base and GNS functions.

CHALLENGES MET

The GNPS/GNDB system has been designed to meet all BGN requirements. GNPS provides the means for a geographic names relational data base. GNPS data can be plotted in a scaled graphic representation of geographic placement and feature type which supports and becomes a part of DMA products. The GNS design has accomplished Milnet and Internet connectivity, with network users who have successfully displayed geographic names data via all of the supported platforms and the adapted Mosaic graphical user interface (GUI). This establishes the GEOnet Names Server as the premier DMA network capability. At delivery to DMA, the GNS design supports the system platforms and output as outlined in Table 1.

                 SUN/UNIX        DOS/WINDOW             MACINTOSH
  MOSAIC         V2.4            V2.0 ALPHA 6           V2.0 ALPHA 6
  VERSION

  NAMES          Display - Yes   Display - Yes          Display - Yes
  WITHOUT        Print - Yes     Print - Yes            Print - Yes
  DIACRITICS

  NAMES          Display - Yes   Display - Yes          Display - No
  WITH           Print - No      Print - Not Verified   Print - No
  DIACRITICS

                          Table 1

THE CHALLENGES AHEAD

The expansion of the GNDB and GNS data base into 16-bit (or 32-bit) words would allow the GNPS to have a truly international responsiveness to the information customer. The means to provide this flexibility and applicability to meet DMA requirements will also serve to prove the willingness of DMA to adapt to world-wide customers.

An appropriate enhancement to the GNS would allow geospatial selection for names data retrieval from the GNS; this application would simplify the user interface by not requiring users to consult an atlas or map for the geographic coordinates used as inquiry parameters.
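One simple form such geospatial selection could take is a bounding-box filter, sketched below. This is an assumption about how the enhancement might work, not a description of any GNS function; the names NameRecord and select_in_box are hypothetical, the sample coordinates are approximate, and boxes crossing the 180-degree meridian are not handled.

```python
from typing import List, NamedTuple


class NameRecord(NamedTuple):
    """A names record reduced to what the selection needs (hypothetical layout)."""
    name: str
    lat: float  # degrees north
    lon: float  # degrees east


def select_in_box(records, south, west, north, east) -> List[NameRecord]:
    """Return the records whose coordinates fall inside the box, edges included."""
    return [
        r for r in records
        if south <= r.lat <= north and west <= r.lon <= east
    ]


# Illustrative records with approximate coordinates.
RECORDS = [
    NameRecord("Ha Noi", 21.03, 105.85),
    NameRecord("Hai Phong", 20.86, 106.68),
    NameRecord("Zurich", 47.37, 8.54),
]

# A box the user might draw on a displayed map instead of typing coordinates.
hits = select_in_box(RECORDS, south=20.0, west=105.0, north=22.0, east=107.0)
print([r.name for r in hits])
```

With a selection of this kind the box corners would come from the user's on-screen selection, so no coordinates need to be looked up beforehand.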

One challenge to the completion of the GNS and user interface lies in the printing of diacritics. The user's printer must be "trained" to interpret the Bitstream fonts used by GNS and output them with the called-for diacritic/special character. Not all personal computer platforms currently support the printing requirements of the BGN approved names, though this is an industry goal, and certainly a priority for the GNS.

The GNS provides access to a set of data that is inherently diverse. Along with the names data, the GNS must therefore supply references or clarifications on the use of the required diacritics and special characters: a glossary of pronunciation. Different languages do not use the same diacritics for the same sounds, so a relational translation of the diacritics and special characters to the language(s) being accessed would be appropriate.

Another expansion of the GNS, or a sister system, would accommodate the geographic names data sets of both the DMA Geographic Names data base and the U.S. Geological Survey (USGS), which is responsible for the maintenance of the U.S. domestic placenames file. The GNS Homepage currently directs would-be users of U.S. domestic names to the USGS homepage by means of a hypertext call.

SUMMARY

With the accomplishment of the Geographic Names Processing System and the GEOnet Names Server, DMA has established a means to data base and out-process BGN foreign names data. DMA can meet the needs of current geographic names customers, both internal and external to DMA, in a more comprehensive and expeditious manner. At the same time the GNS will provide DMA with a wider customer base through access to the Milnet and the Internet.

[End Page 65]