Technical Article   FAQ:  Does ArcGIS Desktop support Unicode?

Article ID: 27345
Software:  ArcGIS - ArcEditor 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3 ArcGIS - ArcInfo 8.0.1, 8.0.2, 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3 ArcGIS - ArcView 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3
Platforms:  Windows NT 4.0, 2000, XP

Question

Does ArcGIS Desktop support Unicode?

Answer

ArcGIS Desktop applications, such as ArcMap, are Unicode based, so they support Unicode to a certain level. The level of Unicode support depends on the data format.

Currently, personal geodatabases and file geodatabases are the only data formats that support Unicode by default. It is even possible to store and display characters of multiple languages in a single personal or file geodatabase. If the characters are not displayed correctly, verify that the font is set to a Unicode font, such as Tahoma.

Shapefiles also support Unicode, but not by default. To enable Unicode support in shapefiles, a registry key needs to be modified.
Summary
** This issue is addressed in ArcGIS 8.2 **

At 8.2, ESRI implemented code page conversion functionality in ArcGIS Desktop (ArcMap, ArcCatalog, and ArcToolbox) that allows the Desktop applications to read and write shapefiles and dBASE files encoded in various code pages. The code page conversion functionality for dBASE files (called dbfDefault) is activated by specifying a code page value in the system registry. This is very similar to the &CODEPAGE function used in ArcInfo Workstation.

1. What does the dbfDefault setting do?

By setting a code page value in the system registry, users are able to read and write shapefiles and dBASE files encoded in that code page. For example, users can export a shapefile encoded in OEM by setting the code page registry value to OEM. Users can also read shapefiles and dBASE files that do not have the code page information stored in the file, as long as they know which code page the file is encoded in.

2. Why set the dbfDefault?

When opening a shapefile or dBASE file, the ArcGIS Desktop programs determine the file's code page from the Language Driver ID (LDID) in the header of the dBASE file or from an associated *.CPG file, both of which are used to define the code page. Based on the code page information it retrieves, ArcGIS Desktop performs a code page conversion, if necessary, to display the strings correctly. If a dBASE file has neither an LDID nor a *.CPG file, ArcGIS Desktop assumes that the file is encoded in the Windows (ANSI/multi-byte) code page.

If the Desktop programs read a dBASE file that is encoded in OEM but contains no code page information (neither an LDID nor a *.CPG file), the characters do not display correctly. Because it cannot find the code page information, ArcGIS treats the OEM file as being encoded in ANSI, which causes 8-bit characters stored in the file to display incorrectly.

Most shapefiles and dBASE files have the code page information stored in the file, but some programs, such as Microsoft Access 2000 and Excel 2000, encode dBASE files in OEM without recording the code page in the LDID, so ArcGIS does not read those files correctly. To avoid this kind of problem, set dbfDefault to the appropriate code page before opening a file that lacks the code page information.
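The display problem described above can be reproduced outside ArcGIS. The following is a minimal sketch, not ArcGIS code: it assumes cp437 as a stand-in for the OEM code page and cp1252 as a stand-in for the ANSI code page (the actual code pages depend on the Windows locale), and the function names are hypothetical.

```python
# Illustration only. Assumption: cp437 stands in for "OEM" and cp1252 for
# "ANSI"; the real OEM and ANSI code pages depend on the Windows locale.
OEM = "cp437"
ANSI = "cp1252"


def misread_as_ansi(text: str) -> str:
    """Encode text in the OEM code page, then decode the bytes as ANSI.

    This mimics reading an OEM-encoded dBASE file that carries no code
    page information, so the reader falls back to ANSI.
    """
    return text.encode(OEM).decode(ANSI)


def read_with_conversion(text: str) -> str:
    """Encode text in OEM and decode it with the same code page.

    This mimics the case where the reader knows the file is OEM and
    performs the matching code page conversion.
    """
    return text.encode(OEM).decode(OEM)
```

Calling misread_as_ansi("café") returns a string whose last character is no longer "é", while read_with_conversion("café") returns the text unchanged. Plain 7-bit ASCII text is unaffected either way, which is why only 8-bit characters display incorrectly.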

3. How does the dbfDefault work?

The dbfDefault setting in the system registry defines the code page in which shapefiles and dBASE files are exported. Shapefiles and dBASE files created in ArcGIS Desktop are encoded in the code page defined by the dbfDefault value. For example, if dbfDefault is set to OEM, shapefiles and dBASE files created in ArcMap, ArcCatalog, and ArcToolbox are always encoded in OEM; alternatively, if dbfDefault is set to ANSI, shapefiles and dBASE files are always encoded in ANSI.

It is important to note that there is one exception to this: shapefiles exported from Coverages in ArcCatalog and ArcToolbox in languages other than Spanish and Arabic are always encoded in OEM, regardless of the dbfDefault setting. This is because 'Coverage to Shapefile' in ArcToolbox uses ArcInfo Workstation functionality that runs on DOS, so the output file is always encoded in the OEM code page (the DOS code page). Shapefiles exported from Coverages in ArcCatalog and ArcToolbox in Spanish and Arabic are encoded in ANSI.

- Shapefiles exported from a Coverage in ArcCatalog and ArcToolbox are always in the OEM code page (except for Spanish and Arabic).

The same logic applies to shapefiles and dBASE files that are read into ArcGIS Desktop: if a shapefile or a dBASE file lacks both an LDID and a *.cpg file, ArcGIS assumes the file is encoded in the code page defined by dbfDefault. For example, if the dbfDefault value is set to OEM and a dBASE file lacks both an LDID and a *.cpg file, ArcGIS Desktop assumes that the file is encoded in OEM and therefore performs a code page conversion to display the 8-bit characters in ArcMap and ArcCatalog (since both applications are Windows programs that use the ANSI code page to display strings).
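The fallback order described above can be sketched as a small lookup. This is not ArcGIS code; it is a hypothetical illustration that assumes the LDID is stored in byte 29 of the dBASE header, lists only three well-known LDID values, and checks the LDID before the *.cpg file (the article does not state which of the two takes precedence). All function names are made up for the example.

```python
from pathlib import Path
from typing import Optional

# Partial table (assumption): only three well-known LDID values are listed.
LDID_TO_CODE_PAGE = {0x01: "437", 0x02: "850", 0x03: "1252"}

LDID_OFFSET = 29  # byte 29 of the dBASE header holds the language driver ID


def read_ldid(dbf_path: Path) -> int:
    """Return the LDID byte, or 0 when the header is too short to hold one."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    return header[LDID_OFFSET] if len(header) > LDID_OFFSET else 0


def read_cpg(dbf_path: Path) -> Optional[str]:
    """Return the contents of a sibling .cpg file, or None if absent or empty."""
    cpg_path = dbf_path.with_suffix(".cpg")
    if not cpg_path.exists():
        return None
    text = cpg_path.read_text(encoding="ascii", errors="replace").strip()
    return text or None


def resolve_code_page(dbf_path: Path, dbf_default: Optional[str] = None) -> str:
    """Pick a code page: LDID, then .cpg, then dbfDefault, then ANSI."""
    ldid = read_ldid(dbf_path)
    if ldid in LDID_TO_CODE_PAGE:
        return LDID_TO_CODE_PAGE[ldid]
    cpg = read_cpg(dbf_path)
    if cpg:
        return cpg
    if dbf_default:
        return dbf_default  # value taken from the registry setting
    return "ANSI"  # no information anywhere: assume the Windows code page
```

With no LDID, no *.cpg file, and no dbfDefault, the sketch returns "ANSI", which matches the default behavior described in this article. Passing dbf_default="OEM" changes only that last fallback.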


  As long as dbfDefault is set to a certain code page, all shapefiles and dBASE files exported in ArcGIS are encoded in that code page, and all shapefiles and dBASE files that lack code page information are assumed to be in that code page. Therefore, it is important to set dbfDefault back to its default value (no value) as soon as the task is completed.



4. What are the programs that dbfDefault can be used with?

ArcGIS Desktop is the only program that is affected by the dbfDefault setting; other programs such as ArcInfo Workstation and ArcView 3.x, or other code page settings such as the &CODEPAGE function used in ArcInfo Workstation and the Code Page Profile used in ArcView 3.x, are not affected by this.

In ArcInfo Workstation,

- ARCSHAPE with &CODEPAGE OEM always creates a shapefile in OEM
- ARCSHAPE with &CODEPAGE ANSI always creates a shapefile in ANSI
- INFODBASE with &CODEPAGE OEM always creates a dBASE file in OEM
- INFODBASE with &CODEPAGE ANSI always creates a dBASE file in ANSI

In ArcView 3.x,

- Shapefile and dBASE file are always saved in the ANSI code page.

5. What are the data formats that are affected by dbfDefault?

Shapefiles and dBASE files are the only data formats whose code page can be specified with the dbfDefault setting. Other data formats, such as Coverages and Personal Geodatabases, are not affected by the dbfDefault setting.

In ArcGIS Desktop (regardless of the dbfDefault setting),

- Personal Geodatabase is always saved in Unicode.
- Personal Geodatabase Table is always saved in Unicode.
- Coverage is always saved in the ISO code page.
- INFO file is always saved in the ISO code page.
- Interchange file is always saved in the ANSI code page.
- Text file is always saved in the ANSI code page.
Procedure
How can I set the dbfDefault value in the system registry?


  1. Add two keys called Common and CodePage in the system registry

     WARNING: The instructions below include making changes to essential parts of your operating system. It is recommended that you back up your operating system and files, including the registry, before proceeding. Consult with a qualified computer systems professional, if necessary.

    ESRI cannot guarantee results from incorrect modifications while following these instructions; therefore, use caution and proceed at your own risk.

    To add a key,

    a. Open Registry Editor (click Start, click Run, type regedit, and click OK).
    b. In the registry tree (on the left), go to 'My Computer\HKEY_CURRENT_USER\Software', and click the registry key ESRI.
    c. Add a new key called Common (on the Edit menu, point to New, select Key, and type the name "Common" and press ENTER).
    d. Click the registry key just created (Common) and add a new key called CodePage.
  2. Add a new string value dbfDefault to the CodePage key

    To add a string value,

    a. Click the key CodePage.
    b. On the Edit menu, point to New, and select String Value.
    c. Type "dbfDefault" for the new value, and press ENTER.

    The new CodePage key should look like this:

    [Screenshot: dbfDefault in the system registry]

  3. Enter a code page value

    a. Select the entry just added; it is important that dbfDefault is selected and not (Default).
    b. On the Edit menu, click Modify.
    c. In Value data, type the new code page value, and click OK.

    Following is a list of code page values that are supported (not case sensitive).

    OEM Code Page Value:
    OEM, 437, 708, 720, 737, 775, 850, 852, 855, 857, 860, 861, 862, 863, 864, 865, 866, 869, 932, 936, 950

    ANSI Code Page Value:
    ANSI, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, Big5, SJIS

    ISO Code Page Value:
    ISO, 88591, 88592, 88593, 88594, 88595, 88596, 88597, 88598, 88599, 885910, 885913, 885915, EUC

    Unicode Value:
    UTF-8

     Shapefiles can now be stored in UTF-8. However, shapefiles encoded in UTF-8 are only recognized in ArcGIS Desktop.
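As an alternative to editing the registry by hand, the same setting could be prepared as a .reg file and imported with Registry Editor. The sketch below only builds the file text; it is not an ESRI tool, and the function name is hypothetical. It checks the value against the list above and, when passed None, emits the .reg syntax that deletes the dbfDefault value, which is one interpretation of returning to the default "no value" state. The same caution about registry changes applies.

```python
from typing import Optional

# Registry location described in the procedure above.
KEY_PATH = r"HKEY_CURRENT_USER\Software\ESRI\Common\CodePage"

# Values listed as supported in this article (compared case-insensitively).
SUPPORTED_VALUES = {
    "OEM", "437", "708", "720", "737", "775", "850", "852", "855", "857",
    "860", "861", "862", "863", "864", "865", "866", "869", "932", "936",
    "950",
    "ANSI", "1250", "1251", "1252", "1253", "1254", "1255", "1256", "1257",
    "1258", "BIG5", "SJIS",
    "ISO", "88591", "88592", "88593", "88594", "88595", "88596", "88597",
    "88598", "88599", "885910", "885913", "885915", "EUC",
    "UTF-8",
}


def build_reg_file(value: Optional[str]) -> str:
    """Return .reg file text that sets dbfDefault, or removes it for None."""
    if value is None:
        data = "-"  # .reg syntax for deleting a named value
    else:
        if value.upper() not in SUPPORTED_VALUES:
            raise ValueError(f"unsupported code page value: {value!r}")
        data = f'"{value}"'
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"dbfDefault"={data}',
        "",
    ]
    return "\r\n".join(lines)
```

For example, build_reg_file("OEM") produces text that creates the Common and CodePage keys if they are missing and sets dbfDefault to OEM, and build_reg_file(None) produces text that removes the value again once the task is completed. Registry Editor itself exports .reg files as UTF-16, so saving the text in that encoding is a reasonable choice.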



ArcSDE geodatabases have limited support for Unicode.
Question
Does ArcSDE support Unicode?
Answer
ArcSDE 8.x and prior

No; Unicode is not supported.

ArcSDE 9.0 and 9.1

Full Unicode support is not available.

Some Unicode support is available at ArcSDE 9.0 and 9.1 when using Oracle or DB2 databases with character set UTF8, but with limitations. For example, it is not possible to load and display attributes in multiple languages in the same feature class.

For multilingual Unicode support, ArcSDE 9.0 and 9.1 support NCHAR and NVARCHAR columns in the database. NCHAR and NVARCHAR use UTF-16 character encoding. To access and update these columns, use the SDE C API calls SE_stream_set_nstring() and SE_stream_get_nstring().

 ArcGIS Desktop has not implemented support for NCHAR and NVARCHAR; therefore, when using ArcGIS Desktop with ArcSDE, there is no support for NCHAR and NVARCHAR.
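The difference between a single-language code page and a UTF-16 column can be illustrated without a database. The sketch below does not use ArcSDE or any DBMS; it only shows, with Python's built-in codecs, that a single-byte code page can hold text in one script at a time while UTF-16 can hold several together. The helper name and the sample strings are hypothetical.

```python
def fits_code_page(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


GREEK = "Αθήνα"      # fits the Greek code page 1253
RUSSIAN = "Москва"   # fits the Cyrillic code page 1251
MIXED = GREEK + " " + RUSSIAN


def summarize() -> dict:
    """Collect which encodings can represent the mixed-language string."""
    return {
        "cp1253": fits_code_page(MIXED, "cp1253"),
        "cp1251": fits_code_page(MIXED, "cp1251"),
        "utf-16-le": fits_code_page(MIXED, "utf-16-le"),
    }
```

Each sample string fits its own code page, but summarize() reports that the mixed string fits neither cp1253 nor cp1251 and does fit UTF-16, which is the encoding used by NCHAR and NVARCHAR columns.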

The following information describes database-specific Unicode limitations in ArcSDE 9.0 and 9.1:

▪ Oracle

If an Oracle database is created with character set UTF8, CHAR and VARCHAR can store characters in UTF8. However, only one language can be loaded or displayed at a time. To load or display different languages, set the appropriate NLS_LANG value for each language before attempting to load or display each language, one at a time.

▪ SQL Server

CHAR and VARCHAR are non-Unicode data types in SQL Server. Unicode data is stored using the NCHAR and NVARCHAR data types in SQL Server. However, as stated above, ArcGIS Desktop does not support NCHAR and NVARCHAR, so when using ArcGIS Desktop with ArcSDE, there is no support for NCHAR and NVARCHAR.

▪ DB2

DB2 has the same limitation as Oracle databases.

ArcSDE 9.2 and later releases

Unicode is fully supported.


Coverages and other legacy data formats do not support Unicode. To store and display Unicode data, use geodatabases instead of these formats.

Created: 7/28/2004
Last Modified: 6/11/2009
