FAQ: Does ArcGIS Desktop support Unicode?
| Article ID: | 27345 |
|---|---|
| Software: | ArcGIS - ArcEditor 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3 ArcGIS - ArcInfo 8.0.1, 8.0.2, 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3 ArcGIS - ArcView 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3 |
| Platforms: | Windows NT 4.0, 2000, XP |
Question
Answer
Currently, personal geodatabases and file geodatabases are the only data formats that support Unicode by default. It is even possible to store and display characters from multiple languages in a single personal or file geodatabase. If characters are not displayed correctly, verify that the font is set to a Unicode font, such as Tahoma.
Shapefiles also support Unicode, but not by default. To enable Unicode support in shapefiles, a registry key needs to be modified.
| Summary |
** This issue is addressed in ArcGIS 8.2 **
At 8.2, ESRI implemented code page conversion functionality in ArcGIS Desktop (ArcMap, ArcCatalog, and ArcToolbox) that allows the Desktop applications to read and write shapefiles and dBASE files encoded in various code pages. The code page conversion functionality for dBASE files (called dbfDefault) is activated by specifying a code page value in the system registry. This is very similar to the &CODEPAGE function used in ArcInfo Workstation.

1. What does the dbfDefault setting do?
By setting a code page value in the system registry, users can read and write shapefiles and dBASE files encoded in that code page. For example, users can export a shapefile encoded in OEM by setting the code page registry value to OEM. Users can also read shapefiles and dBASE files that do not have code page information stored in the file, as long as they know which code page the file is encoded in.

2. Why set the dbfDefault?
When opening a shapefile or dBASE file, the ArcGIS Desktop programs look at the Language Driver ID (LDID) in the header of the dBASE file, or at an associated *.CPG file, to determine the code page of the file. Either one can define the code page. Based on the code page information it retrieves, ArcGIS Desktop displays the strings accordingly, performing a code page conversion if necessary. If a dBASE file lacks both an LDID and a *.CPG file, ArcGIS Desktop assumes that the file is encoded in the Windows (ANSI/Multi-byte) code page.
If the Desktop programs read a dBASE file that is encoded in OEM but contains no code page information (neither an LDID nor a *.CPG file), the characters do not display correctly. Because the Desktop programs cannot find the code page information, they assume the file is encoded in the ANSI code page, while the file is actually encoded in OEM.
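The display problem described above can be reproduced outside ArcGIS. The following Python sketch is only an illustration: it assumes code page 437 as the OEM code page and code page 1252 as the ANSI code page (a common pairing on US English systems; other locales use different pairs), and it does not call any ArcGIS API.

```python
# Assumption: OEM = cp437 and ANSI = cp1252; other locales use other code pages.
OEM = "cp437"
ANSI = "cp1252"

original = "café"
oem_bytes = original.encode(OEM)   # bytes as an OEM-encoded dBASE file would store them

misread = oem_bytes.decode(ANSI)   # result when the reader wrongly assumes ANSI
correct = oem_bytes.decode(OEM)    # result when the reader knows the file is OEM

# Code page conversion: re-encode the correctly decoded text as ANSI bytes.
ansi_bytes = correct.encode(ANSI)

print(repr(misread))
print(repr(correct))
```

Only the 8-bit character is affected; the 7-bit ASCII letters are identical in both code pages, which is why such files often look mostly correct.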
ArcGIS therefore treats an OEM file without code page information as being encoded in ANSI, which causes an incorrect display of the 8-bit characters stored in the file. Most shapefiles and dBASE files have the code page information stored in the file, but some programs, such as Microsoft Access 2000 and Excel 2000, encode dBASE files in OEM without recording the code page in the LDID, so ArcGIS does not read those files correctly. To avoid this kind of problem, set the dbfDefault to the appropriate code page before opening a file that lacks code page information.

3. How does the dbfDefault work?
The dbfDefault setting in the system registry defines the code page in which shapefiles and dBASE files are written. Shapefiles and dBASE files created in ArcGIS Desktop are encoded in the code page defined by the dbfDefault value. For example, if the dbfDefault is set to OEM, shapefiles and dBASE files created in ArcMap, ArcCatalog, and ArcToolbox are always encoded in OEM; if the dbfDefault is set to ANSI, they are always encoded in ANSI.
There is one exception: shapefiles exported from coverages in ArcCatalog and ArcToolbox in languages other than Spanish and Arabic are always encoded in OEM, regardless of the dbfDefault setting. This is because 'Coverage to Shapefile' in ArcToolbox uses ArcInfo Workstation functionality (defined layers) that runs on DOS, so the output file is always encoded in the OEM code page, that is, the DOS code page. Shapefiles exported from coverages in ArcCatalog and ArcToolbox in Spanish and Arabic are encoded in ANSI.
- Shapefiles exported from a coverage in ArcCatalog and ArcToolbox are always in the OEM code page (except for Spanish and Arabic).
The same logic applies to shapefiles and dBASE files that are read into ArcGIS Desktop: if a shapefile or dBASE file lacks both an LDID and a *.cpg file, ArcGIS assumes the file is encoded in the code page defined by dbfDefault. For example, if the dbfDefault value is set to OEM and a dBASE file lacks both an LDID and a *.cpg file, ArcGIS Desktop assumes that the file is encoded in OEM and therefore performs a code page conversion to display the 8-bit characters in ArcMap and ArcCatalog (both applications are Windows programs that use the ANSI code page to display strings).

4. What are the programs that dbfDefault can be used with?
ArcGIS Desktop is the only program affected by the dbfDefault setting. Other programs, such as ArcInfo Workstation and ArcView 3.x, and other code page settings, such as the &CODEPAGE function used in ArcInfo Workstation and the Code Page Profile used in ArcView 3.x, are not affected by it.
In ArcInfo Workstation,
- ARCSHAPE with &CODEPAGE OEM always creates a shapefile in OEM
- ARCSHAPE with &CODEPAGE ANSI always creates a shapefile in ANSI
- INFODBASE with &CODEPAGE OEM always creates a dBASE file in OEM
- INFODBASE with &CODEPAGE ANSI always creates a dBASE file in ANSI
In ArcView 3.x,
- Shapefiles and dBASE files are always saved in the ANSI code page.

5. What are the data formats that are affected by dbfDefault?
Shapefiles and dBASE files are the only data formats for which the dbfDefault setting specifies the code page. Other data formats, such as coverages and personal geodatabases, are not affected by the dbfDefault setting.
In ArcGIS Desktop (regardless of the dbfDefault setting),
- Personal Geodatabase is always saved in Unicode.
- Personal Geodatabase Table is always saved in Unicode.
- Coverage is always saved in the ISO code page.
- INFO file is always saved in the ISO code page.
- Interchange file is always saved in the ANSI code page.
- Text file is always saved in the ANSI code page.
|
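The read-time fallback described under "How does the dbfDefault work?" can be sketched in Python. This is a rough illustration, not ArcGIS's actual implementation. It assumes the LDID is the byte at offset 29 of the dBASE header, uses a small partial LDID table, and checks the LDID before the *.cpg file (the article does not state which takes priority). The function name `resolve_code_page` and the `dbf_default` parameter are hypothetical.

```python
from pathlib import Path

# Partial, illustrative LDID table (assumption); not the full list ArcGIS recognizes.
LDID_TO_CODE_PAGE = {0x01: "cp437", 0x02: "cp850", 0x03: "cp1252", 0x57: "cp1252"}


def resolve_code_page(dbf_path, dbf_default="cp1252"):
    """Return (source, code_page) for a dBASE file.

    dbf_default stands in for the dbfDefault registry value; the default of
    cp1252 is an assumed ANSI code page for a Western European system.
    """
    dbf_path = Path(dbf_path)
    with open(dbf_path, "rb") as f:
        header = f.read(32)  # fixed-size part of the dBASE header

    # 1. Language Driver ID stored in the header (0 means "not set").
    ldid = header[29] if len(header) > 29 else 0
    if ldid in LDID_TO_CODE_PAGE:
        return "ldid", LDID_TO_CODE_PAGE[ldid]

    # 2. Sidecar *.cpg file with the same base name, returned as written.
    cpg_path = dbf_path.with_suffix(".cpg")
    if cpg_path.exists():
        name = cpg_path.read_text(encoding="ascii", errors="replace").strip()
        if name:
            return "cpg", name

    # 3. Nothing found: fall back to the configured default.
    return "dbfDefault", dbf_default
```

For a file written by a program that leaves the LDID empty and ships no *.cpg file, calling `resolve_code_page(path, dbf_default="cp437")` returns the OEM default, which mirrors setting dbfDefault to OEM before opening the file.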
| Procedure |
How can I set the dbfDefault value in the system registry?
|
ArcSDE geodatabases have limited support for Unicode.
| Question |
Does ArcSDE support Unicode? |
| Answer |
ArcSDE 8.x and prior
No; Unicode is not supported.
ArcSDE 9.0 and 9.1
Full Unicode support is not available. Some Unicode support is available at ArcSDE 9.0 and 9.1 when using Oracle or DB2 databases with character set UTF8, but with limitations. For example, it is not possible to load and display attributes in multiple languages in the same feature class.
For multilingual Unicode support, ArcSDE 9.0 and 9.1 support NCHAR and NVARCHAR columns in the database. NCHAR and NVARCHAR use UTF16 character encoding. To access and update these columns, use the SDE C API calls SE_stream_set_nstring() and SE_stream_get_nstring().
The following information describes database-specific Unicode limitations in ArcSDE 9.0 and 9.1:
▪ Oracle
If an Oracle database is created with character set UTF8, CHAR and VARCHAR can store characters in UTF8. However, only one language can be loaded or displayed at a time. To load or display different languages, set the appropriate NLS_LANG value before loading or displaying each language, one language at a time.
▪ SQL Server
CHAR and VARCHAR are non-Unicode data types in SQL Server. Unicode data is stored using the NCHAR and NVARCHAR data types. However, ArcGIS Desktop does not support NCHAR and NVARCHAR, so these columns are not available when using ArcGIS Desktop with ArcSDE.
▪ DB2
DB2 has the same limitation as Oracle databases.
ArcSDE 9.2 and later releases
Unicode is fully supported. |
Coverages and other legacy data formats do not support Unicode. To store and display Unicode data, use geodatabases instead of these formats.
Created: 7/28/2004
Last Modified: 6/11/2009