Knowledge Base - Technical Articles
HowTo: Read and write shapefile and dBASE files encoded in various code pages
|Software:||ArcGIS - ArcEditor 9.2, 9.3, 9.3.1, 10; ArcGIS - ArcInfo 9.2, 9.3, 9.3.1, 10; ArcGIS - ArcView 9.2, 9.3, 9.3.1, 10; ArcGIS Server (10.0 and prior) 10; ArcGIS for Desktop Advanced 10.1, 10.2, 10.2.1; ArcGIS for Desktop Standard 10.1, 10.2, 10.2.1; ArcGIS for Desktop Basic 10.1, 10.2, 10.2.1; ArcGIS for Server 10.2|
|Platforms:||Windows XP, Server 2003, Vista, Server 2008, Windows 7, Windows 8, Server 2012, Server 2008 R2|
Prior to ArcGIS 10.2.1, the following techniques can be used to set the desired code page behavior.
At ArcGIS 10.2.1, the default code page for the shapefile attribute table (.dbf) is UTF-8 (Unicode). This is consistent with current internationalization practices and should ensure the data is readable.
What does the 'dbfDefault' setting do?
By setting a code page value in the system registry, users are able to read and write shapefile and dBASE files encoded in that code page. For example, users can export a shapefile encoded in OEM by setting the code page registry value to OEM. Users can also read shapefiles and dBASE files that do not have the code page information stored in the file as long as users know which code page the file is encoded in.
Why set the 'dbfDefault'?
When opening a shapefile or dBASE file, ArcGIS for Desktop looks for code page information in two places: the Language Driver ID (LDID) in the header of the dBASE file, and an associated *.CPG file. Based on the code page information it retrieves, ArcGIS for Desktop performs a code page conversion, if necessary, and displays the strings accordingly. If a dBASE file has neither an LDID nor a .CPG file, ArcGIS assumes the file is encoded in the Windows (ANSI/Multi-byte) code page.
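The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how ArcGIS itself is implemented: the function name `read_code_page_hints` is hypothetical, and the sketch assumes the language driver byte sits at offset 29 of the dBASE header and that a value of 0 means no driver was recorded.

```python
import os
import tempfile

LDID_OFFSET = 29  # assumed position of the language driver byte in the dBASE header


def read_code_page_hints(dbf_path):
    """Return (ldid, cpg_text) for a dBASE file; either item may be None."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)  # the fixed part of the header is 32 bytes
    ldid = header[LDID_OFFSET] if len(header) > LDID_OFFSET else None
    if ldid == 0:
        ldid = None  # assumption: 0 means no language driver was recorded

    # Look for a sidecar file with the same base name and a .cpg/.CPG extension.
    cpg_text = None
    base = os.path.splitext(dbf_path)[0]
    for ext in (".cpg", ".CPG"):
        if os.path.exists(base + ext):
            with open(base + ext, "r", encoding="ascii", errors="replace") as f:
                cpg_text = f.read().strip() or None
            break
    return ldid, cpg_text


# Demo: a bare 32-byte header with an arbitrary non-zero LDID, plus a .cpg sidecar.
with tempfile.TemporaryDirectory() as tmp:
    dbf = os.path.join(tmp, "roads.dbf")
    header = bytearray(32)
    header[LDID_OFFSET] = 0x57
    with open(dbf, "wb") as f:
        f.write(header)
    with open(os.path.join(tmp, "roads.cpg"), "w", encoding="ascii") as f:
        f.write("UTF-8\n")
    hints = read_code_page_hints(dbf)

print(hints)  # prints (87, 'UTF-8')
```

A file for which both items come back as None is the case that the dbfDefault setting is meant to cover.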
If a dBASE file is encoded in OEM but has neither an LDID nor a .CPG file, the characters do not display correctly. Because the Desktop programs cannot find any code page information, they treat the OEM-encoded file as ANSI, which causes an incorrect display of the 8-bit characters stored in the file.
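The effect is easy to reproduce outside ArcGIS. The short Python sketch below assumes code page 850 as the OEM code page and 1252 as the ANSI code page (both appear in the identifier list in this article); other systems use different pairs.

```python
text = "Café"

# A program that writes OEM stores the accented letter as one 8-bit value.
oem_bytes = text.encode("cp850")

# A reader that wrongly assumes ANSI maps that byte to a different character.
wrong = oem_bytes.decode("cp1252")

# A reader that knows the file is OEM recovers the original string.
right = oem_bytes.decode("cp850")

print(repr(oem_bytes))  # the bytes stored in the file
print(wrong)            # garbled: the last character is no longer "é"
print(right)            # Café
```

Only characters above the 7-bit range are affected, which is why plain ASCII attribute values look correct even when the assumed code page is wrong.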
Most shapefiles and dBASE files should have the code page information stored in the file. Some programs, such as Microsoft Access 2000 and Excel 2000, encode dBASE files in OEM but do not include the code page information in the LDID, so ArcGIS does not read the files correctly. To avoid this problem, users can set the dbfDefault to the appropriate code page before opening a file that lacks the code page information.
How does the 'dbfDefault' work?
The 'dbfDefault' setting in the system registry defines the code page in which shapefiles and dBASE files are written. Shapefiles and dBASE files created in ArcGIS Desktop are encoded in the code page defined by the 'dbfDefault' value. For example, if 'dbfDefault' is set to OEM, shapefiles and dBASE files created in ArcMap, ArcCatalog, and ArcToolbox are encoded in OEM. Alternatively, if 'dbfDefault' is set to ANSI, they are encoded in ANSI.
There is one exception to this: shapefiles exported from coverages in ArcCatalog and ArcToolbox in languages other than Spanish and Arabic are encoded in OEM, regardless of the dbfDefault setting. This is because 'Coverage to Shapefile' in ArcToolbox uses the functionality of ArcInfo Workstation, which relies on layers that run on DOS, so the output file is always encoded in the OEM (DOS) code page. Shapefiles exported from coverages in ArcCatalog and ArcToolbox in Spanish and Arabic are encoded in ANSI.
▪ Shapefiles exported from a coverage in ArcCatalog and ArcToolbox are always in the OEM code page (except for Spanish and Arabic).
The same logic applies to shapefiles and dBASE files that are read into ArcGIS for Desktop: if a shapefile or dBASE file has neither an LDID nor a .cpg file, ArcGIS assumes the file is encoded in the code page defined by dbfDefault. For example, if the dbfDefault value is set to OEM and a dBASE file lacks both an LDID and a .cpg file, ArcGIS for Desktop assumes the file is encoded in OEM, and therefore performs a code page conversion to display the 8-bit characters in ArcMap and ArcCatalog (since both applications are Windows programs that use the ANSI code page to display strings).
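The read-side rule can be summarized as a small decision function. The Python sketch below only restates the behavior described in this article; the function name `fallback_code_page` and its parameters are hypothetical and do not correspond to any ArcGIS API.

```python
def fallback_code_page(has_ldid, has_cpg, dbf_default=None):
    """Return the code page assumed for a file, or None when the file declares its own."""
    if has_ldid or has_cpg:
        return None  # the file's own code page information is used
    if dbf_default:
        return dbf_default  # no information in the file: dbfDefault applies
    return "ANSI"  # no information and no dbfDefault: ANSI is assumed


print(fallback_code_page(False, False, "OEM"))  # prints OEM
print(fallback_code_page(False, False))         # prints ANSI
print(fallback_code_page(True, False, "OEM"))   # prints None
```

The last call shows that dbfDefault never overrides code page information that is actually present in the file.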
What are the programs that dbfDefault can be used with?
ArcGIS for Desktop is the only program affected by the dbfDefault setting. Other programs, such as ArcInfo Workstation and ArcView 3.x, and their own code page settings, such as the '&CODEPAGE' function used in ArcInfo Workstation and the Code Page Profile used in ArcView 3.x, are not affected.
In ArcInfo Workstation,
▪ ARCSHAPE with &CODEPAGE OEM creates a shapefile in OEM
▪ ARCSHAPE with &CODEPAGE ANSI creates a shapefile in ANSI
▪ INFODBASE with &CODEPAGE OEM creates a dBASE file in OEM
▪ INFODBASE with &CODEPAGE ANSI creates a dBASE file in ANSI
In ArcView 3.x,
▪ Shapefile and dBASE files are saved in the ANSI code page.
What are the data formats that are affected by dbfDefault?
Shapefiles and dBASE files are the only data formats whose code page is controlled by the dbfDefault setting. Other data formats, such as coverages and personal geodatabases, are not affected by the dbfDefault setting.
In ArcGIS for Desktop (regardless of the dbfDefault setting),
▪ Personal geodatabases are saved in Unicode
▪ Personal geodatabase tables are saved in Unicode
▪ Coverages are saved in the ISO code page
▪ INFO files are saved in the ISO code page
▪ Interchange files are saved in the ANSI code page
▪ Text files are saved in the ANSI code page
1. Add two keys called 'Common' and 'CodePage' in the system registry.
Esri cannot guarantee results from incorrect modifications while following these instructions; therefore, use caution and proceed at your own risk.
a. Open the Registry Editor:
Click Start > Run, type 'regedit', and click OK.
b. In the registry tree (in the left pane of the registry window), go to 'My Computer\HKEY_CURRENT_USER\Software\ESRI', and click the registry key 'Desktop10.x', where x matches the installed version. (For version 9.3.1 and earlier versions, go to 'My Computer\HKEY_CURRENT_USER\Software', and click the registry key ESRI.)
c. Add a new key called 'Common': on the Edit menu, navigate to New, select Key, type the name "Common", and press ENTER.
d. Click the registry key just created (Common), and add a new key called 'CodePage'.
2. Add a new string value, 'dbfDefault', to the CodePage key.
a. Click the key CodePage.
b. On the Edit menu, navigate to New, and select 'String Value'.
c. Type "dbfDefault" for the new value, and press ENTER.
The new CodePage key should now list the dbfDefault string value alongside (Default).
3. Enter a code page value.
a. Select the entry just added; it is important that dbfDefault is selected and not (Default).
b. On the Edit menu, click Modify.
c. In Value data, type the new code page value, and click OK.
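Steps 1 through 3 can also be scripted. The following Python sketch uses the standard-library `winreg` module, which is available on Windows only. The helper names are hypothetical, and the default key name 'Desktop10.3' is an assumption taken from the batch file at the end of this article; adjust it to the installed version, or pass None for version 9.3.1 and earlier. The same caution about registry changes applies.

```python
import sys


def codepage_key_path(desktop_key="Desktop10.3"):
    """Build the registry path of the CodePage key under HKEY_CURRENT_USER."""
    parts = ["Software", "ESRI", desktop_key, "Common", "CodePage"]
    # For 9.3.1 and earlier there is no version key, so None is skipped.
    return "\\".join(p for p in parts if p)


def set_dbf_default(value, desktop_key="Desktop10.3"):
    """Create the Common and CodePage keys if needed and set dbfDefault."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is not available on this platform")
    import winreg  # standard library, Windows only

    path = codepage_key_path(desktop_key)
    # CreateKeyEx opens the key and creates any missing intermediate keys.
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, path, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "dbfDefault", 0, winreg.REG_SZ, value)


print(codepage_key_path())      # path used for a 10.x installation
print(codepage_key_path(None))  # path used for 9.3.1 and earlier
```

Calling `set_dbf_default("OEM")` on a Windows machine would write the value for the current user only, since the key lives under HKEY_CURRENT_USER.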
The following is a list of code page identifiers that are supported (these are not case-sensitive).
OEM Code Page Identifiers
437 - United States
708 - Arabic (ASMO 708)
720 - Arabic (Transparent ASMO), Arabic (DOS)
737 - Greek, Greek (DOS)
775 - Baltic, Baltic (DOS)
850 - Multi-lingual Latin 1, Western European (DOS)
852 - Latin 2, Central European (DOS)
855 - Cyrillic
857 - Turkish, Turkish (DOS)
860 - Portuguese, Portuguese (DOS)
861 - Icelandic, Icelandic (DOS)
862 - Hebrew, Hebrew (DOS)
863 - French Canadian, French Canadian (DOS)
864 - Arabic, Arabic (864)
865 - Nordic, Nordic (DOS)
866 - Russian, Cyrillic (DOS)
869 - Modern Greek, Modern Greek (DOS)
932 - Japanese, Japanese (Shift-JIS)
936 - Chinese (simplified): People's Republic of China, Singapore
949 - Korean (Unified Hangul Code)
950 - Traditional Chinese: Taiwan, Hong Kong, People's Republic of China
ALARABI - Sets the code page to 448
ANSI Code Page Identifiers
1250 - Central European
1251 - Cyrillic
1252 - Western European
1253 - Greek
1254 - Turkish
1255 - Hebrew
1256 - Arabic
1257 - Baltic languages
1258 - Vietnamese
Big5 - Chinese: Taiwan, Hong Kong, Macau
SJIS - Japanese (Sets the code page to 932)
ISO Code Page Identifiers
88591 - Latin 1: Western European
88592 - Latin 2: Central and Eastern European
88593 - Latin 3: Southern European
88594 - Latin 4: Northern European
88595 - Cyrillic
88596 - Arabic
88597 - Greek
88598 - Hebrew
88599 - Latin 5: Turkish
885910 - Latin 6: Nordic
885911 - Thai
885913 - Lithuanian
885915 - Latin 9: Western European (Upgraded from Latin 1)
UTF-8 - Sets the code page to 65001
UTF8 - Sets the code page to 65001
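Because the identifiers are not case-sensitive and a few of them are aliases, a script that prepares dbfDefault values may want to normalize them first. The Python sketch below is a hypothetical helper, not part of ArcGIS; it resolves only the aliases whose numeric code page is stated in the list above and leaves every other identifier as an upper-cased string.

```python
# Aliases whose target code page is stated explicitly in the list above.
ALIASES = {
    "UTF-8": "65001",
    "UTF8": "65001",
    "SJIS": "932",
    "ALARABI": "448",
}


def normalize_identifier(identifier):
    """Return a canonical form of a code page identifier (case-insensitive)."""
    key = identifier.strip().upper()
    # Known aliases map to their numeric code page; anything else is kept as is.
    return ALIASES.get(key, key)


print(normalize_identifier("utf8"))    # prints 65001
print(normalize_identifier("sjis"))    # prints 932
print(normalize_identifier(" 1252 "))  # prints 1252
print(normalize_identifier("big5"))    # prints BIG5
```

Identifiers such as Big5, OEM, and ANSI are passed through unchanged because this article does not give a single numeric code page for them.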
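As an alternative to editing the registry manually, use the following batch file to modify the Windows registry. The key path in the file targets Desktop10.3; change it to match the installed version.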
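REM Do nothing when no code page value is passed as the first argument.
IF "%1"=="" GOTO :EOF
REM Write the value to dbfDefault as a string for the current user, overwriting any existing value.
reg add "HKEY_CURRENT_USER\Software\ESRI\Desktop10.3\Common\CodePage" /v dbfDefault /t REG_SZ /d %1 /f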
To execute the batch file, run it from a command prompt and pass the desired code page value (for example, OEM) as the first argument.
Last Modified: 4/14/2014