How To: Read and write shapefile and dBASE files encoded in various code pages

Summary

Esri has implemented code page conversion functionality in ArcGIS Desktop applications (ArcMap, ArcCatalog, and ArcGIS Pro) that allows them to read and write shapefiles and dBASE files encoded in various code pages. For dBASE files, the functionality is controlled by a setting called 'dbfDefault' and is activated by specifying a code page value in the system registry. A reference to the code page is included in the header of the DBF file. The default code page for a shapefile's .DBF file is UTF-8 (Unicode), in line with current internationalization practices.

 

FAQs

What does the dbfDefault setting do?

By setting a code page value in the system registry, users are able to read and write shapefile and dBASE files encoded in that code page. For example, users can export a shapefile encoded in OEM by setting the code page registry value to OEM. Users can also read shapefiles and dBASE files that do not have the code page information stored in the file as long as users know which code page the file is encoded in.

Why set the dbfDefault?
When opening a shapefile or dBASE file, ArcMap, ArcCatalog, and ArcGIS Pro look at the Language Driver ID (LDID) in the header of the dBASE file, or at an associated .CPG file, to determine the code page of the file. Based on the code page information retrieved, the applications display the strings accordingly, performing a code page conversion if necessary. If a dBASE file has neither an LDID nor a .CPG file, the applications assume the file is encoded in the Windows (ANSI/Multi-byte) code page.
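The lookup described above can be sketched in Python. This is a minimal illustration, not how ArcGIS itself is implemented. It assumes the standard dBASE layout, in which byte 29 of the 32-byte file header holds the LDID (0 meaning none is set), and it assumes the .cpg file is a plain-text sidecar that shares the .dbf base name. The function name `read_code_page_hints` is hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

LDID_OFFSET = 29   # assumed position of the language driver ID in the dBASE header
HEADER_SIZE = 32   # fixed-size part of the dBASE header


def read_code_page_hints(dbf_path: str) -> Tuple[int, Optional[str]]:
    """Return (LDID byte, .cpg contents or None) for a dBASE file."""
    dbf = Path(dbf_path)
    with dbf.open("rb") as handle:
        header = handle.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError(f"{dbf} is too short to be a dBASE file")
    ldid = header[LDID_OFFSET]

    # Look for a sidecar file with the same base name, e.g. roads.dbf -> roads.cpg.
    cpg_text = None
    for suffix in (".cpg", ".CPG"):
        candidate = dbf.with_suffix(suffix)
        if candidate.exists():
            raw = candidate.read_text(encoding="ascii", errors="replace")
            cpg_text = raw.strip() or None
            break
    return ldid, cpg_text
```

A result of `(0, None)` corresponds to the case in which the file carries no code page information.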

If the Desktop programs read a dBASE file encoded in OEM but the file has neither an LDID nor a .CPG file, the characters do not display correctly. Because the programs cannot find any code page information, they assume the file is encoded in the ANSI code page, while the file is actually encoded in OEM. ArcGIS therefore treats the OEM file as ANSI, which causes the 8-bit characters stored in the file to display incorrectly.
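The effect can be reproduced outside ArcGIS with a few lines of Python. The sketch below assumes code page 850 as the OEM code page and 1252 as the ANSI code page; other pairs behave analogously. In code page 850 the character é is stored as byte 0x82, which code page 1252 interprets as a low quotation mark.

```python
# "Café" written by a program that uses the OEM code page 850.
oem_bytes = "Café".encode("cp850")

# Reading the same bytes as ANSI (code page 1252) yields the wrong character.
misread = oem_bytes.decode("cp1252")

# Reading them with the code page they were written in restores the text.
correct = oem_bytes.decode("cp850")

print(repr(misread), repr(correct))  # 'Caf‚' 'Café'
```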

Most shapefiles and dBASE files should have the code page information stored in the file. Some programs, such as Microsoft Access and Excel, encode dBASE files in OEM but do not include the code page information in the LDID, so ArcGIS does not read the files correctly. To avoid this problem, users can set the dbfDefault to the appropriate code page before opening a file that lacks the code page information.

How does the dbfDefault work?

The 'dbfDefault' setting in the system registry defines the code page in which shapefiles and dBASE files are written. Shapefiles and dBASE files created in ArcGIS Desktop are encoded in the code page defined by the 'dbfDefault' value. For example, if 'dbfDefault' is set to OEM, shapefiles and dBASE files created in ArcMap, ArcCatalog, and ArcGIS Pro are encoded in OEM. Alternatively, if 'dbfDefault' is set to ANSI, they are encoded in ANSI.

If a shapefile or a dBASE file lacks both an LDID and a .cpg file, ArcGIS assumes the file is encoded in the code page defined by dbfDefault. For example, if the dbfDefault value is set to OEM and a dBASE file lacks both an LDID and a .cpg file, ArcGIS assumes the file is encoded in OEM and performs a code page conversion to display the 8-bit characters in ArcMap, ArcCatalog, and ArcGIS Pro (since the applications are Windows programs that use the ANSI code page to display strings).
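The fallback order when reading a file can be summarized in a short Python sketch. It is a simplification under stated assumptions: the function name `effective_code_page` and the `ANSI` label are hypothetical, and the code page information found in the file (from the LDID or the .cpg file) is passed in as a single value, because the relative priority of the two is not covered here.

```python
from typing import Optional

ANSI = "ANSI"  # placeholder label for the Windows (ANSI/multi-byte) code page


def effective_code_page(file_code_page: Optional[str],
                        dbf_default: Optional[str]) -> str:
    """Pick the code page used to interpret a shapefile or dBASE file."""
    if file_code_page:
        # The file carries its own code page information (LDID or .cpg).
        return file_code_page
    if dbf_default:
        # No information in the file: fall back to the registry setting.
        return dbf_default
    # No information in the file and no dbfDefault value: assume ANSI.
    return ANSI
```

For example, `effective_code_page(None, "OEM")` returns `"OEM"`, while `effective_code_page(None, None)` returns `"ANSI"`.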

Note:
If users have the dbfDefault value set to a certain code page, all shapefiles and dBASE files exported in ArcGIS are encoded in that code page. All shapefiles and dBASE files that do not have the code page information are assumed to be in that code page as well. Therefore, it is important to set the dbfDefault value back to its default value (no value) when the task completes.

What are the programs that dbfDefault can be used with?
ArcGIS Desktop applications are the only programs affected by the dbfDefault setting.

What are the data formats that are affected by dbfDefault?
Shapefiles and dBASE files are the only data formats whose code page can be specified with the dbfDefault setting.

Procedure

The instructions below describe how to set the dbfDefault value in the system registry. Two options are provided.

Warning:
The instructions below include making changes to essential parts of your operating system. It is recommended that you back up your operating system and files, including the registry, before proceeding. Consult with a qualified computer systems professional, if necessary.

Esri cannot guarantee results from incorrect modifications while following these instructions; therefore, use caution and proceed at your own risk.

Option A

  1. Add two keys called Common and CodePage in the system registry.
    To add a key:
    1. Open the Registry Editor: Click Start > Run, type regedit, and click OK.
    2. In the registry tree (in the left pane of the registry window), go to
      • Computer\HKEY_CURRENT_USER\Software\ESRI, and, for ArcMap, click the registry key Desktop10.x (where x is the installed version, for example, Desktop10.8).
      • For ArcGIS Pro, click the registry key ArcGISPro.
    3. Click the Edit menu, click New > Key.
    4. Add a new key called Common: Type the name Common, and press Enter.
    5. Click the new Common key, and add another new key called CodePage.
  2. Add a new string value, dbfDefault, to the CodePage key.
    To add a string value:
    1. Click the new CodePage key.
    2. On the Edit menu, click New > String Value.
    3. Type dbfDefault for the new value, and press Enter.
  3. Enter a code page value.
    1. Select the entry just added; it is important that dbfDefault is selected and not (Default).
    2. On the Edit menu, click Modify.
    3. In Value data, type the new code page value, in this example SJIS, and click OK.

The new CodePage key for ArcMap now contains the string value dbfDefault with the data SJIS.
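The Option A steps can also be scripted. The following Python sketch uses the standard-library `winreg` module, which is available on Windows only. The function names `code_page_subkey` and `set_dbf_default` are hypothetical, and the product key name (for example, Desktop10.8 or ArcGISPro) must match the installed software. The same caution applies as for manual registry edits.

```python
def code_page_subkey(product_key: str) -> str:
    """Build the registry path below HKEY_CURRENT_USER for a product key."""
    return "Software\\ESRI\\" + product_key + "\\Common\\CodePage"


def set_dbf_default(product_key: str, code_page: str) -> None:
    """Create the Common\\CodePage key if needed and set dbfDefault."""
    import winreg  # standard library, Windows only

    # CreateKey opens the key, creating missing keys along the path.
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER,
                          code_page_subkey(product_key)) as key:
        # Write dbfDefault as a string (REG_SZ) value.
        winreg.SetValueEx(key, "dbfDefault", 0, winreg.REG_SZ, code_page)


# Example (Windows only): set_dbf_default("Desktop10.8", "SJIS")
```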

The following are lists of supported code page identifiers (these are not case-sensitive).

  • OEM Code Page Identifiers

437 - United States
708 - Arabic (ASMO 708)
720 - Arabic (Transparent ASMO), Arabic (DOS)
737 - Greek, Greek (DOS)
775 - Baltic, Baltic (DOS)
850 - Multi-lingual Latin 1, Western European (DOS)
852 - Latin 2, Central European (DOS)
855 - Cyrillic
857 - Turkish, Turkish (DOS)
860 - Portuguese, Portuguese (DOS)
861 - Icelandic, Icelandic (DOS)
862 - Hebrew, Hebrew (DOS)
863 - French Canadian, French Canadian (DOS)
864 - Arabic, Arabic (864)
865 - Nordic, Nordic (DOS)
866 - Russian, Cyrillic (DOS)
869 - Modern Greek, Modern Greek (DOS)
932 - Japanese, Japanese (Shift-JIS)
936 - Chinese (simplified): People's Republic of China, Singapore
949 - Korean (Unified Hangul Code)
950 - Traditional Chinese: Taiwan, Hong Kong, People's Republic of China
ALARABI - Sets the code page to 448

  • ANSI Code Page Identifiers

1250 - Central European
1251 - Cyrillic
1252 - Western European
1253 - Greek
1254 - Turkish
1255 - Hebrew
1256 - Arabic
1257 - Baltic languages
1258 - Vietnamese
Big5 - Chinese: Taiwan, Hong Kong, Macau
SJIS - Japanese (Sets the code page to 932)

  • ISO Code Page Identifiers

88591 - Latin 1: Western European
88592 - Latin 2: Central and Eastern European
88593 - Latin 3: Southern European
88594 - Latin 4: Northern European
88595 - Cyrillic
88596 - Arabic
88597 - Greek
88598 - Hebrew
88599 - Latin 5: Turkish
885910 - Latin 6: Nordic
885911 - Thai
885913 - Lithuanian
885915 - Latin 9: Western European (Upgraded from Latin 1)

  • Unicode Values

UTF-8 - Sets the code page to 65001
UTF8 - Sets the code page to 65001

Note:
Shapefiles can now be stored in UTF-8. However, shapefiles encoded in UTF-8 are only recognized in ArcMap, ArcCatalog and ArcGIS Pro.
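Because the identifiers are not case-sensitive and some of them are aliases, it can help to normalize a value before writing it to the registry. The Python sketch below is illustrative only: the name `resolve_identifier` is hypothetical, and the alias table contains only the aliases whose numeric code page is stated in the lists above. Numeric identifiers (including the ISO ones, such as 88591) are returned unchanged as integers. Big5 is omitted from the table because no number is given for it here.

```python
from typing import Optional

# Aliases whose numeric code page is stated in the lists above.
ALIASES = {"SJIS": 932, "UTF-8": 65001, "UTF8": 65001, "ALARABI": 448}


def resolve_identifier(identifier: str) -> Optional[int]:
    """Return the numeric value for an identifier, or None if it is unknown."""
    key = identifier.strip().upper()  # identifiers are not case-sensitive
    if key in ALIASES:
        return ALIASES[key]
    if key.isascii() and key.isdigit():
        return int(key)
    return None
```

For example, `resolve_identifier("sjis")` returns `932`, and `resolve_identifier("Big5")` returns `None`.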

Option B
Alternatively, use a batch file to modify the Windows registry.

  1. In Notepad, create the file ChangeCodePage.bat, using the following code:
For ArcMap
@ECHO OFF
IF "%1"=="" GOTO :EOF 
reg add HKEY_CURRENT_USER\Software\ESRI\Desktop10.8\Common\CodePage /v dbfDefault /t REG_SZ /d %1 /f
Note:
For ArcMap, change the path as needed to match the version of ArcMap on the system to be modified (for example, ..\Desktop10.7).

For ArcGIS Pro

@ECHO OFF
IF "%1"=="" GOTO :EOF 
reg add HKEY_CURRENT_USER\Software\ESRI\ArcGISPro\Common\CodePage /v dbfDefault /t REG_SZ /d %1 /f
  2. Save the file to a location on the machine to be modified.
  3. Open a command prompt window (it may be necessary to use Run as administrator to execute the batch file).
  4. To execute the batch file (and change the code page to Japanese in this example), navigate to the location of the batch file and run the following command:
ChangeCodePage SJIS


The registry keys are now created for ArcGIS Pro, and the code page is set to SJIS.
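Once the task is complete, the setting should be returned to its default (no value). One way to do this, sketched below in Python, is to remove the dbfDefault value with the Windows `reg delete` command. Treating "no value" as deleting the registry value is an assumption, and the function names `reset_command` and `reset_dbf_default` are hypothetical.

```python
import subprocess
from typing import List


def reset_command(product_key: str) -> List[str]:
    """Build the reg.exe command that removes the dbfDefault value."""
    key = ("HKEY_CURRENT_USER\\Software\\ESRI\\" + product_key
           + "\\Common\\CodePage")
    return ["reg", "delete", key, "/v", "dbfDefault", "/f"]


def reset_dbf_default(product_key: str) -> bool:
    """Run the command (Windows only); return True if reg.exe succeeded."""
    result = subprocess.run(reset_command(product_key),
                            capture_output=True, text=True)
    return result.returncode == 0


# Example (Windows only): reset_dbf_default("ArcGISPro")
```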


Last Published: 3/13/2021

Article ID: 000013192

Software: ArcMap 10.8.1, 10.8, 10.7.1, 10.7, 10.6, 10.5.1, 10.5, 10.4.1, 10.4, 10.3.1, 10.3, 10.2.2; ArcGIS Pro 2.4.3, 2.4.2, 2.4.1, 2.4, 2.3.3, 2.3.2, 2.3.1, 2.3, 2.2.4, 2.2.3, 2.2.2, 2.2.1, 2.2, 2.1.3, 2.1.2, 2.1.1, 2.1, 2.0.1, 2.0, 1.4.1, 1.4, 1.3.1, 1.3, 1.2