Esri has implemented 'CODE PAGE CONVERSION' functionality in ArcGIS Desktop applications (ArcMap, ArcCatalog, and ArcGIS Pro) that allows them to read and write shapefiles and dBASE files encoded in various code pages. For dBASE files, this functionality (called 'dbfDefault') is activated by specifying a code page value in the system registry. A reference to a code page is included in the header of the DBF file. The default code page in a shapefile (.DBF) is set to UTF-8 (UNICODE), in line with current internationalization practices.
By setting a code page value in the system registry, users are able to read and write shapefile and dBASE files encoded in that code page. For example, users can export a shapefile encoded in OEM by setting the code page registry value to OEM. Users can also read shapefiles and dBASE files that do not have the code page information stored in the file as long as users know which code page the file is encoded in.
When a shapefile or dBASE file is opened in ArcMap, ArcCatalog, or ArcGIS Pro, the application checks the Language Driver ID (LDID) in the header of the dBASE file and any associated *.CPG file. Either one can define the code page of the file being read. Based on the code page information it retrieves, ArcGIS displays the strings accordingly, performing a code page conversion if necessary. If a dBASE file has neither an LDID nor a .CPG file, ArcGIS assumes the file is encoded in the Windows (ANSI/Multi-byte) code page.
If the Desktop programs read a dBASE file that is encoded in OEM but has neither an LDID nor a .CPG file, the characters do not display correctly. Because the programs cannot find any code page information, they assume the file is encoded in the ANSI code page, while it is actually encoded in OEM. ArcGIS therefore interprets the OEM bytes as ANSI, which causes 8-bit characters stored in the file to display incorrectly.
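The two sources of code page information described above can be inspected outside ArcGIS. The following is a minimal Python sketch, not Esri's implementation: it relies on the dBASE convention that byte 29 of the file header holds the LDID, and it looks for a sidecar file with the same base name and a .cpg extension. The function name `read_code_page_hints` is hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

LDID_OFFSET = 29  # byte 29 of the dBASE header holds the Language Driver ID


def read_code_page_hints(dbf_path: str) -> Tuple[int, Optional[str]]:
    """Return (LDID byte, .cpg text or None) for a dBASE file."""
    dbf = Path(dbf_path)
    with dbf.open("rb") as f:
        header = f.read(32)  # fixed-size leading part of the dBASE header
    if len(header) < 32:
        raise ValueError(f"{dbf} is too short to be a dBASE file")
    ldid = header[LDID_OFFSET]  # 0 means no language driver was recorded

    cpg_text = None
    for suffix in (".cpg", ".CPG"):
        cpg = dbf.with_suffix(suffix)
        if cpg.is_file():
            # A .cpg file is a short text file naming the code page.
            cpg_text = cpg.read_text(encoding="ascii", errors="replace").strip()
            break
    return ldid, cpg_text
```

A result of `(0, None)` corresponds to the case in which ArcGIS finds no code page information and falls back to its assumed code page.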
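The effect of reading OEM bytes as ANSI can be reproduced in a few lines of Python. This sketch assumes code page 850 as the OEM page and 1252 as the ANSI page; the actual pair depends on the system locale, and the function name `simulate_misread` is hypothetical.

```python
def simulate_misread(text: str, actual: str = "cp850", assumed: str = "cp1252") -> str:
    """Encode text in the real code page, then decode it with the assumed one."""
    raw = text.encode(actual)
    # errors="replace" avoids an exception for bytes the assumed page leaves undefined
    return raw.decode(assumed, errors="replace")


oem_bytes = "é".encode("cp850")    # a single byte, 0x82, in OEM 850
garbled = simulate_misread("é")    # the same byte decoded as Windows-1252
restored = oem_bytes.decode("cp850")
```

Here `garbled` is a low quotation mark (U+201A) rather than "é", because byte 0x82 maps to different characters in the two code pages. Decoding with the correct page, as in `restored`, recovers the original character.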
Most shapefiles and dBASE files should have the code page information stored in the file. Some programs, such as Microsoft Access and Excel, encode dBASE files in OEM but do not include the code page information in the LDID, so ArcGIS does not read the files correctly. To avoid this problem, users can set the dbfDefault to the appropriate code page before opening a file that lacks the code page information.
The 'dbfDefault' setting in the system registry defines the code page used when shapefiles and dBASE files are exported. Shapefiles and dBASE files created in ArcGIS Desktop are encoded in the code page defined by the registry's 'dbfDefault' value. For example, if 'dbfDefault' is set to OEM, shapefiles and dBASE files created in ArcMap, ArcCatalog, and ArcGIS Pro are encoded in OEM. Alternatively, if 'dbfDefault' is set to ANSI, they are encoded in ANSI.
If a shapefile or a dBASE file lacks both an LDID and a .cpg file, ArcGIS assumes the file is encoded in the code page defined by dbfDefault. For example, if the dbfDefault value is set to OEM and a dBASE file lacks both an LDID and a .cpg file, ArcGIS assumes the file is encoded in OEM, and therefore performs a code page conversion to display the 8-bit characters in ArcMap, ArcCatalog, and ArcGIS Pro (since the applications are Windows programs that use the ANSI code page to display strings).
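The fallback order described above can be summarized as a small decision function. This is an illustrative Python sketch of the documented behavior, not ArcGIS code; the function name `effective_code_page` and the string labels are hypothetical. The argument `file_code_page` stands for whatever the LDID or .cpg file specifies (how ArcGIS prioritizes between the two is not covered here).

```python
from typing import Optional


def effective_code_page(file_code_page: Optional[str],
                        dbf_default: Optional[str]) -> str:
    """Pick the code page used to interpret a dBASE file."""
    if file_code_page:
        # Code page information stored with the file (LDID or .cpg) is used.
        return file_code_page
    if dbf_default:
        # No information in the file: fall back to the dbfDefault registry value.
        return dbf_default
    # Neither is available: the Windows (ANSI) code page is assumed.
    return "ANSI"
```

For example, `effective_code_page(None, "OEM")` yields `"OEM"`, which matches the case of a file without an LDID or .cpg file while dbfDefault is set to OEM.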
Note: If users have the dbfDefault value set to a certain code page, all shapefiles and dBASE files exported in ArcGIS are encoded in that code page. All shapefiles and dBASE files that do not have the code page information are assumed to be in that code page as well. Therefore, it is important to set the dbfDefault value back to its default value (no value) when the task completes.
ArcGIS Desktop applications are the only programs affected by the dbfDefault setting.
Shapefiles and dBASE files are the only data formats whose code page can be specified with the dbfDefault setting.
Instructions provided describe how to set the dbfDefault value in the system registry. Two options are listed below.
Warning: The instructions below include making changes to essential parts of your operating system. It is recommended that you back up your operating system and files, including the registry, before proceeding. Consult with a qualified computer systems professional, if necessary. Esri cannot guarantee results from incorrect modifications while following these instructions; therefore, use caution and proceed at your own risk.
The new CodePage key for ArcMap appears as follows:
The following are lists of supported code page identifiers (these are not case-sensitive).
437 - United States
708 - Arabic (ASMO 708)
720 - Arabic (Transparent ASMO), Arabic (DOS)
737 - Greek, Greek (DOS)
775 - Baltic, Baltic (DOS)
850 - Multi-lingual Latin 1, Western European (DOS)
852 - Latin 2, Central European (DOS)
855 - Cyrillic
857 - Turkish, Turkish (DOS)
860 - Portuguese, Portuguese (DOS)
861 - Icelandic, Icelandic (DOS)
862 - Hebrew, Hebrew (DOS)
863 - French Canadian, French Canadian (DOS)
864 - Arabic, Arabic (864)
865 - Nordic, Nordic (DOS)
866 - Russian, Cyrillic (DOS)
869 - Modern Greek, Modern Greek (DOS)
932 - Japanese, Japanese (Shift-JIS)
936 - Chinese (simplified): People's Republic of China, Singapore
949 - Korean (Unified Hangul Code)
950 - Traditional Chinese: Taiwan, Hong Kong, People's Republic of China
ALARABI - Sets the code page to 448
1250 - Central European
1251 - Cyrillic
1252 - Western European
1253 - Greek
1254 - Turkish
1255 - Hebrew
1256 - Arabic
1257 - Baltic languages
1258 - Vietnamese
Big5 - Chinese: Taiwan, Hong Kong, Macau
SJIS - Japanese (Sets the code page to 932)
88591 - Latin 1: Western European
88592 - Latin 2: Central and Eastern European
88593 - Latin 3: Southern European
88594 - Latin 4: Northern European
88595 - Cyrillic
88596 - Arabic
88597 - Greek
88598 - Hebrew
88599 - Latin 5: Turkish
885910 - Latin 6: Nordic
885911 - Thai
885913 - Lithuanian
885915 - Latin 9: Western European (Upgraded from Latin 1)
UTF-8 - Sets the code page to 65001
UTF8 - Sets the code page to 65001
Note: Shapefiles can now be stored in UTF-8. However, shapefiles encoded in UTF-8 are only recognized in ArcMap, ArcCatalog and ArcGIS Pro.
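To check how a file would decode under one of the identifiers listed above, the identifier can be translated to a Python codec name. The mapping below is a rough sketch based on Python's standard codec naming, not an Esri-provided table. The function name `to_python_codec` is hypothetical, ALARABI is deliberately not handled, and some listed pages (708, for example) may have no matching Python codec, in which case a `LookupError` is raised.

```python
import codecs

# Named identifiers from the list above, matched case-insensitively.
_ALIASES = {"SJIS": "cp932", "BIG5": "big5", "UTF-8": "utf_8", "UTF8": "utf_8"}


def to_python_codec(identifier: str) -> str:
    """Translate a dbfDefault-style identifier into a Python codec name."""
    key = identifier.strip().upper()
    if key in _ALIASES:
        name = _ALIASES[key]
    elif key.startswith("8859") and key[4:].isdigit():
        # 88591 -> ISO 8859-1, 885915 -> ISO 8859-15, and so on
        name = f"iso8859_{int(key[4:])}"
    elif key.isdigit():
        # Plain numbers are treated as DOS or Windows code pages.
        name = f"cp{key}"
    else:
        raise ValueError(f"unrecognized code page identifier: {identifier!r}")
    codecs.lookup(name)  # raises LookupError if Python lacks this codec
    return name
```

For instance, `b"\x82".decode(to_python_codec("850"))` decodes a byte from an OEM 850 file.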
Alternatively, use a batch file to modify the Windows registry.
@ECHO OFF
IF "%1"=="" GOTO :EOF
reg add HKEY_CURRENT_USER\Software\ESRI\Desktop10.8\Common\CodePage /v dbfDefault /t REG_SZ /d %1 /f
Note: For ArcMap, change the path as needed to match the version of ArcMap on the system to be modified (for example, ..\Desktop10.7).
For ArcGIS Pro:
@ECHO OFF
IF "%1"=="" GOTO :EOF
reg add HKEY_CURRENT_USER\Software\ESRI\ArcGISPro\Common\CodePage /v dbfDefault /t REG_SZ /d %1 /f
The registry keys are now created for ArcGIS Pro and the code page is set to SJIS.
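When the change needs to be scripted together with the later reset to "no value", the same `reg` commands can be assembled in Python. This is a sketch under assumptions: the helper names are hypothetical, the key paths are the ones used in the batch files in this article, and deleting the dbfDefault value is assumed to be an acceptable way to restore the default. The functions only build argument lists and do not touch the registry.

```python
from typing import List

VALUE_NAME = "dbfDefault"


def code_page_key(product: str) -> str:
    """Build the CodePage key path, e.g. product='Desktop10.8' or 'ArcGISPro'."""
    return "\\".join(
        ["HKEY_CURRENT_USER", "Software", "ESRI", product, "Common", "CodePage"]
    )


def build_set_command(product: str, code_page: str) -> List[str]:
    """Arguments for 'reg add' that set dbfDefault to the given code page."""
    return ["reg", "add", code_page_key(product),
            "/v", VALUE_NAME, "/t", "REG_SZ", "/d", code_page, "/f"]


def build_reset_command(product: str) -> List[str]:
    """Arguments for 'reg delete' that remove the dbfDefault value (assumed reset)."""
    return ["reg", "delete", code_page_key(product), "/v", VALUE_NAME, "/f"]
```

On Windows, a list such as `build_set_command("ArcGISPro", "SJIS")` could be passed to `subprocess.run(..., check=True)` before the export, and the reset command run afterwards. The registry warning earlier in this article applies equally to this approach.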