Technical Article   HowTo:  Read and write shapefile and dBASE files encoded in various code pages

Article ID: 21106
Software: ArcGIS - ArcEditor 9.2, 9.3, 9.3.1, 10; ArcGIS - ArcInfo 9.2, 9.3, 9.3.1, 10; ArcGIS - ArcView 9.2, 9.3, 9.3.1, 10; ArcGIS Server (10.0 and prior) 10; ArcGIS for Desktop Advanced 10.1, 10.2, 10.2.1, 10.2.2, 10.3; ArcGIS for Desktop Standard 10.1, 10.2, 10.2.1, 10.2.2, 10.3; ArcGIS for Desktop Basic 10.1, 10.2, 10.2.1, 10.2.2, 10.3; ArcGIS for Server 10.1, 10.2, 10.2.1, 10.2.2, 10.3; ArcGIS Pro 1.0
Platforms:  Windows XP, Server 2003, Vista, Server 2008, Windows 7, Windows 8, Server 2012, Server 2008 R2

Summary

Esri has implemented code page conversion functionality in ArcGIS for Desktop (ArcMap, ArcCatalog, and ArcToolbox) that allows the Desktop applications to read and write shapefiles and dBASE files encoded in various code pages. The code page conversion functionality for dBASE files (called 'dbfDefault') is activated by specifying a code page value in the system registry. This is very similar to the &CODEPAGE function used in ArcInfo Workstation.

Prior to ArcGIS 10.2.1, the following procedures can be used to set the desired code page behavior. If ArcGIS for Desktop 10.2.1 or 10.2.2 has been installed, download and install the patches described in Knowledge Base article 42646 before following these instructions.


The header of the dBASE file (.DBF) in each shapefile includes a reference to a code page. Prior to ArcGIS 10.2.1, the code page used corresponded to the user's locale. For example, if the user is on a Japanese locale, the code page used in the .DBF file is 'Shift-JIS'.

At ArcGIS 10.2.1, the default code page for the shapefile (.DBF) is UTF-8 (Unicode). This is consistent with current internationalization practices and should ensure the data is readable.


FAQs

• What does the dbfDefault setting do?

By setting a code page value in the system registry, users are able to read and write shapefile and dBASE files encoded in that code page. For example, users can export a shapefile encoded in OEM by setting the code page registry value to OEM. Users can also read shapefiles and dBASE files that do not have the code page information stored in the file as long as users know which code page the file is encoded in.


• Why set the dbfDefault?

When a shapefile or dBASE file is opened in ArcGIS for Desktop, the Desktop programs look at the Language Driver ID (LDID) in the header of the dBASE file or at an associated .CPG file. Either one can define the code page, and the programs use them to determine the code page of the file being read. Based on the code page information retrieved, ArcGIS for Desktop displays the strings accordingly, performing a code page conversion if necessary. If a dBASE file has neither an LDID nor a .CPG file, ArcGIS assumes the file is encoded in the Windows (ANSI/Multi-byte) code page.

If the Desktop programs read a dBASE file that is encoded in OEM but has neither an LDID nor a .CPG file, the characters do not display correctly. Because the programs cannot find any code page information, they assume the file is encoded in the ANSI code page when it is actually encoded in OEM. ArcGIS therefore treats the OEM file as ANSI, which causes 8-bit characters stored in the file to display incorrectly.
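The effect of reading OEM bytes as ANSI can be reproduced outside ArcGIS. The following is a minimal Python sketch, not ArcGIS code: it assumes code page 850 as the OEM code page and code page 1252 as the ANSI code page, and the sample string is hypothetical.

```python
# Hypothetical sample text; any string with accented (8-bit) characters works.
text = "Ñandú"

# Bytes as an OEM-encoded dBASE file would store them (code page 850 assumed).
raw = text.encode("cp850")

# Reading those bytes as ANSI (code page 1252 assumed) yields the wrong characters.
misread = raw.decode("cp1252")

# Reading them with the code page they were written in restores the text.
correct = raw.decode("cp850")

print(repr(misread), repr(correct))
```

The plain ASCII letters survive either way; only the 8-bit characters are affected, which matches the symptom described above.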

Most shapefiles and dBASE files should have the code page information stored in the file. Some programs, such as Microsoft Access 2000 and Excel 2000, encode dBASE files in OEM but do not include the code page information in the LDID, so ArcGIS does not read the files correctly. To avoid this problem, users can set the dbfDefault to the appropriate code page before opening a file that lacks the code page information.
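To check whether a file carries code page information before changing dbfDefault, the header byte and the sidecar file can be inspected directly. The sketch below is an illustration in Python, not part of ArcGIS. It assumes the commonly documented dBASE layout, in which the language driver byte sits at offset 29 of the 32-byte header and a value of 0 generally means that no language driver is recorded. The function name `read_code_page_hints` is hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

LDID_OFFSET = 29  # assumed position of the language driver byte in the dBASE header


def read_code_page_hints(dbf_path: str) -> Tuple[int, Optional[str]]:
    """Return (LDID byte, .cpg text or None) for a dBASE file."""
    dbf = Path(dbf_path)
    with dbf.open("rb") as handle:
        header = handle.read(32)  # the fixed part of the header is 32 bytes
    if len(header) < 32:
        raise ValueError("file is too short to be a dBASE file")
    ldid = header[LDID_OFFSET]

    # Look for a sidecar file with the same base name, in either case.
    cpg_text = None
    for suffix in (".cpg", ".CPG"):
        cpg = dbf.with_suffix(suffix)
        if cpg.exists():
            cpg_text = cpg.read_text(encoding="ascii", errors="replace").strip()
            break
    return ldid, cpg_text
```

A result of `(0, None)` would suggest the file has no code page information, which is the situation in which the dbfDefault setting applies.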


• How does the dbfDefault work?

The 'dbfDefault' setting in the system registry defines the code page in which shapefiles and dBASE files are written. Shapefiles and dBASE files created in ArcGIS for Desktop are encoded in the code page defined by the registry's 'dbfDefault' value. For example, if 'dbfDefault' is set to OEM, shapefiles and dBASE files created in ArcMap, ArcCatalog, and ArcToolbox are encoded in OEM. Alternatively, if 'dbfDefault' is set to ANSI, shapefiles and dBASE files are encoded in ANSI.

There is one exception: shapefiles exported from coverages in ArcCatalog and ArcToolbox in languages other than Spanish and Arabic are encoded in OEM, regardless of the dbfDefault setting. This is because 'Coverage to Shapefile' in ArcToolbox uses ArcInfo Workstation functionality, which relies on layers that run on DOS, so the output file is always encoded in the OEM (DOS) code page. Shapefiles exported from coverages in ArcCatalog and ArcToolbox in Spanish and Arabic are encoded in ANSI.

▪ Shapefiles exported from a coverage in ArcCatalog and ArcToolbox are always in the OEM code page (except for Spanish and Arabic).

The same logic applies to shapefiles and dBASE files that are read into ArcGIS for Desktop: if a shapefile or a dBASE file has neither an LDID nor a .cpg file, ArcGIS assumes the file is encoded in the code page defined by dbfDefault. For example, if the dbfDefault value is set to OEM and a dBASE file lacks both an LDID and a .cpg file, ArcGIS for Desktop assumes the file is encoded in OEM, and therefore performs a code page conversion to display the 8-bit characters in ArcMap and ArcCatalog (since both applications are Windows programs that use the ANSI code page to display strings).
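The read-side fallback described above can be summarized as a small decision function. This is a sketch of the behavior as this article describes it, not Esri's implementation. The function name and parameters are hypothetical, and `file_code_page` stands for whatever the LDID or .cpg file reports, or `None` if neither is present.

```python
from typing import Optional


def assumed_code_page(file_code_page: Optional[str],
                      dbf_default: Optional[str]) -> str:
    """Pick the code page a reader would assume, following the article's description."""
    if file_code_page:   # the LDID or a .cpg file supplies code page information
        return file_code_page
    if dbf_default:      # no information in the file: use the registry value
        return dbf_default
    return "ANSI"        # nothing set anywhere: the Windows (ANSI) code page


# A file without code page information while dbfDefault is set to OEM.
print(assumed_code_page(None, "OEM"))
```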

  If users have the dbfDefault value set to a certain code page, all shapefiles and dBASE files exported in ArcGIS are encoded in that code page. All shapefiles and dBASE files that do not have the code page information are assumed to be in that code page as well. Therefore, it is important to set the dbfDefault value back to its default value (no value) when the task completes.



• What are the programs that dbfDefault can be used with?

ArcGIS for Desktop is the only program that is affected by the dbfDefault setting. Other programs, such as ArcInfo Workstation and ArcView 3.x, or other code page settings such as the '&CODEPAGE' function used in ArcInfo Workstation and the Code Page Profile used in ArcView 3.x, are not affected.

In ArcInfo Workstation,

▪ ARCSHAPE with &CODEPAGE OEM creates a shapefile in OEM
▪ ARCSHAPE with &CODEPAGE ANSI creates a shapefile in ANSI
▪ INFODBASE with &CODEPAGE OEM creates a dBASE file in OEM
▪ INFODBASE with &CODEPAGE ANSI creates a dBASE file in ANSI

In ArcView 3.x,

▪ Shapefile and dBASE files are saved in the ANSI code page.


• What are the data formats that are affected by dbfDefault?

Shapefiles and dBASE files are the only data formats whose code page is controlled by the dbfDefault setting. Other data formats, such as coverage and personal geodatabase, are not affected by the dbfDefault setting.

In ArcGIS for Desktop (regardless of the dbfDefault setting),

▪ Personal geodatabases are saved in Unicode
▪ Personal geodatabase tables are saved in Unicode
▪ Coverages are saved in the ISO code page
▪ INFO files are saved in the ISO code page
▪ Interchange files are saved in the ANSI code page
▪ Text files are saved in the ANSI code page

Procedure

The instructions below describe how to set the dbfDefault value in the system registry. Two options are provided.


 WARNING: The instructions below include making changes to essential parts of your operating system. It is recommended that you back up your operating system and files, including the registry, before proceeding. Consult with a qualified computer systems professional, if necessary.

Esri cannot guarantee results from incorrect modifications while following these instructions; therefore, use caution and proceed at your own risk.


Option A

  1. Add two keys called 'Common' and 'CodePage' in the system registry.

    To add a key:

    a. Open the Registry Editor: Click Start > Run, type 'regedit', and click OK.

    b. In the registry tree (in the left pane of the registry window), go to 'My Computer\HKEY_CURRENT_USER\Software\ESRI', and click the registry key for the installed version, 'Desktop10.x' (for example, 'Desktop10.3'). For ArcGIS Pro, click the registry key 'Pro1.0'. (For version 9.3.1 and earlier versions, go to 'My Computer\HKEY_CURRENT_USER\Software', and click the registry key ESRI.)

    c. Add a new key called 'Common': on the Edit menu, navigate to New, select Key, type the name 'Common', and press ENTER.

    d. Click the registry key just created (Common), and add a new key called 'CodePage'.

  2. Add a new string value, 'dbfDefault', to the CodePage key.

    To add a string value:

    a. Click the key CodePage.

    b. On the Edit menu, navigate to New, and select 'String Value'.

    c. Type 'dbfDefault' for the new value, and press ENTER.

    The new CodePage key should look like this:

    [Image: screenshot of dbfDefault]

  3. Enter a code page value.

    a. Select the entry just added; it is important that dbfDefault is selected and not (Default).

    b. On the Edit menu, click Modify.

    c. In Value data, type the new code page value, and click OK.

    The following are lists of supported code page identifiers (these are not case-sensitive).

    • OEM Code Page Identifiers

    437 - United States
    708 - Arabic (ASMO 708)
    720 - Arabic (Transparent ASMO), Arabic (DOS)
    737 - Greek, Greek (DOS)
    775 - Baltic, Baltic (DOS)
    850 - Multi-lingual Latin 1, Western European (DOS)
    852 - Latin 2, Central European (DOS)
    855 - Cyrillic
    857 - Turkish, Turkish (DOS)
    860 - Portuguese, Portuguese (DOS)
    861 - Icelandic, Icelandic (DOS)
    862 - Hebrew, Hebrew (DOS)
    863 - French Canadian, French Canadian (DOS)
    864 - Arabic, Arabic (864)
    865 - Nordic, Nordic (DOS)
    866 - Russian, Cyrillic (DOS)
    869 - Modern Greek, Modern Greek (DOS)
    932 - Japanese, Japanese (Shift-JIS)
    936 - Chinese (simplified): People's Republic of China, Singapore
    949 - Korean (Unified Hangul Code)
    950 - Traditional Chinese: Taiwan, Hong Kong, People's Republic of China
    ALARABI - Sets the code page to 448


    • ANSI Code Page Identifiers

    1250 - Central European
    1251 - Cyrillic
    1252 - Western European
    1253 - Greek
    1254 - Turkish
    1255 - Hebrew
    1256 - Arabic
    1257 - Baltic languages
    1258 - Vietnamese
    Big5 - Chinese: Taiwan, Hong Kong, Macau
    SJIS - Japanese (Sets the code page to 932)


    • ISO Code Page Identifiers

    88591 - Latin 1: Western European
    88592 - Latin 2: Central and Eastern European
    88593 - Latin 3: Southern European
    88594 - Latin 4: Northern European
    88595 - Cyrillic
    88596 - Arabic
    88597 - Greek
    88598 - Hebrew
    88599 - Latin 5: Turkish
    885910 - Latin 6: Nordic
    885911 - Thai
    885913 - Lithuanian
    885915 - Latin 9: Western European (Upgraded from Latin 1)


    • Unicode Values

    UTF-8 - Sets the code page to 65001
    UTF8 - Sets the code page to 65001

     Shapefiles can now be stored in UTF-8. However, shapefiles encoded in UTF-8 are only recognized in ArcGIS for Desktop.
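When checking exported files with a script, it can help to translate the identifiers listed above into codec names that the scripting language understands. The sketch below does this for Python. The mapping is an assumption made for illustration, not something defined by ArcGIS, and the function name `python_codec_for` is hypothetical. Identifiers that Python has no codec for (ALARABI, for example) raise `LookupError`.

```python
import codecs

# Named identifiers from the lists above, mapped to Python codec names
# (assumed equivalents, chosen for this illustration).
_NAMED = {"SJIS": "cp932", "BIG5": "big5", "UTF-8": "utf_8", "UTF8": "utf_8"}


def python_codec_for(identifier: str) -> str:
    """Translate a dbfDefault-style identifier into a Python codec name."""
    key = identifier.strip().upper()  # the identifiers are not case-sensitive
    if key in _NAMED:
        candidate = _NAMED[key]
    elif key.startswith("8859") and key[4:].isdigit():
        candidate = "iso8859_" + key[4:]  # e.g. 885915 -> iso8859_15
    elif key.isdigit():
        candidate = "cp" + key            # e.g. 850 -> cp850
    else:
        raise LookupError(f"no Python codec known for {identifier!r}")
    # codecs.lookup raises LookupError if Python does not ship this codec.
    return codecs.lookup(candidate).name
```

The returned name can be passed to `bytes.decode` or `str.encode` to test how a given attribute value is stored under that code page.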



Option B

    Alternatively, use a batch file to modify the Windows registry.

    1. In Notepad, create the file ChangeCodePage.bat, using the following code:

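    @ECHO OFF
    REM Usage: ChangeCodePage <code page identifier>, for example: ChangeCodePage SJIS

    REM Exit without making changes if no code page identifier was supplied.
    IF "%~1"=="" GOTO :EOF
    REM Create the key if needed and set dbfDefault for the current user.
    reg add "HKEY_CURRENT_USER\Software\ESRI\Desktop10.3\Common\CodePage" /v dbfDefault /t REG_SZ /d "%~1" /f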


     Change the path to match the version of ArcGIS on the system that is to be modified (for example, ..\Desktop10.1).


    2. Save the file to a location on the machine to be modified.
    3. Open a command prompt window (it may be necessary to use 'Run as administrator' to execute the batch file).
    4. To execute the batch file (and change the code page to Japanese in this example), navigate to the location of the batch file and run the following command:

    ChangeCodePage SJIS

    The registry keys are now created and the code page is set to SJIS.
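The same registry change can also be scripted with Python's standard `winreg` module, which is available on Windows only. The following is a minimal sketch rather than an Esri-provided tool: the function names are hypothetical, and it assumes that deleting the dbfDefault value is equivalent to returning it to its default (no value).

```python
def code_page_subkey(product_key: str) -> str:
    """Build the registry path under HKEY_CURRENT_USER, e.g. for 'Desktop10.3'."""
    return "Software\\ESRI\\" + product_key + "\\Common\\CodePage"


def set_dbf_default(product_key: str, code_page: str) -> None:
    """Create the key if needed and write the dbfDefault string value."""
    import winreg  # standard library, Windows only
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER,
                            code_page_subkey(product_key),
                            0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "dbfDefault", 0, winreg.REG_SZ, code_page)


def clear_dbf_default(product_key: str) -> None:
    """Remove dbfDefault (assumed to be equivalent to 'no value')."""
    import winreg  # standard library, Windows only
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                            code_page_subkey(product_key),
                            0, winreg.KEY_SET_VALUE) as key:
            winreg.DeleteValue(key, "dbfDefault")
    except FileNotFoundError:
        pass  # the key or the value was never created
```

For example, `set_dbf_default("Desktop10.3", "SJIS")` would mirror the batch file call shown earlier, and `clear_dbf_default("Desktop10.3")` could be run once the task completes.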

Created: 12/4/2001
Last Modified: 10/14/2014

Comments

By DejidmaaDamdindorj - 07/10/2014 2:53 AM

Great article! It helped a lot!

By kapayne@uga.edu - 07/09/2014 7:56 AM

Other - See details below.

Looks like the workaround only applies to ArcGIS Server 10.0 and prior, or 10.2. Can I apply it to our 10.1 Server? thanks,

By mvigerske - 11/28/2013 3:15 AM

Great article! It helped a lot!

ah, sorry, wrong evaluation. 5 stars of course...

By mvigerske - 11/28/2013 3:14 AM

Great article! It helped a lot!

Okay, that worked for me. Thank you! But I have one question: "However, shapefiles encoded in UTF-8 are only recognized in ArcGIS Desktop." It works in QuantumGIS. But maybe it does not work in some other (older) GIS? Or what does that sentence mean?

By JMUMapper - 03/08/2012 6:48 AM

Great article! It helped a lot!

Works great for individual users. Can the same 'Common' key be added to the HKEY_LOCAL_MACHINE\ESRI folder to affect the change for multiple users that use the same machine at different times? (As a universal solution)

By pjbcoetzer - 01/25/2012 12:23 AM

Other - See details below.

Is it possible to set the codepage for file GeoDB's?

By Anonymous - 01/11/2011 1:51 AM

I would like to see a new article that discusses the topic outlined below.

For ArcGIS 10, step 1b needs to be adjusted. It should read: In the registry tree (on the left), go to 'My Computer\HKEY_CURRENT_USER\Softwa re\ESRI', and click the registry key Desktop10.0. Rest of the procedure is still valid for ArcGIS 10 (please note that the space in 'Softwa re' needs to be removed, but I can't use submit with the word 'w a r'...)

By Anonymous - 08/17/2008 6:19 AM

Great article! It helped a lot!

i have done what is shown in this article, but still i am not able to see the attributes in arabic, pls help me
