Bug

When exporting a geodatabase (GDB) feature class to a shapefile, attributes of text fields in the Thai language get shortened.

Last Published: October 16, 2015 ArcGIS for Desktop
Bug ID Number: BUG-000091403
Submitted: October 14, 2015
Last Modified: January 9, 2021
Applies to: ArcGIS for Desktop
Version found: 10.2.1
Operating System: Windows
Operating System Version: 7
Status: Known Limit

Additional Information

Prior to 10.2.1, a shapefile created in ArcGIS Desktop was written with a flag in the dBase file header indicating that the code page of the creating machine should be used. This meant that a shapefile created on a Japanese machine, and containing Japanese characters, would display as expected on a Japanese locale machine, but not on a machine with a different locale. Shapefiles were therefore not language transportable between machines.

At 10.2.1 we changed the default to UTF-8 and started including a .cpg file, which records the code page that the shapefile was created with. This makes shapefiles independent of machine locale: a Japanese shapefile will display the correct characters on any machine, regardless of its locale. A pre-10.2.1 shapefile, by contrast, cannot be transported between machines with different languages; strings in a shapefile created on a Japanese machine will not look like Japanese on a non-Japanese machine.

The problem with this approach is that the text limitations of shapefiles result in truncation of field names and of strings, and therefore in data loss. In dBase files, string length, usually reported as a number of characters, is actually a number of bytes. Because characters in some languages (Japanese, Thai, Chinese, Korean, Greek) require multiple bytes, truncation may occur.

If the code page is set in the registry () to Japanese (SJIS) Thai characters require 3 bytes per character. Shapefile text field (and field name) widths are in bytes, not characters. 38 Thai characters require a field width of 114 bytes. The source field in this case is formatted as 80 characters, which becomes 80 bytes in the shapefile. The 26 characters left after export represent 26 x 3 (78) bytes. The 2 remaining bytes cannot hold a character.

See KB21106 HowTo: Read and write shapefile and dBASE files encoded in various code pages.
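The byte arithmetic above can be reproduced with a short Python sketch. This is an illustration only, not the code ArcGIS uses during export: the helper name `truncate_to_bytes` is hypothetical, the 80-byte width is taken from the field described in the workaround, and the sketch assumes the exported text is UTF-8 and that truncation drops any partial trailing character.

```python
def truncate_to_bytes(text, max_bytes, encoding="utf-8"):
    """Return the longest prefix of text whose encoded form fits in max_bytes.

    Hypothetical helper: it mimics a byte-limited field, not ArcGIS itself.
    """
    encoded = text.encode(encoding)
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit, then discard an incomplete trailing character.
    return encoded[:max_bytes].decode(encoding, errors="ignore")


thai_char = "\u0e01"          # THAI CHARACTER KO KAI, 3 bytes in UTF-8
value = thai_char * 38        # a 38-character Thai string

byte_length = len(value.encode("utf-8"))          # bytes needed for the full string
kept = truncate_to_bytes(value, 80)               # what fits in an 80-byte field
unused = 80 - len(kept.encode("utf-8"))           # bytes too few to hold a character

print(len(value), byte_length, len(kept), unused)
```

Under these assumptions the 38-character value needs 114 bytes, and only 26 characters (78 bytes) survive in an 80-byte field, which matches the behavior reported in this bug.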

Workaround

1. Create a new text field with 254 characters in the feature class attribute table.
2. Using the Field Calculator, populate the field with the attributes of the original field (the one formatted as 80 characters).
3. Export the feature class to the shapefile.
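Before relying on the workaround, it may help to check that every value actually fits in the wider field once it is encoded. The following is a minimal sketch, assuming UTF-8 output and a 254-byte field; the function names `required_width` and `fits_in_shapefile` are hypothetical and are not part of ArcGIS or arcpy.

```python
DBF_TEXT_WIDTH = 254  # width of the new text field from step 1, counted in bytes


def required_width(values, encoding="utf-8"):
    """Return the widest encoded value, in bytes (empty and None values are skipped)."""
    return max((len(v.encode(encoding)) for v in values if v), default=0)


def fits_in_shapefile(values, width=DBF_TEXT_WIDTH, encoding="utf-8"):
    """Return True if every value fits in a field of the given byte width."""
    return required_width(values, encoding) <= width


thai_char = "\u0e01"  # 3 bytes in UTF-8
values = [thai_char * 38, None, ""]

print(required_width(values))               # bytes needed by the widest value
print(fits_in_shapefile(values))            # checked against the 254-byte field
print(fits_in_shapefile(values, width=80))  # checked against the original 80-byte field
```

With 3 bytes per Thai character, a 254-byte field holds at most 84 Thai characters, so longer values would still be truncated even after applying the workaround.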

Steps to Reproduce

Bug ID: BUG-000091403

Software:

  • ArcGIS for Desktop
