When exporting a geodatabase (GDB) feature class to a shapefile, attributes of text fields in the Thai language get shortened.
Last Published: October 16, 2015
Bug ID Number
October 14, 2015
January 9, 2021
ArcGIS for Desktop
Operating System Version
After review by the development team, it has been determined that this issue is related to a known limitation with the software that lies outside of Esri's control. The issue's Additional Information section may contain further explanation.
Prior to 10.2.1, a shapefile created in ArcGIS Desktop was written with a flag in the dBase file header indicating that the code page of the creating machine should be used. This meant that a shapefile containing Japanese characters, created on a Japanese machine, would display as expected on a machine with a Japanese locale but not on a machine with a different locale. Shapefiles were therefore not transportable between machines with different languages.
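To inspect the header flag mentioned above, the sketch below reads one byte from a .dbf file. It assumes the commonly documented dBASE layout, in which byte 29 of the 32-byte header holds the language driver ID (LDID). The function names are illustrative, and the value ArcGIS wrote before 10.2.1 is not stated in this article, so the code only returns the raw byte.

```python
LDID_OFFSET = 29  # assumed position of the language driver ID in the header
HEADER_SIZE = 32  # fixed-size part of the header, before the field descriptors


def ldid_from_header(header: bytes) -> int:
    """Return the language driver ID byte from a raw dBASE header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("dBASE header is shorter than 32 bytes")
    return header[LDID_OFFSET]


def read_ldid(dbf_path: str) -> int:
    """Read the language driver ID byte from the .dbf file at dbf_path."""
    with open(dbf_path, "rb") as dbf:
        return ldid_from_header(dbf.read(HEADER_SIZE))
```

Calling read_ldid on the .dbf part of a shapefile returns an integer from 0 to 255. Interpreting that value requires a language driver table, which is outside the scope of this sketch.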
At 10.2.1 we changed the default to UTF-8 and started including a .cpg file, which records the code page that the shapefile was created with. This makes shapefiles independent of machine locale. A Japanese shapefile will display the correct characters on any machine, regardless of its locale. The drawback of this approach is that the text limitations of shapefiles can truncate field names and strings, resulting in data loss. In dBase files, string length, usually reported as a number of characters, is actually a number of bytes. Because characters in some languages (Japanese, Thai, Chinese, Korean, Greek) require multiple bytes, truncation may occur.
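The difference between character count and byte count can be checked with a few lines of Python. This is a minimal sketch: the sample words and the helper name utf8_width are illustrative and do not come from ArcGIS.

```python
def utf8_width(text: str) -> int:
    """Return the number of bytes that text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Illustrative sample words; any strings can be substituted.
samples = {
    "English": "road",
    "Greek": "ΟΔΟΣ",
    "Thai": "ถนน",
    "Japanese": "道路",
}

for language, word in samples.items():
    # len() counts characters (code points), utf8_width() counts bytes.
    print(f"{language}: {len(word)} characters, {utf8_width(word)} bytes")
```

Only the English sample has matching character and byte counts. For the others, a field width that looks sufficient in characters can be too small in bytes.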
A pre-10.2.1 shapefile cannot be transported between machines with different languages. Strings in a shapefile created on a Japanese machine will not display as Japanese on a non-Japanese machine. If the code page is set in the registry () to Japanese (SJIS)
Thai characters require 3 bytes per character in UTF-8. Shapefile text field (and field name) widths are measured in bytes, not characters. 38 Thai characters require a field width of 114 bytes. In an 80-byte field, the 26 characters left after export occupy 26 x 3 = 78 bytes; the 2 remaining bytes cannot hold a character.
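The arithmetic above can be reproduced with a short sketch. It imitates the effect of a byte-limited field and is not the code ArcGIS uses for export. The helper name truncate_to_bytes and the repeated test character are assumptions made for illustration.

```python
def truncate_to_bytes(text: str, width: int, encoding: str = "utf-8") -> str:
    """Keep as many whole characters of text as fit in width bytes."""
    raw = text.encode(encoding)[:width]
    # errors="ignore" drops a trailing, incomplete multi-byte sequence.
    return raw.decode(encoding, errors="ignore")


value = "ก" * 38  # 38 Thai characters, 3 bytes each in UTF-8
print(len(value.encode("utf-8")))  # 114

kept = truncate_to_bytes(value, 80)  # 80-byte field, as in the reported case
print(len(kept), len(kept.encode("utf-8")))  # 26 78
```

The 2 bytes that remain after the 26th character are too few for another 3-byte character, so they stay unused.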
See KB21106 HowTo: Read and write shapefile and dBASE files encoded in various code pages
1. Create a new text field with 254 characters in the feature class attribute table.
2. Using the Field Calculator, populate the field with the attributes of the original field (the one formatted as 80 characters).
3. Export the feature class to the shapefile.
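Before applying the steps above, it can help to check whether every value will fit in the wider field. The sketch below assumes that the 254-character field is stored as 254 bytes in the shapefile and that the text is encoded as UTF-8. The function names are hypothetical, and the values would have to be read from the attribute table by other means.

```python
SHAPEFILE_TEXT_LIMIT = 254  # assumed width of the new text field, in bytes


def max_chars_that_fit(bytes_per_char: int, limit: int = SHAPEFILE_TEXT_LIMIT) -> int:
    """Return how many characters of a fixed byte size fit in limit bytes."""
    return limit // bytes_per_char


def values_too_long(values, limit: int = SHAPEFILE_TEXT_LIMIT, encoding: str = "utf-8"):
    """Return the values whose encoded length exceeds limit bytes."""
    return [v for v in values if len(v.encode(encoding)) > limit]


print(max_chars_that_fit(3))  # 84 three-byte Thai characters fit in 254 bytes
```

Any value returned by values_too_long would still be truncated on export, even with the 254-character field.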