HOW TO
In ArcGIS Pro, we can sort our data based on our fields. There is the option to sort each field in ascending or descending order and we can combine fields. More information about sorting records in a table can be found here: Sort records in a table.
When our fields are simple, such as numeric values in a numeric type field, or letters in a text field, then table sorting is simple. Numbers in a Long field type can easily be sorted in ascending order, and words in a Text field can be sorted in alphabetical order.
Numeric Sorting
Alphabetical Sorting
However, sorting numbers stored in text fields yields unexpected results. Instead of 1, followed by 2 and 3, etc., we instead get 1 followed by 10, followed by 100, as shown in the next image.
To better understand this behavior, we first need to understand Unicode and ASCII values and Python sorting. See the ArcGIS Pro reference: A quick tour of Unicode,
Unicode is “a character encoding system used by computers for the storage and interchange of textual data.” Each code corresponds to a unique character. The Unicode values for the numbers 0 to 9 are U+0030 to U+0039, while the ASCII codes are 48 to 57. It is important to note that there is no Unicode or ASCII value for the number 10, we just combine the codes for 1 and 0.
When we sort our data in ascending order, ArcGIS Pro reads the values from left to right in our chosen sort field for each row. The first character Unicode values are compared and listed in ascending order. This process continues for all the characters in all the rows. The encoded values are all sorted in ascending order, sorting the fields as a result.
When we think about our sorting examples, the number 1 (U+0031) comes first followed by numbers 10 (U+0031 U+0030), 100 (U+0031 U+0030 U+0030) and so forth. Another example of this type of sorting is in a dictionary, all the words starting with A are listed before those that start with B, and aardvark is before abate. In this way, when we sort our values in ascending order, our numbers go from 1 to 10009, 1001 to 10019, 1002 to 10029 and so forth. An obvious issue with this is that 1002 comes after 10019 because the algorithm looks at the characters 100* first and then compares 1 to 2, decides 1 comes before 2 and then compares all the strings with 1001*, leaving 1002 for later in the sequence. This explains why 10 comes before 2 (U+0032), as the sorting has sorted '1*' to come before 2.
Ideally, if we want to sort fields that have a numeric component, we will use a numeric field type such as Long, where the numbers are treated like integers and not Unicode values. If we do want to sort our data based on a text field that contains a mix of alphanumeric characters our best option is to include the string length in our comparison. This means that all the rows with a single character are sorted first, then all the double characters and so forth. In this way, the number 2 will come before 10. To do this in ArcGIS Pro, we need to include a string length field in our table to calculate the length of the numeric components. Ideally, we would have a defining character we can use to separate the values such as forward slash (/).
Using the code block in the calculate field window, we can use a formula like:
import re def calculate_sort_value(s):
if '/' in s:
result = s.split("/", 1)[1]
return len(result)
else:
return None
We can then apply this new length field as our sort. If our text field is more complicated, for example building codes with different sets of characters, i.e., some use a hyphen while others use a slash mark, we can create multiple length fields and combine these as primary and secondary sort fields using the custom sort functionality. First, all our rows will be sorted by length A, then length B.
Article ID: 000031875
Get help from ArcGIS experts
Download the Esri Support App