| Bug ID Number | BUG-000155378 |
| Submitted | January 26, 2023 |
| Last Modified | June 20, 2025 |
| Applies to | ArcGIS Pro |
| Version found | N/A |
| Operating System | Windows OS |
| Operating System Version | 10.0 64 Bit |
| Status | Known Limit: After review by the development team, it has been determined that this issue is related to a known limitation with the software that lies outside of Esri's control. The issue's Additional Information section may contain further explanation. |
Additional Information
Detailed information about how LocateXT works is provided in the ArcGIS Pro help system, in the topic "Adjust how locations and attributes are extracted", section "Scan files" (https://pro.arcgis.com/en/pro-app/latest/help/data/locatext/adjust-how-locations-and-attributes-are-extracted.htm#ESRI_SECTION1_5B0CE46F1BE4444FA60E37132F3D4BAC), under the subheading "Some files are not processed".
LocateXT does not itself extract text from PDF files. Text is extracted from PDFs by IFilter plug-ins installed in the Windows operating system. IFilters are created primarily to support Windows Search. LocateXT asks the IFilters available on the local computer to extract text from the PDF file, then processes the text they return to find x, y coordinates, place names, and so on, and generates spatial features and their attributes accordingly. If the IFilters on the local machine cannot extract text from the PDF, LocateXT has nothing to process.
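The dependency described above can be illustrated with a minimal Python sketch. It is not LocateXT's implementation: the regular expression, the function name `find_coordinates`, and the assumed latitude, longitude order are all hypothetical and chosen only for illustration. The point is that the search runs over whatever text the extractor hands back, so an empty extraction yields no locations.

```python
import re

# Hypothetical pattern for decimal-degree pairs such as "34.0561, -117.1956".
# Illustration only; this is NOT the pattern LocateXT uses.
COORD_PAIR = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def find_coordinates(extracted_text):
    """Return (lat, lon) tuples found in text returned by a text extractor."""
    pairs = []
    for lat_str, lon_str in COORD_PAIR.findall(extracted_text):
        lat, lon = float(lat_str), float(lon_str)
        # Keep only values that fall inside valid latitude/longitude ranges.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs


# Text the extractor managed to read: one location is found.
found = find_coordinates("Site located at 34.0561, -117.1956 near the river.")

# The extractor returned nothing: there is nothing to search.
nothing = find_coordinates("")
```

In this sketch `nothing` is an empty list for the same reason LocateXT produces no features when the IFilter returns no text.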
A PDF file is not a text file. A PDF file contains objects and data, and the text in it can be encoded as binary data and compressed inside the file. Not every PDF file is structured in the same manner. Both PDF files provided by the customer display text that defines an x, y coordinate. However, opening these PDFs in a text editor shows that the two files are structured in very different ways.
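The following Python sketch illustrates why text that is visible on the page may not be visible in the file's raw bytes. It assumes a content stream stored with the standard FlateDecode (zlib/deflate) filter; the stream contents and coordinate values are made up for illustration, and nothing here describes how the customer's files or any particular IFilter actually work.

```python
import zlib

# Hypothetical PDF content-stream fragment that draws two text labels.
content = (
    b"BT /F1 12 Tf 72 720 Td (Point A: 34.0561, -117.1956) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Point B: 34.0600, -117.2000) Tj ET\n"
)

# Compress the stream the way the FlateDecode filter stores it (zlib/deflate).
compressed = zlib.compress(content)

needle = b"34.0561"

# The coordinate is readable in the uncompressed stream...
in_plain = needle in content

# ...but a plain byte search of the compressed stream does not find it.
in_compressed = needle in compressed

# A reader that understands the filter recovers the original text.
restored = zlib.decompress(compressed)
```

A text extractor has to decode streams like this, and then interpret the fonts and text operators inside them, before any coordinate text is available.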
It makes sense that the standard IFilter that Microsoft ships with the Windows operating system can read PDF files generated by Microsoft Office products. The different internal PDF structure of the files generated by ArcGIS Pro is evidently less readable to that specific IFilter, for reasons that were not determined.
Other IFilters are available that can extract more text from a wider range of PDF files. Adobe provides IFilters that can extract text from PDFs. One Adobe IFilter that was tested could not read text from the customer-provided PDF file that was generated by ArcGIS Pro, but newer IFilters may be available with more advanced Adobe products. With the TET PDF IFilter, for example, the coordinates were successfully read from the customer-provided PDF file that was generated by ArcGIS Pro.
Workaround
a) Convert the PDF file to another text-based document format that can be parsed more effectively by the IFilter provided with the operating system.
b) Open the PDF file in a PDF reader. Copy the text and paste it into the Text tab on the Extract Locations pane.
Steps to Reproduce