
How To: Convert HTML metadata to a well-formatted text file

Summary

If you have an HTML file that presents Federal Geographic Data Committee (FGDC) metadata in outline format, you can convert it to a well-formatted text file and import that file into ArcCatalog. The process requires Microsoft Word and the U.S. Geological Survey (USGS) metadata utility "cns".

Procedure

  1. Make sure your HTML metadata file presents FGDC metadata in outline format; otherwise, ArcCatalog cannot import it.
    The outline-style HTML format shown below is the most common way FGDC-compliant metadata is presented. Metadata returned by searches of the NSDI Clearinghouse is usually displayed in this format.

    Figure: Outline-style HTML format for FGDC metadata
  2. Convert the HTML file to a text file.

    a. Open the HTML metadata file in Microsoft Word.

    b. In the File menu, click Save As.

    c. Click the Save as type drop-down list and click Text Only (*.txt) or Text Only with line breaks (*.txt). Text Only with line breaks retains line breaks within paragraphs, while Text Only does not.

    d. Specify the name and location of the output file and click Save.

    e. A warning appears indicating that some formatting will be lost. Click Yes to save the document in text format.

    f. Close Microsoft Word.
  3. Remove the text that came from the header and footer of the HTML page so that these non-metadata artifacts do not appear when ArcCatalog imports the metadata.

    a. Open the text file in a text editor such as Notepad or WordPad.

    b. Remove all lines at the beginning of the file that precede the Identification_Information line.
    Figure: Remove header lines after converting FGDC HTML metadata to text
    c. Scroll to the bottom of the file.

    d. Remove the line that says "Generated by mp..."
    Figure: Remove footer lines after converting FGDC HTML metadata to text
    e. Save the changes.
  4. Generate a well-formatted text file with the USGS metadata utility cns (short for "chew and spit").

    In a well-formatted text file, metadata elements are indented hierarchically, so it is clear which elements contain which. Without this hierarchical structure, a metadata XML file cannot be created from the text.

    a. Install cns if you don't already have this tool.
    See How To: Install the USGS FGDC metadata utilities (http://support.esri.com/en/knowledgebase/techarticles/detail/23096).

    b. If you have just installed cns, set up your computer so that cns can be run from the command prompt.
    See How To: Set up the USGS FGDC metadata utilities (http://support.esri.com/en/knowledgebase/techarticles/detail/23103).

    c. Open a Command Prompt window by clicking Start > Programs > Accessories > Command Prompt.

    d. Navigate to the directory where the metadata text file is located.

    e. At the command prompt, type "cns -o <outputFile> <inputFile>", where <outputFile> is the name of the well-formatted file that cns will create and <inputFile> is the name of the text file you edited in step 3.

    f. Press Enter.
  5. Open the file generated by cns in a text editor and the original HTML file in a Web browser, and compare them. Apart from formatting and the removed header and footer, the text file should match the original HTML exactly. Correct any discrepancies you find, paying particular attention to indentation, and save your changes.

    Incorrectly indented elements are imported incorrectly or not at all, and any content that is imported incorrectly cannot be viewed or edited in ArcCatalog.
  6. Import the well-formatted text file using the FGDC CSDGM (TXT) importer.
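
If you convert many files, the header and footer cleanup in step 3 can be scripted. The following Python sketch is a hypothetical helper, not an Esri or USGS tool. It assumes, as step 3 describes, that the metadata begins at the first line starting with Identification_Information and that the only footer line to drop begins with "Generated by mp". It works on raw bytes and splits only on newline characters, so whatever encoding Word chose and the file's line endings pass through unchanged.

```python
import sys
from pathlib import Path

# Assumed markers, taken from step 3: the metadata starts at the
# Identification_Information line, and the HTML footer left a
# "Generated by mp..." line behind.
START_MARKER = b"Identification_Information"
FOOTER_PREFIX = b"Generated by mp"


def strip_header_footer(data: bytes) -> bytes:
    """Drop lines before Identification_Information and any 'Generated by mp' line."""
    # Split only on b"\n" so CRLF endings and the text encoding are preserved.
    lines = data.split(b"\n")
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(START_MARKER)),
        None,
    )
    if start is None:
        raise ValueError("no Identification_Information line found")
    kept = [line for line in lines[start:] if not line.strip().startswith(FOOTER_PREFIX)]
    return b"\n".join(kept)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_metadata.py converted.txt trimmed.txt
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    dst.write_bytes(strip_header_footer(src.read_bytes()))
```

Review the trimmed file before running cns; if the page footer contained more than the "Generated by mp..." line, those extra lines still need to be removed by hand.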
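
To run cns on several files, the command from step 4e can be wrapped with Python's standard subprocess module. This is a sketch under stated assumptions: cns is already on the PATH (step 4b), only the -o option shown in this article is used, and the function names are hypothetical.

```python
import shutil
import subprocess
import sys


def build_cns_command(input_file, output_file, exe="cns"):
    """Build the argument list for: cns -o <outputFile> <inputFile>."""
    return [exe, "-o", output_file, input_file]


def run_cns(input_file, output_file, exe="cns"):
    """Run cns on one file, assuming the executable is on the PATH."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} was not found on the PATH")
    # check=True raises CalledProcessError if cns exits with a nonzero status.
    return subprocess.run(build_cns_command(input_file, output_file, exe), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python run_cns.py trimmed.txt formatted.txt
    run_cns(sys.argv[1], sys.argv[2])
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.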
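
Because incorrect indentation is what causes elements to be imported incorrectly, a quick automated pass can point to lines worth checking during the comparison in step 5. The Python sketch below is a rough heuristic, not part of cns. It assumes each nesting level is indented by a fixed number of spaces (two by default, an assumption to match against your cns output) and flags any line whose indentation is not a multiple of that width or is more than one level deeper than the previous non-blank line. Long values that wrap onto deeper continuation lines may also be flagged, so treat the results as candidates to inspect rather than confirmed errors.

```python
import sys
from pathlib import Path


def find_suspect_indentation(lines, width=2):
    """Return (line_number, text) pairs whose indentation looks inconsistent.

    Assumes every nesting level is indented by `width` spaces.
    """
    suspects = []
    prev_level = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no structure
        indent = len(line) - len(line.lstrip(" "))
        level = indent // width
        # Flag partial indents and jumps of more than one level.
        if indent % width or level > prev_level + 1:
            suspects.append((number, line.rstrip()))
        prev_level = level
    return suspects


if __name__ == "__main__" and len(sys.argv) == 2:
    # Decode leniently; only the leading spaces matter for this check.
    text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    for number, line in find_suspect_indentation(text.split("\n")):
        print(f"{number}: {line}")
```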

Related Information