
Technical Article   HowTo:  Convert HTML metadata to a well-formatted text file

Article ID: 23071
Software:
  ArcGIS - ArcEditor 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3
  ArcGIS - ArcInfo 8.0.1, 8.0.2, 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3
  ArcGIS - ArcView 8.1, 8.1.2, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3
Platforms:  Windows NT 4.0, 2000, XP

Summary

If you have an HTML file that presents Federal Geographic Data Committee (FGDC) metadata in outline format, the HTML file can be converted to a well-formatted text file that can be imported using ArcCatalog. This process requires using Microsoft Word and the U.S. Geological Survey (USGS) metadata utility "cns".

Procedure

  1. Make sure your HTML metadata file presents FGDC data in outline format; otherwise, you will not be able to import that information with ArcCatalog.

    The outline-style HTML format pictured here is the most common format in which FGDC-compliant metadata is presented. When you use the Internet to search the NSDI Clearinghouse, metadata is usually shown in this format.
    [Figure: Outline-style HTML format for FGDC metadata]
  2. Convert the HTML file to a text file.

    a. Open the HTML metadata file in Microsoft Word.

    b. In the File menu, click Save As.

    c. Click the Save as type dropdown list and click Text Only (*.txt) or Text Only with line breaks (*.txt). Text Only with line breaks ends every line within a paragraph with a line break, while Text Only keeps each paragraph on a single line.

    d. Specify the name and location of the output file and click Save.

    e. A warning appears indicating that some formatting will be lost. Click Yes to save the document in text format.

    f. Close Microsoft Word.
  3. Remove the text that came from the header and footer of the HTML page so that these non-metadata artifacts do not appear when ArcCatalog imports the metadata.

    a. Open the text file in a text editor such as Notepad or WordPad.

    b. Remove all lines at the beginning of the text file before Identification_Information.

    [Figure: Remove header lines after converting FGDC HTML metadata to text]

    c. Scroll to the bottom of the file.

    d. Remove the line that says "Generated by mp..."

    [Figure: Remove footer lines after converting FGDC HTML metadata to text]

    e. Save the changes.
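
The edits in step 3 can also be scripted. The following is a minimal Python sketch, not part of the original procedure. It assumes the metadata begins at the first line that starts with Identification_Information and that the footer line starts with "Generated by mp". The function name strip_page_artifacts and its marker parameter are illustrative only. If the converted page also lists Identification_Information in a table of contents, check the result by hand.

```python
def strip_page_artifacts(lines, marker="Identification_Information"):
    """Return only the metadata lines, dropping page header and footer text."""
    # Locate the first line that begins with the marker element name.
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(marker)),
        None,
    )
    if start is None:
        raise ValueError(marker + " not found; is this FGDC outline metadata?")
    # Keep everything from the marker onward, except the footer line
    # that begins with "Generated by mp".
    return [
        line for line in lines[start:]
        if not line.strip().startswith("Generated by mp")
    ]
```

To use it, read the text file saved from Microsoft Word into a list of lines, pass the list to strip_page_artifacts, and write the returned lines to a new file. The encoding to use when reading depends on how Word saved the file.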
  4. Generate a well-formatted text file using the USGS metadata utility cns (which stands for chew and spit).

    A well-formatted text file is one in which metadata elements are indented hierarchically so that it is clear which elements are contained by other elements. Without this hierarchical structure a metadata XML file can't be created.

    a. Install cns if you don't already have this tool.
    Summary
    Free metadata utilities are available from USGS, including mp (metadata parser) and cns (chew and spit). Use mp to validate FGDC metadata and to convert FGDC metadata files from one supported format to another. Use cns to turn poorly formatted metadata text files into well-formatted text files that can be read properly by mp.

    This article describes how to install these utilities. Detailed descriptions about these tools and instructions for using them can be found at http://geology.usgs.gov/tools/metadata.
    Procedure
    1. Using a web browser, go to the Internet address http://geology.usgs.gov/tools/metadata.
    2. In the green box, click the Small package link under the heading "Download software packages".
    3. In the dialog box that appears, click Save to download the appropriate files to your computer.
    4. Navigate to the location where the bin_win.exe file should be placed, e.g., "C:\USGS", then click Save. Close the web browser.
    5. Double-click bin_win.exe on your computer in the location where this file was placed.
    6. In the dialog box that appears, click OK to start the install process.
    7. Type the location where the files should be installed on your computer, e.g., "C:\USGS", then click Unzip.
    8. A message appears indicating whether all files were successfully written to your computer. Click OK.
    9. A readme file is opened in a text editor with information about the utilities. Close the file when you are ready.


    b. If you've just installed cns, set up your computer so it can be run from the command prompt.
    Summary
    This article describes how to configure your computer so that you can run the USGS metadata utilities mp and cns from the command prompt. Descriptions of these tools and instructions for using them can be found at http://geology.usgs.gov/tools/metadata.
    Procedure
    The utilities mp and cns must be run from the command prompt. For this to work, your computer's Path environment variable must include the path where these utilities reside on your computer, e.g., "C:\USGS\tools\bin".

    1. Locate your computer's list of environment variables.
    2. Click the System variable Path.
    3. Edit the variable's value: go to the end of the current value, then type a semicolon followed by the path where the utilities reside on your computer. For example, type ";C:\USGS\tools\bin". There must not be a space between the semicolon and the path.
    4. Click Set, Apply, and/or OK as appropriate to save your changes and close the dialog box.
    5. Open the Command Prompt dialog box by clicking Start > Programs > Accessories > Command Prompt.
    6. At the command prompt, type "mp" or "cns", then press Enter.

      If the Path variable was set properly, the usage message for the command appears. If not, go back to step 1 and fix the Path variable.
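
The same Path check can be done from a script. This is a small Python sketch using the standard library's shutil.which, offered as an optional alternative to typing the commands by hand. The function name find_tools is illustrative and not part of the USGS tools.

```python
import shutil


def find_tools(names=("mp", "cns"), path=None):
    """Map each tool name to its resolved location, or None if not found.

    When path is None, shutil.which searches the current PATH setting.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(name, "->", location or "not found on Path")
```

A command prompt that was already open when the Path variable was edited may not see the change, so run the check from a newly opened prompt.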


    c. Open the Command Prompt dialog box by clicking Start > Programs > Accessories > Command Prompt.

    d. Navigate to the directory where the metadata text file is located.

    e. At the command prompt, type "cns -o <outputFile> <inputFile>", where <outputFile> is the name of the well-formatted file that will be created by cns, and <inputFile> is the name of the file that you modified in step 3.

    f. Press Enter.
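
If you need to process several files, the cns command from step 4e can be issued from a script. The sketch below is a hedged Python example that uses only the -o option shown above. The function names and file names are hypothetical, and the meaning of cns's exit status is an assumption that this article does not confirm.

```python
import subprocess


def cns_command(input_file, output_file):
    """Build the argument list for: cns -o <outputFile> <inputFile>."""
    return ["cns", "-o", str(output_file), str(input_file)]


def run_cns(input_file, output_file):
    """Run cns and return its exit status.

    Assumption: cns is on the Path. What a nonzero status means is not
    documented here, so always inspect the output file as well.
    """
    completed = subprocess.run(cns_command(input_file, output_file))
    return completed.returncode
```

For example, run_cns("metadata.txt", "metadata_cns.txt") would read the hypothetical file metadata.txt from the current directory and write metadata_cns.txt next to it.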
  5. Open the file generated by cns in a text editor and open the original HTML file in a Web browser. Compare the two files. Aside from formatting and the header and footer information, the text file should look exactly like the original HTML. See if you can spot and fix any errors. Save your changes.

    Elements that are incorrectly indented will be imported incorrectly or may not be imported at all. You will not be able to view or edit any content in ArcCatalog that was imported incorrectly.
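
As a rough aid for the comparison in step 5, the words in the two files can be counted and compared. The following Python sketch uses only the standard library; the names word_differences and _TextCollector are illustrative. It reports words that were dropped or altered, but it does not check indentation, which still has to be reviewed by eye.

```python
from collections import Counter
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the whitespace-separated words found in an HTML page's text."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def word_differences(html_text, plain_text):
    """Return (missing_from_text, extra_in_text) as Counters of words."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    html_words = Counter(collector.words)
    text_words = Counter(plain_text.split())
    # Counter subtraction keeps only the words with a positive remainder.
    return html_words - text_words, text_words - html_words
```

Words from the HTML header and footer are expected to appear in the first result, because they were removed from the text file on purpose. Anything else listed there, or anything in the second result, is worth checking against the original page.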
  6. Import the well-formatted text file using the FGDC CSDGM (TXT) importer.

Related Information

  • Can metadata be imported in HTML format?
    No. There is no way to import metadata that exists in HTML format. HTML files are intended for a human audience. It might be easier to understand why if you think of the results that you see in a browser as a picture. If you look at a metadata...

Created: 8/12/2002
Last Modified: 8/26/2008
