
How To: Tune the multi-level grid spatial index

Summary

A spatial index enables fast geographic searches for features in a feature class. ArcSDE uses a multi-level grid spatial index for feature classes in several geometry storage types, including compressed binary (LOB, LONG RAW, or BINARY), OGC-WKB, DB2 Spatial Extender, and the Spatial Type for Oracle. Tuning the spatial index grid size may improve the performance of spatial queries. This article provides background on the multi-level grid spatial index and tips for tuning it.

The multi-level grid spatial index defines an imaginary X/Y grid. A feature class may have one, two, or three such grids, also known as grid levels. Most feature classes need only one grid level, but more levels may be needed if the sizes of the feature envelopes vary greatly. Each feature is indexed using only one of the grid levels: small features in the first level and larger features in the second or third level, if present. ArcSDE places an entry (a row) in the spatial index for every instance where a single feature intersects a single cell in the grid level used for that feature.
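As a rough illustration of how index entries arise, the following Python sketch counts the grid cells that a feature's envelope intersects on a single grid level. The function name, the grid origin at (0, 0), and the floor-based cell numbering are assumptions made for this example; they are not taken from ArcSDE's internal implementation.

```python
import math

def cells_for_envelope(min_x, min_y, max_x, max_y, grid_size):
    """Return the (column, row) cells that an envelope intersects.

    Assumes a grid anchored at (0, 0) with square cells of grid_size units.
    Each returned cell corresponds to one entry in the spatial index.
    """
    first_col = math.floor(min_x / grid_size)
    last_col = math.floor(max_x / grid_size)
    first_row = math.floor(min_y / grid_size)
    last_row = math.floor(max_y / grid_size)
    return [
        (col, row)
        for col in range(first_col, last_col + 1)
        for row in range(first_row, last_row + 1)
    ]

# A hypothetical feature envelope about 25 units wide and 5 units tall.
street = (2, 3, 27, 8)
print(len(cells_for_envelope(*street, 10)))   # 3 cells on a 10-unit grid
print(len(cells_for_envelope(*street, 100)))  # 1 cell on a 100-unit grid
```

In this example, shrinking the cell size from 100 to 10 units triples the number of index entries for the same feature.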

During the primary filter operation of a spatial query, ArcSDE finds the X/Y envelope of the spatial filter shape and determines which spatial index grid cells intersect that envelope. Next, ArcSDE performs a query to return all features whose envelopes also intersect those grid cells. The results of this primary filter operation are the candidate features. Later, secondary filtering reduces the result set to only the candidate features that satisfy the exact conditions of the spatial query, such as 'intersects', 'crosses', or 'within'.
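The two-stage filtering described above can be sketched in Python with an in-memory dictionary standing in for the spatial index table. All names here are hypothetical, the grid is assumed to be anchored at (0, 0), and an envelope-intersection test stands in for the exact geometric comparison that a real secondary filter would perform.

```python
import math
from collections import defaultdict

def envelope_cells(env, grid_size):
    """Set of (column, row) cells that the envelope (min_x, min_y, max_x, max_y) touches."""
    min_x, min_y, max_x, max_y = env
    cols = range(math.floor(min_x / grid_size), math.floor(max_x / grid_size) + 1)
    rows = range(math.floor(min_y / grid_size), math.floor(max_y / grid_size) + 1)
    return {(col, row) for col in cols for row in rows}

def build_index(features, grid_size):
    """Map each grid cell to the ids of the features whose envelopes intersect it."""
    index = defaultdict(set)
    for feature_id, env in features.items():
        for cell in envelope_cells(env, grid_size):
            index[cell].add(feature_id)
    return index

def primary_filter(index, query_env, grid_size):
    """Candidates: every feature that shares a grid cell with the query envelope."""
    candidates = set()
    for cell in envelope_cells(query_env, grid_size):
        candidates |= index.get(cell, set())
    return candidates

def envelopes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def secondary_filter(candidates, features, query_env):
    """Stand-in for the exact test; a real filter compares full geometries."""
    return {fid for fid in candidates if envelopes_intersect(features[fid], query_env)}

features = {1: (1, 1, 2, 2), 2: (8, 8, 9, 9), 3: (12, 12, 13, 13)}
grid_size = 10
index = build_index(features, grid_size)
query = (1.5, 1.5, 3, 3)

candidates = primary_filter(index, query, grid_size)
print(sorted(candidates))                                     # [1, 2]
print(sorted(secondary_filter(candidates, features, query)))  # [1]
```

Feature 2 shares a grid cell with the query envelope, so it survives the primary filter even though it does not intersect the query shape. The secondary filter then discards it.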

Tuning the spatial index means balancing the selectivity of the primary filter operation against the number of entries in the spatial index. The per-feature cost of the primary filter is much lower than that of the secondary filter, because the secondary filter performs detailed computations while the primary filter is a simple query on the spatial index table. Specifying a smaller grid cell size usually produces more entries in the spatial index table and finer selectivity from the primary filter operation, which means that the secondary spatial filter must examine fewer features. However, more spatial index entries also increase the size of the spatial index, which slows the primary filter operation and consumes more space in the database.

Fortunately, ArcSDE provides statistics about the spatial index which, along with performance testing, can ease the tuning process. The command 'sdelayer -o si_stats' is the primary tool for reporting spatial index grid statistics used to tune the spatial index. Here is an example of the output of this command:

sdelayer -o si_stats -l california_streets,shape -u gisdba -p gisdba

ArcSDE 8.2 Build 161 Thu Jun 6 11:23:12 PDT 2002
Layer Administration Utility
----------------------------------------------------
Layer 5 Spatial Index Statistics:
Level 1, Grid Size 0.1
|-------------------------------------------------------------------|
| Grid Records:                           212781                    |
| Feature Records:                        199991                    |
| Grids/Feature Ratio:                    1.06                      |
| Avg. Features per Grid:                 196.47                    |
| Max. Features per Grid:                 3887                      |
| % of Features Wholly Inside 1 Grid:     93.96                     |
|-------------------------------------------------------------------|
|                Spatial Index Record Count By Group                |
| Grids:       <=4     >4    >10    >25    >50   >100   >250   >500 |
|---------- ------ ------ ------ ------ ------ ------ ------ ------ |
| Features: 199991      0      0      0      0      0      0      0 |
| % Total:    100%     0%     0%     0%     0%     0%     0%     0% |
|-------------------------------------------------------------------|
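When statistics are gathered repeatedly during tuning, it can help to pull the numbers out of this report automatically. The following Python sketch parses the single-value lines of the report into a dictionary. It assumes the output layout matches the ArcSDE 8.2 sample shown above; other versions may format the report differently, and the function name is hypothetical.

```python
import re

# Abbreviated copy of the sample report shown above.
SAMPLE = """\
Level 1, Grid Size 0.1
| Grid Records: 212781 |
| Feature Records: 199991 |
| Grids/Feature Ratio: 1.06 |
| Avg. Features per Grid: 196.47 |
| Max. Features per Grid: 3887 |
| % of Features Wholly Inside 1 Grid: 93.96 |
| Grids: <=4 >4 >10 >25 >50 >100 >250 >500 |
| Features: 199991 0 0 0 0 0 0 0 |
"""

# Matches boxed lines of the form "| Label: number |" and nothing else,
# so the multi-column histogram rows are skipped.
STAT_LINE = re.compile(r"^\|\s*(.+?):\s*([0-9.]+)\s*\|$")

def parse_si_stats(text):
    """Return {label: value} for each single-value statistics line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats

stats = parse_si_stats(SAMPLE)
print(stats["Grids/Feature Ratio"])  # 1.06
# The reported ratio is simply grid records divided by feature records.
print(round(stats["Grid Records"] / stats["Feature Records"], 2))  # 1.06
```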
Note:
IBM DB2 Spatial Extender, one of the several storage options ArcSDE supports, uses a multi-level grid spatial index. IBM provides a tool called the Index Advisor to help tune this index. You can read more about this tool in this article on IBM's developerWorks site.
Note:
Informix Spatial DataBlade and Oracle Locator/Oracle Spatial use an R-Tree spatial index, not a multi-level grid spatial index. 

Procedure

When tuning performance, the only way to tell whether a change has helped is to measure the results after each change. Making one change at a time isolates the effect of each change. The general steps to follow when tuning are:
1. Establish a repeatable test to measure spatial query performance. This may be a manual test, or an automated test can be created that performs and times a defined set of spatial queries.
2. Use 'sdelayer -o si_stats' to gather beginning spatial index statistics.
3. Change the spatial index grid settings. Use ArcCatalog, or the command 'sdelayer'. To use sdelayer, do the following:

  1. Use 'sdelayer -o load_only_io' to drop the spatial index. No spatial queries or data loading is allowed on this layer while it is in load-only mode.
  2. Use 'sdelayer -o alter -g n,n,n' to specify new grid sizes. Specify zero for the second or third grid size if not used.
  3. Use 'sdelayer -o normal_io' to rebuild the spatial index and make the layer accessible again.
4. Run the query performance test again, and check the spatial index statistics to see whether the changes had the desired effect. If they did not, consider undoing the changes, especially if they had a negative effect on performance.
5. Repeat steps 3 and 4 until there are no reasonable changes to make, or the remaining changes have a negligible effect on performance.
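The sdelayer sequence in step 3 can be sketched as a small Python helper that builds the three command lines in order. The '-l', '-u', and '-p' options are copied from the 'si_stats' example earlier in this article, and it is an assumption here that the same options apply to the other operations. A given installation may also need additional connection options. The helper only builds and prints the commands; it does not run them.

```python
def reindex_commands(table, column, grid_sizes, user, password):
    """Build the sdelayer calls that change grid sizes for one layer.

    grid_sizes holds one to three sizes; unused levels are passed as zero.
    """
    if not 1 <= len(grid_sizes) <= 3:
        raise ValueError("specify one, two, or three grid sizes")
    padded = list(grid_sizes) + [0] * (3 - len(grid_sizes))
    grid_arg = ",".join(str(size) for size in padded)
    common = ["-l", f"{table},{column}", "-u", user, "-p", password]
    return [
        ["sdelayer", "-o", "load_only_io", *common],        # drop the spatial index
        ["sdelayer", "-o", "alter", "-g", grid_arg, *common],  # set new grid sizes
        ["sdelayer", "-o", "normal_io", *common],           # rebuild the spatial index
    ]

commands = reindex_commands("california_streets", "shape", [0.05], "gisdba", "gisdba")
for command in commands:
    print(" ".join(command))
```

Each list could be passed to subprocess.run once the commands have been reviewed. Keeping the three operations together in one place makes it harder to leave a layer in load-only mode by accident.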
Here are some tips for tuning the multi-level grid spatial index for an ArcSDE feature class:
  • Default grid size settings, such as those computed by the command 'shp2sde -o create', are designed to ensure that the data can be loaded. In most circumstances, the default grid sizes are appropriate for fast spatial queries. However, depending on the characteristics of the data, they may not be optimal, and tuning the grid sizes might result in better spatial query performance. If the default grid sizes are being used, use the command 'sdelayer -o si_stats' to examine the statistics and adjust the grid sizes as needed.
  • A good starting place is to set the grid size to three times the length of the edge of an average-sized feature.
  • Where possible, use only one spatial index grid level. Because it needs to search each level used in each feature class, ArcSDE usually performs best when a single spatial index grid level is used. However, feature classes with highly variable feature envelope sizes may benefit from multi-level spatial indexes. One may choose to experiment with a multi-level spatial index to see if it will improve the spatial index statistics and the query performance.
  • If multiple grid levels are specified, specify them in ascending order by size. The size of each grid level must be at least three times the size of the next smaller grid level.
  • Where possible, specify a grid size where a high percentage of features fall wholly within one grid cell. If the percentage falls below 80%, consider modifying the spatial index settings.
  • The statistic 'Grids/Feature Ratio' shows the ratio of the number of entries in the spatial index table to the number of features in the feature class. Fewer entries in the spatial index table equate to faster queries. Optimally, the 'Grids/Feature Ratio' should be less than two. If it exceeds four, consider modifying the spatial index settings. In the example output above, 212781 grid records for 199991 features give a ratio of 1.06.
  • At the end of the output of the command 'sdelayer -o si_stats' is a histogram showing how many spatial index records (entries) exist for each feature. The majority of the features should have four or fewer records, which places them in the first group (<=4). Even in a well-tuned spatial index, a few features that are significantly larger than the others will have more spatial index entries. If many features have more than four spatial index entries, consider modifying the spatial index settings.
  • Keep the average number of features per grid fairly low: between 100 and 300. Try to keep the maximum number of features per grid below 4000. A spatial query that happens to include a grid cell with many associated features will return all those features to ArcSDE to be processed by the secondary spatial filter.
  • Single points can only exist in a single spatial index grid cell. Therefore, there should be exactly one row in the spatial index table for each non-empty point feature, regardless of the size of the grid cells. This fact removes both the need for multiple grid levels and the problem of smaller grid cells producing more rows in the spatial index. With that second problem removed, it is better to have small grid cell sizes for points in order to achieve the best selectivity. Fewer points will be returned to the client by the primary spatial filter query, reducing data transfer size and reducing the number of secondary spatial filter operations that the client must perform.
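Several of the tips above are numeric rules of thumb, so they can be collected into a simple checklist. The following Python sketch encodes the thresholds stated in the tips. The function names and dictionary keys are hypothetical, and the checks are only a starting point for the performance testing described in the procedure.

```python
import math

def suggest_grid_size(average_edge_length):
    """Starting point: three times the edge length of an average-sized feature."""
    return 3 * average_edge_length

def check_grid_levels(grid_sizes):
    """Report levels that are not at least three times the next smaller level.

    Zero means the level is unused. A small tolerance avoids false alarms
    from floating-point rounding (for example, 0.3 versus 3 * 0.1).
    """
    used = [size for size in grid_sizes if size > 0]
    problems = []
    for smaller, larger in zip(used, used[1:]):
        if larger < 3 * smaller and not math.isclose(larger, 3 * smaller):
            problems.append(f"{larger} is less than three times {smaller}")
    return problems

def review_stats(stats):
    """Return warnings for statistics outside the ranges given in the tips."""
    warnings = []
    if stats["pct_wholly_inside_one_grid"] < 80:
        warnings.append("fewer than 80% of features fall wholly inside one grid cell")
    if stats["grids_per_feature"] > 4:
        warnings.append("Grids/Feature Ratio exceeds four")
    if stats["avg_features_per_grid"] > 300:
        warnings.append("average features per grid is above 300")
    if stats["max_features_per_grid"] >= 4000:
        warnings.append("maximum features per grid is not below 4000")
    return warnings

# Values from the california_streets sample output in the Summary.
sample = {
    "pct_wholly_inside_one_grid": 93.96,
    "grids_per_feature": 1.06,
    "avg_features_per_grid": 196.47,
    "max_features_per_grid": 3887,
}
print(review_stats(sample))                # [] (no warnings)
print(check_grid_levels([0.1, 0.2, 0]))    # one problem reported
```

The sample layer passes every check, which is consistent with its statistics falling inside the recommended ranges.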

Related Information