Summary
Instructions provided describe how to take a complex SQL statement containing multiple AND conditions and restructure the SQL statement leveraging multiple subqueries and additional SQL operators to improve performance.
The objective of the article is to demonstrate different methodologies for answering questions involving spatial relationships verses using simple SQL constructs.
Procedure
The example and steps below demonstrate an inefficient (slow performing) SQL statement and efficient SQL statement in response to: "Identify all parcels that have an area greater than 100,000 square feet and less than 500,000 square feet which are within a 1/2 mile (2640 feet) from a major highway and in an area zoned industrial".
The following data is available in response to the above statement: 558,605 parcels (polygons) for a metropolitan city, 2,606 major_roads (lines) and 18,653 zoning (polygons).
- The logical SQL construct for answering the question simply relies on a set of predicate filters using the logical AND condition with two additional intersect spatial relational operators.
Constructing the query requires selecting the count of parcels (COUNT) that meet the criteria and specifying the three tables, parcels (aliased as a), major_roads (alias b), and zoning (alias c), and including the WHERE clause setting each predicate filter using the logical AND condition.
The first predicate filter is parcels shape.area greater than 100,000 and less than 500,000 feet. The second and third predicate filters are where the zoning table's zonegen_cl attribute equals 'IND' (industrial) and major_roads table's type attribute equals 1110 (highways). The remaining two filters are the spatial relationships, parcels that intersect zoned industrial areas, and parcels that intersect each buffered major_road that is a highway (buffered by 2640 feet).
When the query is executed, the result returns a count of 604 parcels. But the time to execute the query requires over 18 minutes (which may be considered extremely slow).
Below is the SQL statement's syntax, the elapsed time to execute the statement, and the optimizer's execution plan. The execution plan shows how the Oracle database accessed the data step by step to derive the result.
Code:
SQL> SELECT COUNT(a.objectid)
2 FROM parcels a, major_roads b, zoning c
3 WHERE sde.st_area(a.shape) > 100000
4 AND sde.st_area(a.shape) < 500000
5 AND c.zonegen_cl = 'IND'
6 AND b.type = 1110
7 AND sde.st_intersects(a.shape,c.shape) = 1
8 AND sde.st_intersects(a.shape,sde.st_buffer(b.shape,2640)) = 1;
COUNT(OBJECTID)
---------------
604
Elapsed: 00:18:31.57
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1324 | 3812 (2)| 00:00:46 |
| 1 | SORT AGGREGATE | | 1 | 1324 | | |
| 2 | NESTED LOOPS | | 203 | 262K| 3812 (2)| 00:00:46 |
| 3 | NESTED LOOPS | | 10 | 7280 | 757 (1)| 00:00:10 |
|* 4 | TABLE ACCESS FULL | MAJOR_ROADS | 163 | 53790 | 22 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID | PARCELS | 1 | 398 | 757 (1)| 00:00:10 |
|* 6 | DOMAIN INDEX (Sel: .000179)| A15_IX1 | | | 4 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | ZONING | 21 | 12516 | 306 (2)| 00:00:04 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("B"."TYPE"=1110)
5 - filter"SDE"."ST_AREA"("A"."SHAPE")>100000 AND "SDE"."ST_AREA"("A"."SHAPE")<500000)
6 - access("SDE"."ST_INTERSECTS"("A"."SHAPE","SDE"."ST_BUFFER"("B"."SHAPE",2640))=1)
7 - filter("C"."ZONEGEN_CL"=U'IND' AND "SDE"."ST_INTERSECTS"("A"."SHAPE","C"."SHAPE")=1)
The execution plan shows how the optimizer chose to access the data. As is shown, the database performed a full table scan against the major_roads table searching for all rows where the attribute type equalled the value 1110. For each major_roads row, the database passed the row's shape attribute (the st_geometry attribute) to the st_buffer operator. The st_buffer operator buffered the input geometry by the specified distance to generate a new st_geometry shape which was then passed to the outer st_intersects operator. The st_intersects operator used the input buffered road shape to perform the intersection against the parcels table. Next, for each intersecting parcel, the parcel's shape was compared as a filter with every zoning row that the attribute zonegen_cl equalled IND.
The majority of the 18 minutes of elapsed time can be attributed to the cost of comparing every parcel that intersected the buffered major_roads with every zoning polygon. - To improve a query's performance when asking complex questions, rewriting the original statement by leveraging additional constructs in Oracle, such as intersect, and new operators in ArcGIS 9.3, such as st_aggr_union is sometimes required.
The example below demonstrates how to rewrite the SQL statement from step 1 to improve performance and leverage additional SQL constructs.
Because the statement referenced at the beginning of the Procedure section contains two spatial filters, it is easier to perform each spatial filter as separate SQL statements and use the Oracle intersect operator to return one result set.
Lines 2 through 5 execute the first spatial filter, which returns all parcels that intersect zoning polygons that have the attribute zonegen_cl equal IND.
The second spatial filter to return all parcels, which are within a half-mile from major roads, is executed on lines 7 through 12. To further optimize this step, the statement utilizes the new st_aggr_union aggregate released with ArcGIS 9.3.
Lines 9 through 11, the query aggregates all major_roads where type equals 1110. The st_aggr_union aggregate unions all the matching major_roads into one st_geometry object. This geometry is then passed as input to the st_buffer function on line 12.
By using the aggregate, it significantly improves performance because only one geometry is buffered and intersected with the parcels table verses buffering each major_road and then intersecting with the parcels table(which can potentially result in duplicate parcels being returned).
Once each spatial filter result set is derived, the intersect operator on line 6 returns just the parcels that are present in both the first and second spatial filter by intersecting the attributes ObjectID and shape.area.
The shape.area attribute is required in the select list on line 2 and line 7 because the attribute is used as the final predicate filter for the outer query on line 13, where area is between 100000 AND 500000.
Code:
SQL> SELECT COUNT(objectid)
2 FROM (SELECT a.objectid objectid, a.shape.area area
3 FROM parcels a, zoning b
4 WHERE b.zonegen_cl = 'IND'
5 AND sde.st_intersects(a.shape,b.shape) = 1
6 INTERSECT
7 SELECT a.objectid objectid, a.shape.area area
8 FROM parcels a,
9 (SELECT sde.st_aggr_union(shape) shape
10 FROM major_roads n
11 WHERE type = 1110) b
12 WHERE sde.st_intersects(a.shape,sde.st_buffer(b.shape,2640)) = 1)
13 WHERE area between 100000 AND 500000;
COUNT(OBJECTID)
---------------
604
Elapsed: 00:00:44.14
In summary, thinking outside the normal confines of writing SQL and using basic conditional operators (AND, OR, etc.), one can significantly improve performance by breaking down a statement into manageable, well performing parts to return a response.