There are many options when configuring Oracle Real Application Clusters (RAC) and Transparent Application Failover (TAF), many variations in hardware infrastructures and networks that support these configurations, and wide variation in customer workflows. ArcGIS supports RAC. TAF configurations vary widely, however, so a blanket statement cannot be made that ArcGIS supports TAF in all possible implementations.
An important consideration for TAF failover is that ArcGIS connections obtain an in-memory lock that is session-specific and managed by Oracle. This lock is used to ensure that any schema or state locks are valid. When a failover occurs, the connection made to the surviving node is a new session, so the original lock is lost. This can cause unexpected behavior in some editing scenarios. Loss of the original lock also allows modifications to the schema during read-only operations, which can result in unexpected behavior if schema locking is disabled on ArcGIS Server services that reference the data. Therefore, it is very important to test thoroughly before implementing Oracle TAF with ArcGIS.
Oracle RAC is in use at several Esri customer sites using geodatabases, and Esri has tested basic RAC and TAF functionality for failover behavior. The results of that testing are described in this article.
Note: Regardless of the configuration used, Esri strongly recommends that each customer perform thorough testing to ensure that all workflows and applications work as expected during failover scenarios.
Oracle RAC provides clustering and high availability (HA) for Oracle databases. It allows Oracle relational database management system software on multiple server nodes to manage a single Oracle database, providing a resilient architecture for the database services. This is typically combined with a resilient storage tier and Oracle client configuration to provide failover if a server node fails. TAF is typically part of an Oracle RAC configuration, providing client-side functionality that allows clients to reconnect to a surviving database instance when a database instance fails.
Esri has found that setting the TAF failover type to Select provides the highest availability when used with ArcGIS. Select allows applications that began fetching rows from a cursor before failover to continue fetching rows after failover. Any active transactions are rolled back at the time of failure because TAF cannot preserve active transactions after failover.
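As an illustration of where the failover type is set, the following Python sketch assembles an Oracle Net connect descriptor whose FAILOVER_MODE clause uses TYPE=SELECT. The function name, host names, service name, and the RETRIES and DELAY values are assumptions chosen for the example and are not taken from Esri's testing; appropriate values depend on the site. RETRIES and DELAY control how many times and how often the client tries to reconnect, which is one of the configuration parameters that affects how long a connection pauses during failover.

```python
def build_taf_descriptor(hosts, service_name, port=1521,
                         failover_type="SELECT", method="BASIC",
                         retries=20, delay=5):
    """Return a connect descriptor string that includes a FAILOVER_MODE clause.

    All default values here are illustrative assumptions, not recommendations.
    """
    ftype = failover_type.upper()
    if ftype not in {"NONE", "SESSION", "SELECT"}:
        raise ValueError(f"unknown TAF failover type: {failover_type}")
    fmethod = method.upper()
    if fmethod not in {"BASIC", "PRECONNECT"}:
        raise ValueError(f"unknown TAF failover method: {method}")

    # One ADDRESS entry per RAC node (or SCAN address) the client may try.
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    failover_mode = (
        f"(FAILOVER_MODE=(TYPE={ftype})(METHOD={fmethod})"
        f"(RETRIES={retries})(DELAY={delay}))"
    )
    return (
        "(DESCRIPTION="
        f"(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}){failover_mode})"
        ")"
    )


if __name__ == "__main__":
    # Hypothetical node and service names.
    print(build_taf_descriptor(
        ["node1.example.com", "node2.example.com"], "gdb_service"))
```

The resulting string has the same shape as an entry in tnsnames.ora. TAF can also be configured on the database service rather than in the client descriptor; which approach applies depends on how the RAC is administered.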
When using Select failover, connections in ArcGIS for Desktop and ArcGIS Server switch to a surviving node for most simple operations such as zoom, pan, or refresh. There is a delay before the connection resumes, and that delay depends upon the infrastructure supporting the RAC as well as other configuration parameters.
Testing also found that when using Select failover, connections are switched to a surviving node during simpler non-versioned and versioned edit sessions.
Larger bulk data loads fail, however, since active or in-progress transactions are automatically rolled back when using Select failover.
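Because only the in-progress transaction is rolled back, a custom loading script that commits in small batches loses at most one batch when a node fails. The sketch below illustrates that idea in generic Python; it does not describe how ArcGIS loading tools behave. The names load_in_batches, write_batch, and TransactionRolledBack are hypothetical, and write_batch is assumed to insert and commit one batch atomically and to raise TransactionRolledBack when the database rolls the batch back.

```python
import time


class TransactionRolledBack(Exception):
    """Hypothetical stand-in for the error a client raises when failover
    rolls back the active transaction."""


def load_in_batches(rows, write_batch, batch_size=1000,
                    max_attempts=3, wait_seconds=0.0):
    """Write rows in separately committed batches, retrying a rolled-back batch.

    Returns the number of rows committed.
    """
    committed = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        for attempt in range(1, max_attempts + 1):
            try:
                write_batch(batch)       # assumed to insert and commit atomically
                committed += len(batch)
                break
            except TransactionRolledBack:
                if attempt == max_attempts:
                    raise                # give up; earlier batches stay committed
                time.sleep(wait_seconds)  # allow time for failover to complete
    return committed
```

Retrying is only safe here because a rolled-back batch leaves no partial rows behind; a loader that cannot guarantee this would need to check for duplicates before resubmitting.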
Other failover types
TAF offers two other failover types: None and Session. When using either of these failover types, ArcGIS for Desktop connections fail if a node fails, and an error message similar to the following is returned:
One or more layers failed to draw: <user>.<layer>: Failure to access the DBMS server [ORA-03114: not connected to ORACLE]
Manual reconnection to the surviving node from ArcGIS for Desktop is required.
If the primary node fails, ArcGIS Server connects automatically to the surviving node when the next ArcGIS Server operation is performed, although there is a pause while the failover connection is made.
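For custom scripts that run against the same database, the reconnect-on-next-operation pattern can be sketched as follows. This is a minimal illustration, not ArcGIS code: it extracts the ORA error code from a message such as the one shown above and, if the code indicates a lost connection, calls a reconnect function before retrying once. The names run_with_reconnect, operation, and reconnect are hypothetical, and only ORA-03114 is treated as a disconnect because it is the only code this article reports; a real script would likely need to recognize others.

```python
import re

ORA_CODE = re.compile(r"ORA-(\d{5})")

# ORA-03114 ("not connected to ORACLE") is the code reported in this article.
DISCONNECT_CODES = {3114}


def ora_codes(message):
    """Return every ORA error number found in a message, as integers."""
    return [int(code) for code in ORA_CODE.findall(message)]


def is_disconnect(message):
    """True if the message contains an error code that means the session is gone."""
    return any(code in DISCONNECT_CODES for code in ora_codes(message))


def run_with_reconnect(operation, reconnect, max_reconnects=1):
    """Run operation(); on a disconnect error, call reconnect() and try again."""
    reconnects = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_disconnect(str(exc)) or reconnects >= max_reconnects:
                raise
            reconnects += 1
            reconnect()  # expected to open a session on a surviving node
```

The retry limit keeps the script from looping indefinitely if no node is reachable. Note that a session opened by reconnect() is new, so any locks or uncommitted work held by the failed session are not carried over.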