Restart Rule Examples
Modified Restart on Error System Rule
The global_restartErrorCodes and global_restartErrorCodesSuperuser system rules
trigger a one-time restart attempt for queries that return one of a small subset of
recoverable error codes. If you want to extend these rules to apply to more error
codes, Yellowbrick recommends that you create new rules rather than modify existing
rules. (If system rule versions change during an upgrade, your modifications to those
existing rules will be lost.)
global_restartErrorCodes:
yellowbrick=# select * from sys.wlm_active_rule where rule_name ='global_restartErrorCodes';
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------
profile_name | (global)
rule_name | global_restartErrorCodes
rule_type | restart_for_error
order | 1
enabled | t
superuser | f
expression | // Recoverable error codes: +
| // - KE041 # Library file write +
| // - YB044 # RecoverableGeneric +
| // - WM001 # ERRCODE_WORKER_OFFLINE +
| // - KE039 # RPCCHANNELBROKEN +
| // - KE038 # RPCCHANNELCLOSED +
| // - P0004 # Assert failure +
| // - WM002 # System not ready +
| // - WM003 # Query restarting +
| if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) { +
| w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003/);+
| } +
|

Now use the same rule definition, but add another error code from the recoverable
list (KE001 in this example):
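As a minimal sketch, the extended expression might look like the following. The wrapper function and the stand-in query objects are hypothetical and exist only so that the expression can be run outside the WLM engine; in an actual rule, only the function body would be used and w is supplied by the system.

```javascript
// Hypothetical wrapper: in a real rule, only the body is the rule expression
// and `w` is provided by the WLM engine.
function restartErrorCodesKE001Added(w) {
  // Same guard as global_restartErrorCodes: skip the check if an earlier
  // rule already marked the query as not recoverable.
  if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) {
    // Original list of codes, with KE001 appended.
    w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003|KE001/);
  }
  return w;
}

// Stand-in query objects (assumed shape: only the fields the rule touches).
const recovered = restartErrorCodesKE001Added({ errorCode: 'KE001' });
const ignored = restartErrorCodesKE001Added({ errorCode: 'XX000' });

console.log(Boolean(recovered.errorRecoverable)); // true
console.log(Boolean(ignored.errorRecoverable));   // false
```

As in the system rule, w.errorRecoverable receives the result of match(), so it is a truthy match array for a listed code and null otherwise.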

Now disable the existing global_restartErrorCodes rule and activate your changes so
that the new rule, global_restartErrorCodesKE001Added, takes effect instead.
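Alternatively, leave the existing rule enabled and give the new rule a higher order
value, such as 10, so that the existing rule is applied first. In that case, you would
also need to remove the following code from the rule definition: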
if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) { }
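A sketch of the rule body with that guard removed is shown below, assuming the rule keeps the full list of codes plus KE001. The wrapper function and the sample query object are hypothetical and are only there to make the snippet runnable on its own.

```javascript
// Hypothetical wrapper around the unguarded rule body; `w` would normally be
// supplied by the WLM engine.
function restartErrorCodesUnguarded(w) {
  // No guard: the result of this check always replaces any earlier value.
  w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003|KE001/);
  return w;
}

// A query that an earlier rule has already marked as not recoverable.
const reevaluated = restartErrorCodesUnguarded({ errorCode: 'KE001', errorRecoverable: false });

console.log(Boolean(reevaluated.errorRecoverable)); // true
```

Without the guard, this expression overwrites whatever value an earlier rule assigned to w.errorRecoverable, which is why the regular expression here still carries the complete list of codes rather than KE001 alone.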
Note that there are two instances of the ErrorCodes rule: one for superusers and one
for non-superusers. You may need to modify both of these to suit your requirements.
Restart and Try to Expand Resources
The flex profile has a predefined rule, flex_expandResourcesErrorHandler, which
attempts to increase the resources available to a query that errors out with one of
several specified error codes. The attempt to expand resources happens when the query
restarts and only applies to the flex profile.
log.info(w + ' is restarting for error ' + w.errorCode);
if (String(w.errorCode).match(/53200|KE002|YB004|KE032|KE029|YB006|EEOOM/)) {
  // See if we can't expand resources; if we can, lets try the query with more resources.
  if (!wlm.assignMaximumResources(w)) {
    w.errorRecoverable = false;
    log.info(w + ' cannot expand resources; marked as not recoverable');
  } else {
    w.errorRecoverable = true;
    log.info(w + ' expanded resources for restart (memory ' + w.requestedMemoryMB + ', spill ' + w.requestedSpillMB + ')');
  }
}
The wlm.assignMaximumResources(w) function returns true if expanded resources (memory
and spill space) are available from the flex pool. Additional resources may or may not
be available, depending on concurrent query activity in that pool. This rule also logs
appropriate INFO messages, either marking the query as not recoverable or listing the
resources available on restart.
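To see the two outcomes side by side, the rule logic can be exercised outside the database with mock objects. Everything other than the rule body is an assumption made for illustration: the mock log and wlm objects, the makeQuery helper, the sample memory and spill values, and the idea that the mock sets requestedMemoryMB and requestedSpillMB when expansion succeeds.

```javascript
// Mock logger that collects messages instead of writing to the system log.
const messages = [];
const log = { info: (msg) => messages.push(String(msg)) };

// Mock wlm object; canExpand simulates whether the flex pool has spare resources.
function makeWlm(canExpand) {
  return {
    assignMaximumResources(w) {
      if (canExpand) {
        w.requestedMemoryMB = 2048; // arbitrary sample value
        w.requestedSpillMB = 8192;  // arbitrary sample value
      }
      return canExpand;
    },
  };
}

// Hypothetical stand-in for the query object the engine passes as `w`.
function makeQuery(errorCode) {
  return { errorCode, toString() { return 'query-1'; } };
}

// The rule body from above, wrapped in a function so the mocks can be injected.
function expandResourcesErrorHandler(w, wlm) {
  log.info(w + ' is restarting for error ' + w.errorCode);
  if (String(w.errorCode).match(/53200|KE002|YB004|KE032|KE029|YB006|EEOOM/)) {
    if (!wlm.assignMaximumResources(w)) {
      w.errorRecoverable = false;
      log.info(w + ' cannot expand resources; marked as not recoverable');
    } else {
      w.errorRecoverable = true;
      log.info(w + ' expanded resources for restart (memory ' + w.requestedMemoryMB + ', spill ' + w.requestedSpillMB + ')');
    }
  }
  return w;
}

const expanded = expandResourcesErrorHandler(makeQuery('KE002'), makeWlm(true));
const exhausted = expandResourcesErrorHandler(makeQuery('KE002'), makeWlm(false));
const unrelated = expandResourcesErrorHandler(makeQuery('XX000'), makeWlm(true));

console.log(expanded.errorRecoverable);  // true
console.log(exhausted.errorRecoverable); // false
console.log(unrelated.errorRecoverable); // undefined
```

A query whose error code is not in the list is logged but otherwise left unchanged, so any decision about restarting it falls to other rules.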
See also Recoverable Error Codes.