Restart Rule Examples
The following examples describe rules that are evaluated when queries are restarted because of an error.
Modified Restart on Error System Rule
The global_restartErrorCodes and global_restartErrorCodesSuperuser system rules trigger a one-time restart attempt for queries that return a small subset of recoverable error codes. If you want to extend these rules to apply to more error codes, Yellowbrick recommends that you create new rules rather than modify existing rules. (If system rule versions change during an upgrade, your modifications to those existing rules will be lost.)
Start by looking at the default definition of global_restartErrorCodes:
yellowbrick=# select * from sys.wlm_active_rule where rule_name ='global_restartErrorCodes';
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------
profile_name | (global)
rule_name | global_restartErrorCodes
rule_type | restart_for_error
order | 1
enabled | t
superuser | f
expression | // Recoverable error codes: +
| // - KE041 # Library file write +
| // - YB044 # RecoverableGeneric +
| // - WM001 # ERRCODE_WORKER_OFFLINE +
| // - KE039 # RPCCHANNELBROKEN +
| // - KE038 # RPCCHANNELCLOSED +
| // - P0004 # Assert failure +
| // - WM002 # System not ready +
| // - WM003 # Query restarting +
| if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) { +
| w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003/);+
| } +
|
You can use Yellowbrick Manager or a SQL command to create a new rule that copies most of this rule definition. For example, in Yellowbrick Manager, go to Workload Management > Global Rules > +Rule. Use the same main settings as the existing rule but a different rule name:
Use the same rule definition, but add another error code from the recoverable list (KE001 in this example):
if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) {
  w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003|KE001/);
}
Now disable the existing global_restartErrorCodes rule and activate the changes so that your new rule, global_restartErrorCodesKE001Added, takes effect instead.
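The effect of the extended expression can be checked outside the database with a small sketch. The function applyRestartRule and the plain objects passed to it are hypothetical stand-ins for the rule and for the query object w; they are not part of the Yellowbrick API. Only the body of the function is taken from the rule definition above.

```javascript
// Hypothetical wrapper around the rule expression, for local experimentation only.
// In the real system, `w` is supplied by workload management.
function applyRestartRule(w) {
  if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) {
    w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003|KE001/);
  }
  return w;
}

// A query failing with the newly added code: match() returns an array (truthy).
const added = applyRestartRule({ errorCode: "KE001" });

// A query failing with a code that is not in the list: match() returns null (falsy).
const other = applyRestartRule({ errorCode: "KE002" });

// A query already marked as not recoverable: the guard skips the assignment.
const vetoed = applyRestartRule({ errorCode: "KE001", errorRecoverable: false });

console.log(Boolean(added.errorRecoverable));
console.log(Boolean(other.errorRecoverable));
console.log(vetoed.errorRecoverable);
```

In this sketch the guard leaves an explicit false untouched, so a query that something else has already marked as not recoverable stays that way.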
Alternatively, you can enable both rules. In that case, your new rule needs a higher rule order, such as 10, so that the existing rule is applied first. You would also need to remove the following code from the rule definition:
if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) { }
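As a rough illustration of the two-rule setup, the sketch below applies the existing rule and then an unguarded copy to the same object. It assumes that rules run in ascending rule order against one shared w object; the function names systemRule, customRule, and runBoth are hypothetical and exist only for this example.

```javascript
// Existing system rule (order 1), body copied from the default definition.
function systemRule(w) {
  if (w.errorRecoverable === undefined || w.errorRecoverable == null || w.errorRecoverable) {
    w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003/);
  }
}

// Hypothetical custom rule (order 10) with the guard removed and KE001 added.
function customRule(w) {
  w.errorRecoverable = String(w.errorCode).match(/KE041|YB044|WM001|KE039|KE038|P0004|WM002|WM003|KE001/);
}

// Assumption: lower rule order is applied first, and both rules see the same `w`.
function runBoth(errorCode) {
  const w = { errorCode };
  systemRule(w);
  customRule(w);
  return w;
}

const ke001 = runBoth("KE001"); // system rule stores null, custom rule stores a match
const wm001 = runBoth("WM001"); // both rules store a match
const ke002 = runBoth("KE002"); // neither rule matches

console.log(Boolean(ke001.errorRecoverable));
console.log(Boolean(wm001.errorRecoverable));
console.log(Boolean(ke002.errorRecoverable));
```

One consequence visible in the sketch is that the unguarded rule overwrites whatever value an earlier rule stored in w.errorRecoverable.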
Note that there are two instances of the ErrorCodes rule: one for superusers and one for non-superusers. You may need to modify both of these to suit your requirements.
Restart and Try to Expand Resources
The flex profile has a predefined rule, flex_expandResourcesErrorHandler, which attempts to increase the resources available to a query that errors out with one of several specified error codes. The attempt to expand resources happens when the query restarts and only applies to the flex profile.
The rule is defined as follows:
log.info(w + ' is restarting for error ' + w.errorCode);
if (String(w.errorCode).match(/53200|KE002|YB004|KE032|KE029|YB006|EEOOM/)) {
  // See if we can't expand resources; if we can, lets try the query with more resources.
  if (!wlm.assignMaximumResources(w)) {
    w.errorRecoverable = false;
    log.info(w + ' cannot expand resources; marked as not recoverable');
  } else {
    w.errorRecoverable = true;
    log.info(w + ' expanded resources for restart (memory ' + w.requestedMemoryMB + ', spill ' + w.requestedSpillMB + ')');
  }
}
The wlm.assignMaximumResources(w) call returns true if expanded resources (memory and spill space) are available from the flex pool. Additional resources may or may not be available, depending on concurrent query activity in that pool. The rule also logs INFO messages: one when the restart begins, and another that either marks the query as not recoverable or lists the resources requested for the restart.
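To trace the branches of this rule without a running cluster, the rule body can be exercised against mock objects. In the sketch below, makeEnv, makeQuery, and flexRule are hypothetical helpers, and the mock wlm.assignMaximumResources simply doubles the requested values when expansion is allowed. The real function's resource calculation is not described here, so those numbers are placeholders.

```javascript
// Hypothetical mocks for `log` and `wlm`; the real objects are provided by the system.
function makeEnv(canExpand) {
  const messages = [];
  const log = { info: (m) => messages.push(m) };
  const wlm = {
    assignMaximumResources: (w) => {
      if (!canExpand) return false;
      w.requestedMemoryMB *= 2; // placeholder behaviour, not the real calculation
      w.requestedSpillMB *= 2;
      return true;
    },
  };
  return { log, wlm, messages };
}

// Hypothetical query object; toString keeps the log messages readable.
function makeQuery(errorCode) {
  return { errorCode, requestedMemoryMB: 1024, requestedSpillMB: 2048, toString: () => "query#1" };
}

// Rule body as given in the definition above, wrapped in a function for testing.
function flexRule(w, wlm, log) {
  log.info(w + ' is restarting for error ' + w.errorCode);
  if (String(w.errorCode).match(/53200|KE002|YB004|KE032|KE029|YB006|EEOOM/)) {
    if (!wlm.assignMaximumResources(w)) {
      w.errorRecoverable = false;
      log.info(w + ' cannot expand resources; marked as not recoverable');
    } else {
      w.errorRecoverable = true;
      log.info(w + ' expanded resources for restart (memory ' + w.requestedMemoryMB + ', spill ' + w.requestedSpillMB + ')');
    }
  }
}

const roomy = makeEnv(true);
const expanded = makeQuery("KE002");
flexRule(expanded, roomy.wlm, roomy.log);

const busy = makeEnv(false);
const refused = makeQuery("KE002");
flexRule(refused, busy.wlm, busy.log);

const idle = makeEnv(true);
const unrelated = makeQuery("KE001");
flexRule(unrelated, idle.wlm, idle.log);

console.log(roomy.messages.join("\n"));
console.log(busy.messages.join("\n"));
console.log(idle.messages.join("\n"));
```

For an error code outside the list, the sketch logs only the restart message and leaves w.errorRecoverable unset, so the decision is left to other rules.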
See also Recoverable Error Codes.