Restarting Queries
This section describes how WLM handles cases where the system restarts queries or an administrator (or rule) triggers a restart. See also Moving Queries.
Restartable Queries
A query can be restarted when it is in one of the following states:
assemble
compile
acquire_resources
run
Restart On Error
Restarting a query during the assemble, compile, acquire_resources, or run state runs the query again, repeating all of the earlier states in its life cycle. However, restarting a query during the Restart On Error state only moves the query to the target resource pool.
If a query has already returned its first row to the client (client_wait state), it cannot be restarted.
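The state rules above can be summarized as a small decision function. This is a sketch only: restartOutcome is a hypothetical helper, not part of WLM, and the string 'restart_on_error' is an assumed label for the Restart On Error state.

```javascript
// Hypothetical helper summarizing the state-based restart rules.
// 'restart_on_error' is an assumed label for the Restart On Error state.
const RERUN_STATES = new Set(['assemble', 'compile', 'acquire_resources', 'run']);

function restartOutcome(state) {
  if (RERUN_STATES.has(state)) {
    // The query runs again, repeating the earlier states in its life cycle.
    return 'rerun';
  }
  if (state === 'restart_on_error') {
    // The query is only moved to the target resource pool.
    return 'move_to_pool';
  }
  // client_wait (first row already returned) and any state not listed
  // above are treated as not restartable in this sketch.
  return 'not_restartable';
}

console.log(restartOutcome('compile'));          // rerun
console.log(restartOutcome('restart_on_error')); // move_to_pool
console.log(restartOutcome('client_wait'));      // not_restartable
```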
Only the following types of queries can be restarted:
select
ctas
insert (INSERT INTO ... SELECT, but not INSERT ... VALUES)
Within these types, a query cannot be restarted if it references external tables.
If a query is processing an error or is already in the process of restarting, it cannot be restarted; however, a restart may be possible later. See the Default Error Recovery section.
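The query-type restrictions can be sketched the same way. The query object and its fields (type, insertSource, referencesExternalTables, processingError, restarting) are hypothetical and exist only for illustration; WLM does not expose this function.

```javascript
// Hypothetical sketch of the query-type checks. The shape of the query
// object is an assumption made for illustration, not a WLM API.
function isRestartableType(q) {
  const typeOk =
    q.type === 'select' ||
    q.type === 'ctas' ||
    // INSERT qualifies only as INSERT INTO ... SELECT, not INSERT ... VALUES.
    (q.type === 'insert' && q.insertSource === 'select');
  // Even an eligible type is excluded if it references external tables.
  if (!typeOk || q.referencesExternalTables) {
    return false;
  }
  // A query that is processing an error or already restarting cannot be
  // restarted now, although a restart may become possible later.
  return !q.processingError && !q.restarting;
}

console.log(isRestartableType({ type: 'insert', insertSource: 'values' }));        // false
console.log(isRestartableType({ type: 'ctas', referencesExternalTables: false })); // true
```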
Default Error Recovery
By default, the same query may be restarted only once before it is deemed unrecoverable. The restart policy in the system-defined rules global_restartErrorPolicy and global_restartErrorPolicySuperuser enforces this behavior; in general, you do not need to change it. Depending on the type of error a query encounters, restarting it more than once may have no practical value: the query may fail in exactly the same way each time before it eventually aborts.
The global_restartErrorCodes and global_restartErrorCodesSuperuser rules constrain the default restart policy so that it applies only to queries that fail with the following subset of recoverable error codes:
KE038 RPCCHANNELCLOSED
KE039 RPCCHANNELBROKEN
KE041 Library file write
P0004 ERRCODE_ASSERT_FAILURE
WM001 ERRCODE_WORKER_OFFLINE
WM002 ERRCODE_SYSTEM_NOT_READY
WM003 ERRCODE_CANCEL_FOR_RESTART
YB044 RecoverableGeneric
Queries that fail with other recoverable error codes, or with any other error code, are not subject to the policy defined by these rules. Again, you do not need to change this behavior, but you can create new, modified versions of the ErrorCodes rules that include other codes from the list of recoverable errors. See Restart Rule Examples.
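The default policy can be modeled as a check on the error code and the number of restarts so far. This is a hypothetical sketch of the behavior described in this section, not the code of the global_restartErrorPolicy or global_restartErrorCodes rules; the function and constant names are made up.

```javascript
// Hypothetical model of the default error-recovery policy: restart at most
// once, and only for the listed recoverable error codes.
const DEFAULT_RESTART_CODES = new Set([
  'KE038', 'KE039', 'KE041', 'P0004',
  'WM001', 'WM002', 'WM003', 'YB044',
]);
const MAX_DEFAULT_RESTARTS = 1;

function defaultPolicyRestarts(errorCode, restartsSoFar) {
  // Codes outside the list are not covered: the default policy does not
  // restart them (other rules, such as the flex handler, may still apply).
  if (!DEFAULT_RESTART_CODES.has(errorCode)) {
    return false;
  }
  // After one restart, the query is deemed unrecoverable.
  return restartsSoFar < MAX_DEFAULT_RESTARTS;
}

console.log(defaultPolicyRestarts('WM001', 0)); // true
console.log(defaultPolicyRestarts('WM001', 1)); // false
console.log(defaultPolicyRestarts('EEOOM', 0)); // false
```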
The flex_expandResourcesErrorHandler rule defines restart behavior for queries that run in the flex profile:
log.info(w + ' is restarting for error ' + w.errorCode);
if (String(w.errorCode).match(/53200|KE002|YB004|KE032|KE029|YB006|EEOOM/)) {
  // Try to expand resources; if that succeeds, retry the query with more resources.
  if (!wlm.assignMaximumResources(w)) {
    w.errorRecoverable = false;
    log.info(w + ' cannot expand resources; marked as not recoverable');
  } else {
    w.errorRecoverable = true;
    log.info(w + ' expanded resources for restart (memory ' + w.requestedMemoryMB + ', spill ' + w.requestedSpillMB + ')');
  }
}
This rule logs a message for every query that restarts after an error. For queries that failed with a specific subset of the recoverable error codes (errors that typically indicate conditions under which a query is likely to benefit from more resources), it also tries to expand the query's resources. For example, if a query runs out of memory, the flex profile, which expands or contracts resources to accommodate different levels of concurrency, can make more (or all) of the pool's memory available to the restarted query.
If wlm.assignMaximumResources returns true, the query's resources (memory, temp space, priority/CPU) were expanded, the query is marked as recoverable, and the INFO message logs the expanded memory and spill values. If the resources could not be expanded, the query is marked as not recoverable and the INFO message reports that instead. Note that the wlm.assignMaximumResources function exists for the purpose of error recovery and may not be of practical use within your own WLM rules. User-defined rules are more likely to benefit from setting requested resources directly, through a different set of memory and spill space properties (such as w.requestedMemoryPercent), or from setting query priority.
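To see how the flex handler behaves, its body can be run outside WLM against mock objects. In this sketch, log, wlm, and w are hypothetical stand-ins: the mocked wlm.assignMaximumResources succeeds only when told that spare resources exist, and the memory and spill numbers are invented.

```javascript
// Hypothetical stand-ins for the WLM rule environment.
const log = { messages: [], info(msg) { this.messages.push(msg); } };

function makeWlm(canExpand) {
  return {
    assignMaximumResources(w) {
      if (!canExpand) return false;
      w.requestedMemoryMB = 4096; // illustrative values only
      w.requestedSpillMB = 8192;
      return true;
    },
  };
}

// Same logic as the flex_expandResourcesErrorHandler rule body.
function flexErrorHandler(w, wlm) {
  log.info(w + ' is restarting for error ' + w.errorCode);
  if (String(w.errorCode).match(/53200|KE002|YB004|KE032|KE029|YB006|EEOOM/)) {
    if (!wlm.assignMaximumResources(w)) {
      w.errorRecoverable = false;
      log.info(w + ' cannot expand resources; marked as not recoverable');
    } else {
      w.errorRecoverable = true;
      log.info(w + ' expanded resources for restart (memory ' +
        w.requestedMemoryMB + ', spill ' + w.requestedSpillMB + ')');
    }
  }
}

// A mock workload; toString() gives log lines a readable name.
function makeQuery(id, errorCode) {
  return { errorCode, toString() { return 'Query ' + id; } };
}

const oom = makeQuery(1, 'EEOOM');
flexErrorHandler(oom, makeWlm(true));
console.log(oom.errorRecoverable);   // true

const stuck = makeQuery(2, 'EEOOM');
flexErrorHandler(stuck, makeWlm(false));
console.log(stuck.errorRecoverable); // false

const other = makeQuery(3, 'WM001');
flexErrorHandler(other, makeWlm(true));
console.log(other.errorRecoverable); // undefined (code not in the subset)
```

Note that the regular expression is not anchored, so it matches any error code string that contains one of the listed codes.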
Requesting Additional Resources for Restarted Queries
Administrators can explicitly restart a running query in another pool by using the RESTART command. When you restart a query in this way, you can request specific resources (memory, priority, and spill space), which are allocated if they are available when the command is submitted. For example, the following command restarts query 347010 in the large pool and requests high priority, memory, and spill space:
premdb=# restart 347010 to wlm resource pool large
with ( priority high, memory '500MB', memory '40%', spill '10%' );
RESTART
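When RESTART commands are generated by a script, a helper can assemble the statement in the form shown above. buildRestartSql is hypothetical: it emits only the clauses used in this example, assumes the WITH list is optional, and does not quote or escape the pool name, so check the generated SQL against the RESTART reference before relying on it.

```javascript
// Hypothetical helper that builds a RESTART statement in the form shown above.
// It emits only TO WLM RESOURCE POOL plus priority/memory/spill options and
// performs no identifier quoting or escaping.
function buildRestartSql(queryId, pool, options = {}) {
  if (!Number.isInteger(queryId) || queryId <= 0) {
    throw new Error('queryId must be a positive integer');
  }
  let sql = 'restart ' + queryId + ' to wlm resource pool ' + pool;
  const clauses = [];
  if (options.priority) clauses.push('priority ' + options.priority);
  // The example above passes two memory values, so accept a list.
  for (const m of options.memory || []) clauses.push("memory '" + m + "'");
  if (options.spill) clauses.push("spill '" + options.spill + "'");
  if (clauses.length > 0) sql += ' with ( ' + clauses.join(', ') + ' )';
  return sql + ';';
}

console.log(buildRestartSql(347010, 'large', {
  priority: 'high', memory: ['500MB', '40%'], spill: '10%',
}));
// restart 347010 to wlm resource pool large with ( priority high, memory '500MB', memory '40%', spill '10%' );
```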
A WLM rule can also request resources when it restarts a query, for example:
if ((String(w.application).indexOf('ybsql') >= 0) &&
    w.user === 'bobr' &&
    w.errorCode === 'EEOOM') {
  w.requestedMemoryPercent = 100;
  w.restartInResourcePool('max_memory_pool');
}
This rule restarts the query in the max_memory_pool pool with 100% of its memory if an out-of-memory (EEOOM) error occurs for user bobr running a ybsql query.
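The rule's condition can be checked outside WLM with a mock workload object. The mock (mockWorkload, whose restartInResourcePool simply records the pool name) is hypothetical; only the body of bobrOomRule comes from the rule above.

```javascript
// Body of the rule above, wrapped in a function for testing.
function bobrOomRule(w) {
  if ((String(w.application).indexOf('ybsql') >= 0) &&
      w.user === 'bobr' &&
      w.errorCode === 'EEOOM') {
    w.requestedMemoryPercent = 100;
    w.restartInResourcePool('max_memory_pool');
  }
}

// Hypothetical mock workload: restartInResourcePool() only records the pool.
function mockWorkload(application, user, errorCode) {
  return {
    application, user, errorCode,
    restartPool: null,
    restartInResourcePool(pool) { this.restartPool = pool; },
  };
}

const hit = mockWorkload('ybsql', 'bobr', 'EEOOM');
bobrOomRule(hit);
console.log(hit.restartPool, hit.requestedMemoryPercent); // max_memory_pool 100

const miss = mockWorkload('ybsql', 'alice', 'EEOOM');
bobrOomRule(miss);
console.log(miss.restartPool); // null
```

Because the application check uses indexOf, any application name that contains the string ybsql satisfies it.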