How Queries Are Executed
An important facet of making decisions about system resource allocation is an understanding of query execution. Yellowbrick queries move through several finite states from submission to completion. A basic understanding of this process will help you develop an effective WLM strategy, especially with respect to the creation of WLM rules. You can define WLM rules that are applied as queries enter certain states and use these rules to develop a WLM profile that optimizes resource allocation for a given workload.
As a query passes through each state in its life cycle, runtime statistics are captured and logged. These statistics provide a measure of the time spent in each phase of query execution, giving administrators a means of monitoring and analyzing query performance. Wait times and actual processing times are measured at each stage.
The following diagram shows the life cycle of a query. Each query passes through several states on the manager node, while it is being prepared for execution, then it starts running (executing) on the worker nodes.
The diagram also identifies when queries can be cancelled or restarted. Once submitted, a query runs to completion, is cancelled, or fails with an error (Done
, Cancel
, and Error
states). If a query is restarted or returns an error, it may re-enter the cycle in the Assemble
state, but ultimately, all queries finish in one of the three completion states. (If a query is cancelled, it cannot be restarted.)
A query has the following states:
- Submit
The query is received by the front-end database on the manager node and enters this state first. The query is validated to make sure it contains well-formed SQL. This state is the earliest phase of a query where you can apply WLM throttle rules (limit concurrent queries). You can also reject queries when they are in this state. The
submit_time
, as logged insys.log_query
, is the wall-clock time when the front-end database starts the query.- Parse
The query is parsed into an initial abstract syntax tree (AST) and the locks required to run the query are acquired. You cannot apply rules to queries in this state. A trivial query (such as a single-row insert) may jump straight from this state to the
Run
state. The time the query spends waiting for locks is tracked in thewait_lock_ms
column insys.query
. If a query is blocked on the acquisition of a lock, the description of the lock appears in theblocked
column insys.query
.- Plan
The query plan is generated. Details about the tables referenced in the query are available to WLM (from the AST), as well as the
type
of the query. You cannot apply rules to queries in this state, but a broad range of rule definitions is available in the subsequentassemble
andcompile
states.- Assemble
The query plan is turned into an artifact that can be compiled. Additional optimizations are applied at this point, and filtering code for scans is generated.
- Compile
In this state, the assembled query is turned into a compiled binary that can be sent to the workers. Depending on the complexity of the query, it can remain in this state for a significant amount of time. This phase of the query lifecycle is highly parallelized. Information available to rules includes memory requirements, confidence in memory estimates, and the number of workers. You can write rules to throttle compile resources and make decisions based on memory information.
- Acquire Resources
Resources are acquired for the query. Compile statistics are now available to WLM, as well as the size of the loader cache. You cannot apply rules to queries in this state.
- Run
The query is sent to the worker nodes and starts executing on one or more nodes. Detailed query statistics are now available. Runtime WLM rules can be applied.
- Client Wait
The query is sending rows to the client and waiting for the client to receive or request all the rows. You cannot apply rules to queries in this state.
- Completion states: Done, Error, Cancel
Done
state is reached after the last result rows have been received by the client.Error
state is reached when a query fails for some reason (and is not restarted).Cancel
state is reached when the query is cancelled by the user. Completion rules can be applied when queries reach any of these states, to take simple actions such as logging messages.Queries remain in the
sys.query
view (with stateDone
) until they are asynchronously written to thesys.log_query
view. At that point they no longer appear insys.query
. The same query ID may appear in both views for a brief period of time.- Restart states: On Error, On User (via Rule or Administrator)
Queries may be restarted by the user or by the system.
A number of known query error codes will trigger an automatic restart, causing query execution to be retried. For example, a query that runs out of memory is typically restarted. A query that is moved from one resource pool to another may also be restarted by the system if the move does not provide adequate resources.
A query may also be restarted based on the application of a specific rule or when an administrator runs a SQL
RESTART
command.Queries are restartable from the
Assemble
,Compile
,Acquire Resources
, andRun
states and may cycle through those states multiple times. Restarted queries always restart from theAssemble
state. When a query restarts, it may run under different conditions because the workload is changing as other queries are submitted or completed. Additional resources may or may not become available to a restarted query. Note that when a query restarts, it is subject to the same sequence of rules that were evaluated and applied when it was first submitted (as well as any specific restart rules when it goes into a restart state). A rule that was evaluated when a query first entered theCompile
state, for example, may have a different outcome the second time it enters that state.A small set of WLM actions can be applied when a query enters a
Restart
state, mainly for logging and audit purposes.
See Rule Types for more details.
Parent topic:Workload Management