
Compute Clusters

Yellowbrick database queries run on "compute clusters." On the Tinman and Andromeda appliance platforms, there is a single compute cluster in which each node is a blade server, and the number of nodes in the cluster is determined by how many blades are inserted. All user queries run against this default cluster, and columnar data is written locally to storage devices within the cluster. The maximum size of the cluster is constrained by how many physical blades the appliance can hold.

On cloud platforms, we implement an architecture that's fully elastic, with separate storage and compute. Multiple compute clusters are managed through SQL, enabling workload isolation: different users or roles are assigned to different compute clusters, and a cluster load balancer intelligently distributes work across them to enable higher concurrency. Data is written to shared object storage and cached on local ephemeral storage devices. The total size of all clusters is effectively constrained only by the capacity of the availability zone in which the software runs.

Creating New Clusters (cloud platforms)

Compute clusters are created with the SQL command CREATE CLUSTER; similar commands exist to alter and drop clusters. Alternatively, a simple wizard in Yellowbrick Manager provides a graphical way to create clusters: navigate to Instance Management, choose your instance, and click the + Cluster button.
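
As a minimal sketch, creating a small cluster might look like the following; the cluster name and node count are placeholders, and the exact option syntax should be confirmed against the CREATE CLUSTER reference:

  -- Assumed syntax: create a hypothetical two-node cluster for ad hoc work.
  CREATE CLUSTER adhoc_cluster WITH (NODES 2);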

Examining Clusters

To see which clusters are currently in operation, and the status of the nodes in each cluster, use the sys.cluster system view. On appliance platforms, you'll see only one cluster, called yellowbrick, comprising all the blades in the appliance.
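
For example:

  -- List all clusters, their states, and node counts.
  SELECT * FROM sys.cluster;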

Cluster Node Acquisition (cloud platforms)

The nodes in a cluster are acquired in four stages following the initial request, each reflected in a node count in sys.cluster:

  1. When the cluster is created or modified, the desired number of nodes is written into the nodes column and the cloud provider is asked to provision them.
  2. As nodes are received from the cloud provider, the prepared_nodes counter is incremented.
  3. Yellowbrick provisions the Kubernetes pods on each node, which causes the software to be downloaded; the configured_nodes counter is incremented when this completes.
  4. The Yellowbrick database software component called the "worker" is launched, and the ready_nodes counter is incremented.
  5. The worker comes online and the node is ready to process database queries; the ready_workers counter is incremented.

When creating clusters, Yellowbrick Manager reflects these four acquisition stages as Preparing, Configuring, Starting, and Running, respectively. Until all requested nodes have been acquired and started (nodes = ready_workers), the cluster is not marked as RUNNING.
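
To watch acquisition progress directly, you can select the counter columns from sys.cluster; the cluster is fully up when nodes equals ready_workers:

  -- Node counts advance through the acquisition stages described above.
  SELECT nodes, prepared_nodes, configured_nodes, ready_nodes, ready_workers
  FROM sys.cluster;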

If starting a cluster takes a long time, the cause is often an account quota issue or a shortage of the requested hardware instance type in the given region and availability zone. Check your account quotas first, then consider using ODCRs or Reserved Nodes to acquire nodes in the background. The public cloud providers do not offer public interfaces for monitoring node availability, but their support staff or engineers can often check it for you. If you regularly run into node availability problems, talk to your cloud account team for advice on how best to address them.

Default and System Clusters

A Yellowbrick instance must have one and only one default cluster, which remains running as the default resource for executing queries and other operations. The system cluster is the cluster designated to do all system work: background operations that run under internal system accounts, such as flushing data from the row store, updating statistics, or garbage collection. If an instance has only a single cluster, that cluster is by definition both the default cluster and the system cluster; more generally, the default and system clusters may be the same cluster.

To change the default and system clusters, use the ALTER SYSTEM SET CLUSTER commands. To see which clusters are currently the default and system clusters, check the is_default and is_system columns in sys.cluster.

A user may have a designated default cluster on which all queries submitted by that user run. A user's default cluster is set with the ALTER USER SET DEFAULT_CLUSTER command.
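
The following sketch shows the general shape of these commands; the cluster and user names are placeholders, and the exact clause spellings should be confirmed against the ALTER SYSTEM and ALTER USER references:

  -- Assumed syntax: designate the clusters used for default and system work.
  ALTER SYSTEM SET DEFAULT CLUSTER etl_cluster;
  ALTER SYSTEM SET SYSTEM CLUSTER etl_cluster;

  -- Assumed syntax: run analyst_1's queries on a specific cluster by default.
  ALTER USER analyst_1 SET DEFAULT_CLUSTER adhoc_cluster;

  -- Confirm which clusters are currently the default and system clusters.
  SELECT * FROM sys.cluster WHERE is_default OR is_system;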

Cluster Policy Options

Every cluster has an operating policy, defined by the following options (a sketch of setting them appears after the list):

Auto-suspend:
This option ensures that a cluster does not consume resources when it is not being used.
Auto-resume:
This option starts a cluster automatically when its use is requested: the cluster resumes on submission of the first query that attempts to use it. If a query appears to take an unusually long time to run, a cluster resuming in the background is often the reason.
Workload Management Profile:
Each cluster can be assigned a specific workload management profile, or it can run with the default profile.
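
As an illustrative sketch only (the option keywords and value formats below are assumptions; see the CREATE CLUSTER and ALTER CLUSTER references for the actual spelling):

  -- Assumed option names: suspend after 300 seconds of inactivity,
  -- resume automatically on the next query, and use a named WLM profile.
  ALTER CLUSTER adhoc_cluster SET AUTO_SUSPEND 300;
  ALTER CLUSTER adhoc_cluster SET AUTO_RESUME true;
  ALTER CLUSTER adhoc_cluster SET PROFILE 'reporting_profile';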

Operations That Do Not Require a Running Cluster

A running cluster is not needed for every query or system operation. Users can run "front-end" queries, and the system can run commands that manage the instance. An instance is operational in this way before its first (default) cluster is created, and existing clusters can still be managed (suspended or resumed).

Examples of front-end operations include CREATE DATABASE, commands that create external objects, and other high-level commands that do not require a running cluster. You can run COUNT(*) queries, and you can return the results of certain system functions, such as CURRENT_DATABASE() and CURRENT_USER, when the query has no FROM clause. You can also query certain system views, but not the sys.log_query set of views, and you can insert rows into the row store.
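
For example, both of the following run without any cluster resumed (the database name is a placeholder):

  -- High-level DDL handled by the front end.
  CREATE DATABASE sales_dev;

  -- System functions evaluated with no FROM clause.
  SELECT CURRENT_DATABASE(), CURRENT_USER;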

Note: If you are creating your own primary storage for the databases in a given data warehouse instance, you should create that storage before creating any clusters or databases. See CREATE EXTERNAL LOCATION.

Modifying Clusters

After creating a cluster, you can modify it in several ways, as sketched below:

  • SUSPEND/RESUME the cluster and apply a specified wait time.
  • Set the automatic SUSPEND/RESUME policy.
  • RENAME the cluster.
  • Scale the cluster up or down by adding or subtracting from the node count.
  • Associate the cluster with a different WLM profile.

See also ALTER CLUSTER.
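
A sketch of common modifications follows; the names and clause spellings are illustrative assumptions, so check ALTER CLUSTER for the exact syntax:

  -- Assumed syntax: suspend and resume on demand.
  ALTER CLUSTER adhoc_cluster SUSPEND;
  ALTER CLUSTER adhoc_cluster RESUME;

  -- Assumed syntax: rename the cluster and scale it to four nodes.
  ALTER CLUSTER adhoc_cluster RENAME TO reporting_cluster;
  ALTER CLUSTER reporting_cluster SET NODES 4;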