Compute Cluster Load Balancer

The compute cluster load balancer lets users or roles distribute workloads across multiple compute clusters instead of binding them to a single cluster. Spreading queries this way improves resource utilization and keeps work running when an individual cluster is unavailable.

Overview

The cluster load balancer enables users or roles to define a set of compute clusters to be used for query execution. The specific compute cluster chosen for a query depends on various factors, including:

Transactional State: If the current session has written data in its transaction, the cluster load balancer setting is ignored. Instead, the compute cluster where the data was written is used to maintain data residency.
Least Busy Heuristic: If no data has been written in the current transaction, the system chooses the compute cluster with the fewest running and queued queries. If two or more clusters are equally busy, one of them is selected at random to ensure fairness.
Cluster Availability: Clusters that are offline or suspended are excluded from consideration, even if they are part of the configured list.
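The three factors above can be modeled in a few lines. The following Python sketch is illustrative only and is not the product's implementation: `ClusterStatus` and `choose_cluster` are hypothetical names, and it assumes that "busy" means the sum of running and queued queries.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClusterStatus:
    """Hypothetical snapshot of one compute cluster; not a real system view."""
    name: str
    running: int = 0
    queued: int = 0
    online: bool = True  # False models an offline or suspended cluster


def choose_cluster(
    default_list: Sequence[str],
    statuses: Sequence[ClusterStatus],
    written_cluster: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a cluster name following the three factors described above."""
    # Transactional state: once data has been written, stay on that cluster.
    if written_cluster is not None:
        return written_cluster

    # Availability: keep only listed clusters that are known and online.
    by_name = {s.name: s for s in statuses}
    candidates = [
        by_name[n] for n in default_list if n in by_name and by_name[n].online
    ]
    if not candidates:
        return None  # behavior in this case is not specified by this page

    # Least busy: lowest load wins; ties are broken at random.
    # Assumption: load = running + queued queries.
    lowest = min(c.running + c.queued for c in candidates)
    tied = [c.name for c in candidates if c.running + c.queued == lowest]
    return (rng or random).choice(tied)


statuses = [
    ClusterStatus("bi_cluster", running=4, queued=2),
    ClusterStatus("bi_cluster2", running=1, queued=0),
    ClusterStatus("bi_cluster3", online=False),
]
print(choose_cluster(["bi_cluster", "bi_cluster2", "bi_cluster3"], statuses))
# prints "bi_cluster2": bi_cluster3 is idle but suspended, so it is skipped
```

Passing `written_cluster="bi_cluster"` to the same call returns `bi_cluster` regardless of load, which mirrors the transactional-state rule.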

Users can configure their default cluster behavior with a comma-separated list of clusters using the ALTER USER or ALTER ROLE commands:

```sql
ALTER USER <username> SET DEFAULT_CLUSTER "<cluster1>,<cluster2>,<cluster3>";
ALTER ROLE <rolename> SET DEFAULT_CLUSTER "<cluster1>,<cluster2>,<cluster3>";
```

Scaling Out Clusters

The cluster load balancer also allows users to scale out the number of clusters they have access to, distributing workloads across them. This capability is particularly beneficial for zero-downtime scaling and for workloads that require higher concurrency rather than more compute, memory, or storage per query. By assigning a user multiple default clusters, the instance automatically directs queries to the least busy cluster. Additionally, this feature supports seamless suspension and resumption of clusters, providing greater elasticity.

Example:

Users with explicit grants on clusters can add new clusters to their default cluster settings:

```sql
GRANT USAGE ON CLUSTER "bi_cluster2" TO john_doe;
ALTER USER "john_doe" SET DEFAULT_CLUSTER "bi_cluster,bi_cluster2";
```

In this example, the user has access to multiple clusters, enabling scalable and elastic query processing without downtime. For instance, adding bi_cluster2 increases concurrency for workloads while incurring only minimal distribution penalties.

Important: Users or roles assigned a load balancer cluster setting must also hold the USAGE privilege on every cluster listed, granted with GRANT USAGE ON CLUSTER <name>.
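One way to keep the grants and the cluster list in sync is to generate both from a single list of names. The Python sketch below is a hypothetical helper, not part of any product tooling: `default_cluster_statements` is an invented name, it only builds SQL text in the same form as the example above, and it assumes plain identifiers so that names can be embedded safely.

```python
import re
from typing import List, Sequence

# Assumption: only plain identifiers are accepted, so no escaping is needed.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def default_cluster_statements(user: str, clusters: Sequence[str]) -> List[str]:
    """Return one GRANT per cluster, followed by the ALTER USER statement."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    for name in (user, *clusters):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")

    # Every cluster in the list gets its own USAGE grant.
    grants = [f'GRANT USAGE ON CLUSTER "{c}" TO {user};' for c in clusters]

    # The setting itself is a single comma-separated string.
    cluster_list = ",".join(clusters)
    alter = f'ALTER USER "{user}" SET DEFAULT_CLUSTER "{cluster_list}";'
    return grants + [alter]


for statement in default_cluster_statements("john_doe", ["bi_cluster", "bi_cluster2"]):
    print(statement)
```

Because the grants and the setting come from the same list, a cluster cannot end up in the default list without a matching grant statement.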

Key Concepts

Dynamic Cluster Selection: The compute cluster is dynamically selected based on the "least busy" heuristic or transactional state.
Transactional Data Residency: When a transaction includes temporary data, the session remains on the compute cluster where the data was written until the transaction is committed.
Fair Scheduling: If multiple clusters have equal load, one is randomly chosen to balance query distribution.
Offline or Suspended Clusters: Compute clusters that are offline or suspended are not considered during query scheduling.
Cluster Name Length Limitation: The cluster name setting for a role must fit within 128 characters. This limitation restricts the number of concatenated cluster names that can be included in the load balancer setting. To maximize the number of clusters, use short and descriptive names for clusters.
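The effect of the 128-character limit is easy to check before running ALTER USER or ALTER ROLE. The Python sketch below uses hypothetical helper names and assumes that the limit applies to the whole comma-separated string, separators included, and that it is counted in characters rather than bytes.

```python
from typing import Sequence

MAX_SETTING_LENGTH = 128  # limit stated in the text above


def join_cluster_names(names: Sequence[str], limit: int = MAX_SETTING_LENGTH) -> str:
    """Join names with commas and fail early if the setting would be too long."""
    setting = ",".join(names)
    if len(setting) > limit:
        raise ValueError(
            f"cluster list is {len(setting)} characters; the limit is {limit}"
        )
    return setting


def max_clusters_for_name_length(name_length: int, limit: int = MAX_SETTING_LENGTH) -> int:
    """How many names of equal length fit, counting one comma between names."""
    # n names need n * name_length + (n - 1) characters in total.
    return (limit + 1) // (name_length + 1)


print(join_cluster_names(["bi_cluster", "bi_cluster2"]))
print(max_clusters_for_name_length(10))
```

Under these assumptions, 10-character names allow at most 11 clusters in the list (11 names plus 10 commas is 120 characters, while a twelfth name would bring the total to 131).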

Viewing Default Cluster Settings

To view the current default cluster setting for a user or role, query the sys.users or sys.roles system tables:

```sql
SELECT default_cluster FROM sys.users WHERE username = 'john_doe';
SELECT default_cluster FROM sys.roles WHERE rolename = 'data_scientists';
```

Interaction with `USE CLUSTER`

The USE CLUSTER command overrides the default cluster or cluster list for the current session only. When the session ends, the override is discarded, and later sessions for that user or role use the default cluster list setting again.

Best Practices

List Maintenance: Ensure the list of clusters is kept up-to-date to avoid routing workloads to retired or decommissioned clusters.
Uniform Configuration: Use the same WLM profile and cluster configuration, including hardware instance type and node count, on every cluster in the list to achieve predictable workload performance across load-balanced compute clusters.
Workload Optimization: Configure cluster lists to prioritize clusters with the appropriate capacity and proximity for intended workloads.
Testing and Validation: Simulate various load and failover scenarios to validate the behavior of the load balancer.
Short Cluster Names: Use short and descriptive names for compute clusters to maximize the number of clusters that can be included in the load balancer setting.
Grant Privileges: Ensure all users or roles assigned a cluster load balancer setting hold the USAGE privilege on each listed cluster, granted with GRANT USAGE ON CLUSTER <name>.
