Self-Managed: yb-monitoring

Install yb-monitoring with Helm. This deploys the workloads that enable monitoring for Yellowbrick Data Warehouse; it typically contains Loki, Grafana, Prometheus, and Fluent Bit.

INFO

If you are using Yellowbrick-created storage classes, you must also install yb-storageclass. See Helm: yb-storageclass.

The Yellowbrick Operator also modifies the yb-monitoring Helm chart for certain features, such as changing the node group tier and enabling log retention on S3 buckets. If you manage this chart with automated deployments, take this behaviour into account by reusing the deployed values.

When using the commands or values outlined here, make the following substitutions:

| Value | Description |
| --- | --- |
| {image-repo} | The container image repository pushed by the Deployer |
| {namespace} | The Kubernetes namespace into which you want to install |
| {role-arn} | When on AWS, the IAM role ARN of the Fluent Bit service account |
| {version} | The chart version of loki-stack |
| {observability-storage} | The name of the storage location for observability: on AWS, an S3 bucket name; on Azure, a Storage Account name. Must be the same value used when deploying the yb-operator chart. |
| {oidc-provider-arn} | When on AWS, the OpenID Connect provider ARN |
| {oidc-provider} | When on AWS, the OpenID Connect provider |
| {partition} | When on AWS, the partition: aws or aws-gov |
| {storageclass} | The general-purpose storage class name, e.g. AWS: gp3, Azure: standard, GCP: pd-balanced |
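As a convenience, the substitutions can be kept in shell variables so the commands in this guide can be rendered without hand-editing. This is only a sketch; every value below is a hypothetical example, not a real endpoint:

```shell
# Hypothetical example values -- replace with your environment's details
IMAGE_REPO="123456789012.dkr.ecr.us-east-1.amazonaws.com/yellowbrick"
NAMESPACE="yb-monitoring"
VERSION="2.10.2"

# Render the install command with the placeholders filled in
CMD="helm install loki oci://${IMAGE_REPO}/loki-stack -n ${NAMESPACE} -f values.yaml --version ${VERSION}"
echo "${CMD}"
```

Rendering the command with echo first is a cheap way to confirm the substitutions before running it against the cluster.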

Helm Chart

Running the Yellowbrick Deployer will push the Helm charts and container images you need into your cloud environment. For instructions on pushing assets using the Deployer, see the documentation.

Chart name: loki-stack

The get-assets subcommand can be used to find the version of the loki-stack chart; see the CLI reference.

Install Command

See Authenticating with ECR

bash
helm install loki oci://{image-repo}/loki-stack \
  -n {namespace} \
  -f values.yaml \
  --version {version}

INFO

Note that the release name when installing yb-monitoring must be loki, and the namespace must match the monitoring namespace supplied when installing the yb-operator Helm chart. We recommend using a namespace different from the one yb-operator itself is installed in.

Values

Note that the node group for yb-monitoring workloads is managed by the Yellowbrick Operator; we recommend not changing the node selectors and tolerations in the values file below.

yaml
fluent-bit:
  enabled: true
  image:
    repository: {image-repo}/yellowbrickdata/fluent-bit-plugin-loki
    tag: 2.8.8-13
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: {role-arn}
  nodeSelector: &nodeSelector
    cluster.yellowbrick.io/node_type: yb-mon-standard
  tolerations: &tolerations
  - effect: NoSchedule
    key: cluster.yellowbrick.io/owned
    operator: Equal
    value: "true"
grafana:
  deploymentStrategy:
    type: Recreate
  downloadDashboardsImage:
    repository: {image-repo}/curlimages/curl
    tag: 8.11.1
  image:
    repository: {image-repo}/grafana/grafana
    tag: 12.0.0
  initChownData:
    image:
      repository: {image-repo}/library/busybox
      tag: 1.31.1
  persistence:
    storageClassName: yb-gp3
  sidecar:
    image:
      repository: {image-repo}/kiwigrid/k8s-sidecar
      tag: 1.28.0
  nodeSelector: *nodeSelector
  tolerations: *tolerations
ingress:
  enabled: false
loki:
  extraContainers:
  - command:
    - /bin/sh
    - -c
    - |-
      trap cleanup 15
      cleanup()
      {
          echo "Shutting down the loki pvc monitor"
          exit
      }

      while true; do
          /delete_files_if_low_memory.sh
          sleep 360 &
          PID=$!
          wait $PID
      done;
    env:
    - name: SPACEMONITORING_FOLDER
      value: /data/loki/chunks
    image: {image-repo}/yellowbrickdata/loki-log-trimmer:v5
    name: pvcleanup
    volumeMounts:
    - mountPath: /data
      name: storage
  image:
    repository: {image-repo}/grafana/loki
    tag: 3.5.0
  persistence:
    size: 200Gi
    storageClassName: {storageclass}
  nodeSelector: *nodeSelector
  tolerations: *tolerations
prometheus:
  alertmanager:
    enabled: false
    image:
      repository: {image-repo}/prometheus/alertmanager
      tag: v0.27.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  alertmanagerFiles: {}
  configmapReload:
    alertmanager:
      enabled: true
      image:
        repository: {image-repo}/jimmidyson/configmap-reload
        tag: v0.8.0
    prometheus:
      image:
        repository: {image-repo}/jimmidyson/configmap-reload
        tag: v0.8.0
  kube-state-metrics:
    image:
      repository: {image-repo}/kube-state-metrics/kube-state-metrics
      tag: v2.13.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  nodeExporter:
    image:
      repository: {image-repo}/prometheus/node-exporter
      tag: v1.8.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  processExporter:
    image:
      repository: {image-repo}/ncabatoff/process-exporter
      tag: sha-7ef0b73
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  pushgateway:
    image:
      repository: {image-repo}/prom/pushgateway
      tag: v1.9.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  server:
    image:
      repository: {image-repo}/prometheus/prometheus
      tag: v2.49.1
    persistentVolume:
      enabled: true
      size: 100Gi
      storageClass: {storageclass}
    nodeSelector: *nodeSelector
    tolerations: *tolerations
    extraInitContainers:
    - image: {image-repo}/library/busybox:1.31.1
      name: prometheus-wal-cleanup
      command:
      - /bin/sh
      - -c
      - if [ $(du -sm /data/wal | cut -f1) -gt 1024 ]; then rm -rf /data/wal/*; fi
      volumeMounts:
      - mountPath: /data
        name: storage-volume
  serverFiles:
    alerting_rules.yml:
      groups:
      - name: Host alerts
        rules:
        - alert: PVCUtilizationHigh
          annotations:
            message: The persistentVolume used by {{ $labels.persistentvolumeclaim
              }} is {{ $value | humanize }}% utilized. Please check and take appropriate
              action.
            summary: PVC utilization on PVC {{ $labels.persistentvolumeclaim }} is
              high
          expr: 100 * sum(kubelet_volume_stats_used_bytes) by(persistentvolumeclaim)
            /sum(kubelet_volume_stats_capacity_bytes) by (persistentvolumeclaim) >
            90
          for: 5m
          labels:
            severity: warning
        - alert: HostOutOfDiskSpace
          annotations:
            description: |-
              Disk is almost full (< 20% left)
                VALUE = {{ $value | humanize }}
                LABELS: {{ $labels }}
            message: |-
              Disk is almost full (< 20% left)
                VALUE = {{ $value | humanize }}%
                Node Name: {{ $labels.node }}
            summary: Host out of disk space (instance {{ $labels.node }})
          expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})
            > 80
          for: 1s
          labels:
            severity: warning
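
The prometheus-wal-cleanup init container above is a one-line du-based threshold check: it empties the WAL directory only when its size exceeds 1024 MB. A standalone sketch of the same check, scaled down to a hypothetical temporary directory and a 1 MB threshold so it can be run anywhere:

```shell
# Sketch of the WAL-cleanup check: delete directory contents only when
# usage exceeds a size threshold. Directory and threshold are illustrative.
WAL_DIR="$(mktemp -d)"
THRESHOLD_MB=1          # the init container uses 1024 (1 GiB)

# Create ~2 MB of dummy "WAL" data so the threshold is exceeded
dd if=/dev/zero of="${WAL_DIR}/segment-000" bs=1024 count=2048 2>/dev/null

# Same shape as the init container's command, with the sizes swapped in
if [ "$(du -sm "${WAL_DIR}" | cut -f1)" -gt "${THRESHOLD_MB}" ]; then
    rm -rf "${WAL_DIR:?}"/*
fi
```

The `${WAL_DIR:?}` expansion guards the rm against an unset variable, which is a sensible habit even though the init container hard-codes its path.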

Creating Cloud Infrastructure

AWS

When installing on AWS, an IAM Roles for Service Accounts (IRSA) service account is used. For details on IRSA, see the AWS documentation.

Create the IAM role:

bash
aws iam create-role \
  --role-name yb-eks-pod-fluent-bit-{instance-name}-{region} \
  --assume-role-policy-document file://trust-policy.json

The trust policy:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "{oidc-provider-arn}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "{oidc-provider}:sub": "system:serviceaccount:{namespace}:yb-{namespace}-worker-sa"
        }
      }
    }
  ]
}
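
One way to produce trust-policy.json without hand-editing is a heredoc that expands the placeholders from shell variables. This is a sketch; the OIDC provider, account ID, and namespace below are hypothetical:

```shell
# Hypothetical values -- substitute your cluster's OIDC provider details
OIDC_PROVIDER="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B7"
OIDC_PROVIDER_ARN="arn:aws:iam::123456789012:oidc-provider/${OIDC_PROVIDER}"
NAMESPACE="yb-monitoring"

# Unquoted EOF lets the shell expand ${...} inside the JSON body
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "${OIDC_PROVIDER_ARN}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${NAMESPACE}:yb-${NAMESPACE}-worker-sa"
        }
      }
    }
  ]
}
EOF
```

Running `python3 -m json.tool trust-policy.json` afterwards is a quick way to confirm the generated file is valid JSON before passing it to aws iam create-role.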

Create the IAM policy:

bash
aws iam put-role-policy \
  --role-name yb-eks-pod-fluent-bit-{instance-name}-{region} \
  --policy-name diags-upload \
  --policy-document file://iam-policy.json

The IAM policy:

json
{
  "Version": "2012-10-17",
  "Statement": [
      {
          "Effect": "Allow",
          "Action": [
              "s3:*"
          ],
          "Resource": "arn:aws:s3:::{observability-storage}/*"
      }
  ]
}

To the values above, add the following to the fluent-bit block, substituting the ARN of the AWS IAM role for {role-arn}:

yaml
fluent-bit:
...
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: {role-arn}
...