Self-Managed: yb-monitoring
Install yb-monitoring with Helm. This chart deploys the workloads that enable monitoring for Yellowbrick Data Warehouse; it typically contains Loki, Grafana, Prometheus, and Fluent Bit.
INFO
If you are using Yellowbrick-created storage classes, you must install yb-storageclass first. See Helm: yb-storageclass.
The Yellowbrick Operator also modifies the yb-monitoring Helm chart for certain features, such as changing the node group tier and enabling log retention on S3 buckets. If you manage this chart through automated deployments, take this behavior into account by reusing the existing release values.
When using the commands or values outlined here, make the following substitutions:
| Value | Description |
|---|---|
| {image-repo} | The container image repository pushed by the Deployer |
| {namespace} | The Kubernetes namespace into which you want to install |
| {role-arn} | When on AWS, the IAM role ARN of the Fluent Bit service account |
| {version} | The chart version of loki-stack |
| {observability-storage} | The name of the observability storage location: on AWS, an S3 bucket name; on Azure, a Storage Account name. Must be the same value used when deploying the yb-operator chart. |
| {oidc-provider-arn} | When on AWS, the OpenID Connect provider ARN |
| {oidc-provider} | When on AWS, the OpenID Connect provider |
| {partition} | When on AWS, the partition: aws or aws-gov |
| {storageclass} | The general purpose storage class name, e.g. AWS: gp3, Azure: standard, GCP: pd-balanced |
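As a convenience, the placeholders above can be expanded with a simple shell substitution before running any command; the repository and namespace values below are illustrative only, not real deployment values:

```bash
# Illustrative values only; substitute the ones from your own deployment.
IMAGE_REPO="repo.example.com"   # {image-repo}: registry populated by the Deployer
NAMESPACE="yb-monitoring"       # {namespace}: target Kubernetes namespace

# Expand the placeholders in a command template with sed.
echo 'helm install loki oci://{image-repo}/loki-stack -n {namespace}' \
  | sed -e "s|{image-repo}|$IMAGE_REPO|" -e "s|{namespace}|$NAMESPACE|"
# → helm install loki oci://repo.example.com/loki-stack -n yb-monitoring
```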
Helm Chart
Running the Yellowbrick Deployer will push the Helm charts and container images you need into your cloud environment. For instructions on pushing assets using the Deployer, see the documentation.
Chart name: loki-stack
Use the get-assets subcommand to find the version of the loki-stack chart; see the CLI reference.
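Alternatively, once the Deployer has pushed the chart, Helm itself can report the chart version directly from the registry; this assumes your Helm client is already authenticated against {image-repo}:

```bash
# Show chart metadata, including the version, for the pushed loki-stack chart.
helm show chart oci://{image-repo}/loki-stack
```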
Install Command
```bash
helm install loki oci://{image-repo}/loki-stack \
  -n {namespace} \
  -f values.yaml \
  --version {version}
```

INFO
Please note that the release name used when installing yb-monitoring must be loki, and the namespace must match the one supplied when installing the yb-operator Helm chart. We recommend using a namespace different from the one in which yb-operator itself is installed.
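After the install completes, you can sanity-check the release and its workloads; pod names vary by release, but the monitoring components should all reach the Running state:

```bash
# Confirm the release is deployed.
helm list -n {namespace}

# Watch the monitoring workloads (grafana, loki, prometheus, fluent-bit) come up.
kubectl get pods -n {namespace} -w
```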
Values
Please note that the node group for yb-monitoring workloads is managed by the Yellowbrick Operator; we recommend not changing the node selectors and tolerations in the values file below.
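Before installing, you can check whether nodes carrying the expected label already exist; when the Yellowbrick Operator manages the node group, this label is applied for you:

```bash
# List nodes in the monitoring node group, using the same selector as the values file.
kubectl get nodes -l cluster.yellowbrick.io/node_type=yb-mon-standard
```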
```yaml
fluent-bit:
  enabled: true
  image:
    repository: {image-repo}/yellowbrickdata/fluent-bit-plugin-loki
    tag: 2.8.8-13
  nodeSelector: &nodeSelector
    cluster.yellowbrick.io/node_type: yb-mon-standard
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: {role-arn}
  tolerations: &tolerations
    - effect: NoSchedule
      key: cluster.yellowbrick.io/owned
      operator: Equal
      value: "true"
grafana:
  deploymentStrategy:
    type: Recreate
  downloadDashboardsImage:
    repository: {image-repo}/curlimages/curl
    tag: 8.11.1
  image:
    repository: {image-repo}/grafana/grafana
    tag: 12.0.0
  initChownData:
    image:
      repository: {image-repo}/library/busybox
      tag: 1.31.1
  persistence:
    storageClassName: yb-gp3
  sidecar:
    image:
      repository: {image-repo}/kiwigrid/k8s-sidecar
      tag: 1.28.0
  nodeSelector: *nodeSelector
  tolerations: *tolerations
  ingress:
    enabled: false
loki:
  extraContainers:
    - command:
        - /bin/sh
        - -c
        - |-
          trap cleanup 15
          cleanup()
          {
            echo "Shutting down the loki pvc monitor"
            exit
          }
          while true; do
            /delete_files_if_low_memory.sh
            sleep 360 &
            PID=$!
            wait $PID
          done;
      env:
        - name: SPACEMONITORING_FOLDER
          value: /data/loki/chunks
      image: {image-repo}/yellowbrickdata/loki-log-trimmer:v5
      name: pvcleanup
      volumeMounts:
        - mountPath: /data
          name: storage
  image:
    repository: {image-repo}/grafana/loki
    tag: 3.5.0
  persistence:
    size: 200Gi
    storageClassName: {storageclass}
  nodeSelector: *nodeSelector
  tolerations: *tolerations
prometheus:
  alertmanager:
    enabled: false
    image:
      repository: {image-repo}/prometheus/alertmanager
      tag: v0.27.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  alertmanagerFiles: {}
  configmapReload:
    alertmanager:
      enabled: true
      image:
        repository: {image-repo}/jimmidyson/configmap-reload
        tag: v0.8.0
    prometheus:
      image:
        repository: {image-repo}/jimmidyson/configmap-reload
        tag: v0.8.0
  kube-state-metrics:
    image:
      repository: {image-repo}/kube-state-metrics/kube-state-metrics
      tag: v2.13.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  nodeExporter:
    image:
      repository: {image-repo}/prometheus/node-exporter
      tag: v1.8.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  processExporter:
    image:
      repository: {image-repo}/ncabatoff/process-exporter
      tag: sha-7ef0b73
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  pushgateway:
    image:
      repository: {image-repo}/prom/pushgateway
      tag: v1.9.0
    nodeSelector: *nodeSelector
    tolerations: *tolerations
  server:
    image:
      repository: {image-repo}/prometheus/prometheus
      tag: v2.49.1
    persistentVolume:
      enabled: true
      size: 100Gi
      storageClass: {storageclass}
    nodeSelector: *nodeSelector
    tolerations: *tolerations
    extraInitContainers:
      - image: {image-repo}/library/busybox:1.31.1
        name: prometheus-wal-cleanup
        command:
          - /bin/sh
          - -c
          - if [ $(du -sm /data/wal | cut -f1) -gt 1024 ]; then rm -rf /data/wal/*; fi
        volumeMounts:
          - mountPath: /data
            name: storage-volume
  serverFiles:
    alerting_rules.yml:
      groups:
        - name: Host alerts
          rules:
            - alert: PVCUtilizationHigh
              annotations:
                message: The persistentVolume used by {{ $labels.persistentvolumeclaim }} is {{ $value | humanize }}% utilized. Please check and take appropriate action.
                summary: PVC utilization on PVC {{ $labels.persistentvolumeclaim }} is high
              expr: 100 * sum(kubelet_volume_stats_used_bytes) by (persistentvolumeclaim) / sum(kubelet_volume_stats_capacity_bytes) by (persistentvolumeclaim) > 90
              for: 5m
              labels:
                severity: warning
            - alert: HostOutOfDiskSpace
              annotations:
                description: |-
                  Disk is almost full (< 80% left)
                  VALUE = {{ $value | humanize }}
                  LABELS: {{ $labels }}
                message: |-
                  Disk is almost full (< 80% left)
                  VALUE = {{ $value | humanize }}%
                  Node Name: {{ $labels.node }}
                summary: Host out of disk space (instance {{ $labels.node }})
              expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"}) > 80
              for: 1s
              labels:
                severity: warning
```

Creating Cloud Infrastructure
AWS
When installing on AWS, an IRSA (IAM Roles for Service Accounts) service account is used. For details on IRSA, see the AWS documentation.
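The OIDC provider values used below can be read from the cluster itself; {cluster-name} here is a placeholder for your EKS cluster name:

```bash
# Print the cluster's OIDC issuer URL; drop the https:// prefix to get {oidc-provider}.
aws eks describe-cluster --name {cluster-name} \
  --query "cluster.identity.oidc.issuer" --output text

# List registered providers to find the matching {oidc-provider-arn}.
aws iam list-open-id-connect-providers
```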
Create the IAM role:
```bash
aws iam create-role \
  --role-name yb-eks-pod-fluent-bit-{instance-name}-{region} \
  --assume-role-policy-document file://trust-policy.json
```

The trust policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "{oidc-provider-arn}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "{oidc-provider}:sub": "system:serviceaccount:{namespace}:yb-{namespace}-worker-sa"
        }
      }
    }
  ]
}
```

Create the IAM policy:
```bash
aws iam put-role-policy \
  --role-name yb-eks-pod-fluent-bit-{instance-name}-{region} \
  --policy-name diags-upload \
  --policy-document file://iam-policy.json
```

The IAM policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": "arn:aws:s3:::{observability-storage}/*"
    }
  ]
}
```

To the values above, add the following in the fluent-bit block, substituting the ARN of the AWS IAM role for {role-arn}:
```yaml
fluent-bit:
  ...
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: {role-arn}
  ...
```
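If you later change values, the same file can be re-applied with a Helm upgrade; this reuses the release name, namespace, and version conventions from the install command above:

```bash
helm upgrade loki oci://{image-repo}/loki-stack \
  -n {namespace} \
  -f values.yaml \
  --version {version}
```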