Alerting

Yellowbrick supports integration with Slack and Opsgenie for alerting. When unexpected issues are detected, an alert will be sent to one or both of these channels.

Alerting endpoints are configured using kubectl commands. A user interface for this will be added to Yellowbrick Manager in a subsequent release.

Yellowbrick currently alerts on the following exceptional conditions:

  • Disk space low or exhausted
  • Unexpected crashes or process exits
  • Issues with background tasks
  • File system consistency issues
  • Quota exhaustion
  • Row store volume exhaustion

The workload manager can also generate rule-based custom alerts for conditions such as queries running too long or users monopolizing resources. These alerts are dispatched through the same mechanism. See the workload manager rule actions for more information.

Step 1: Collect Integration Information

Alerts can be sent to Slack, Opsgenie, or both. To configure alerting for Slack, you need to know the webhook URL and the channel name. For Opsgenie, you need an API key and, optionally, an API URL.

To find your Slack URL, follow the instructions here. Make sure that the target Slack channel (its name beginning with a #) has been created in advance. To create an Opsgenie API key and determine which API URL applies, follow the instructions here.
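
For reference, the values collected in this step look something like the following. All values below are placeholders rather than real credentials; the Opsgenie URL shown is the global endpoint, with an EU endpoint also available.

txt
Slack channel:  #alerts
Slack URL:      https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
Opsgenie key:   12345-abcde-67890-fghij-12345
Opsgenie URL:   https://api.opsgenie.com (or https://api.eu.opsgenie.com for the EU region)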

Step 2: Create a JSON Configuration File

To configure alerting, either the Slack configuration, the Opsgenie configuration, or both must be specified in a JSON document and uploaded to a Kubernetes secret. To do so, create a file called alert.json as follows:

bash
echo '{
  "slackChannel": <slackChannelName>,
  "slackUrl": <slackURL>,
  "opsGenieKey": <opsGenieKey>
  "opsGenieUrl": <opsGenieURL>
}' > alert.json

For Slack configuration, both slackChannel and slackUrl must be specified. The channel name must be prefixed with a # character, and the channel must be created in advance.

The opsGenieUrl is an optional parameter, defaulting to the global https://api.opsgenie.com if omitted.

A fully formed example JSON file might look something like:

bash
echo '{
  "slackChannel": "#alerts",
  "slackUrl": "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX",
  "opsGenieKey": "12345-abcde-67890-fghij-12345"
}' > alert.json
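
Before uploading the file, it is worth checking that it parses as valid JSON. A minimal check, assuming the jq utility is installed (python3 -m json.tool works equally well):

bash
# Prints the parsed document on success, or a parse error describing the problem on failure
jq . alert.json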

Step 3: Install the JSON Configuration File

The JSON document must be installed into a Kubernetes secret called yb-monitoring-secret. To do so, use the following kubectl command:

bash
kubectl create secret generic yb-monitoring-secret --from-file=state=alert.json -n monitoring

The JSON document must be well formed, and the endpoints must be specified correctly, with egress to them permitted if you are running in a fully private configuration.
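
To confirm that the secret was created and holds the expected configuration, you can inspect it with kubectl:

bash
# Check that the secret exists in the monitoring namespace
kubectl get secret yb-monitoring-secret -n monitoring

# Optionally decode the stored configuration (the file was uploaded under the key "state")
kubectl get secret yb-monitoring-secret -n monitoring -o jsonpath='{.data.state}' | base64 -d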

Step 4: Generate a Test Alert

Alerts are generated from log messages. The easiest way to generate a test alert is to inject a fake log message. Here is an example of injecting a crash error into the PostgreSQL log:

bash
kubectl exec -it ybinst-<instance-name>-0 -n <namespace> -c ybinst-pg -- bash -c "echo 'ALERT: PG exited' > /proc/1/fd/1"
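
To confirm that the test alert reached Alertmanager, one option is to query its API directly. The sketch below assumes the Alertmanager service is named prometheus-alertmanager and listens on port 9093; check the actual service name in your deployment with kubectl get svc -n monitoring.

bash
# Forward the Alertmanager port locally (service name and port are assumptions; adjust as needed)
kubectl port-forward svc/prometheus-alertmanager 9093:9093 -n monitoring &

# List currently firing alerts via the Alertmanager v2 API
curl -s http://localhost:9093/api/v2/alerts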

Diagnosing Problems

If the JSON document is malformed, or keys in the document are missing or malformed, errors are posted to the Yellowbrick Operator logs. To inspect the Operator logs, use the following command:

bash
kubectl logs -l app=yb-operator -n <operator_namespace> -f | grep yb-monitoring-secret

An example of an error due to malformed JSON looks something like this:

txt
2024-01-02T03:04:05Z    ERROR  Secret.Monitoring        Invalid alterting configuration: unable to deserialize the configuration, it might not be a valid json      {"namespace": "monitoring", "name": "yb-monitoring-secret", "error": "unexpected end of JSON input"}
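
After correcting the JSON, replace the secret with the fixed file. A simple approach is to delete and recreate it:

bash
kubectl delete secret yb-monitoring-secret -n monitoring
kubectl create secret generic yb-monitoring-secret --from-file=state=alert.json -n monitoring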

In the case of issues sending an alert, errors will be posted to the Alertmanager logs. To inspect the Alertmanager logs, first find the pod name and then retrieve the logs as follows:

bash
kubectl get pod -l component=alertmanager -n monitoring
kubectl logs <pod_name_from_above_command> -n monitoring -c prometheus-alertmanager -f

Examples of errors sending alerts to Slack and Opsgenie, respectively, look something like this:

txt
ts=2024-08-22T03:43:18.613Z caller=dispatch.go:353 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="default-slack-receiver/slack[0]: notify retry canceled due to unrecoverable error after 1 attempts: channel \"#alerts\": unexpected status code 404: no_team"
ts=2024-08-22T03:43:18.692Z caller=dispatch.go:353 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="team-pager/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Key format is not valid!\",\"took\":0.001,\"requestId\":\"d36de5df-7e94-40cb-b09a-d274e14aad48\"}"
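
If the Alertmanager log is verbose, you can narrow the output to notification failures such as those above, for example:

bash
kubectl logs <pod_name_from_above_command> -n monitoring -c prometheus-alertmanager | grep "Notify for alerts failed"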

Disabling Alerting

To completely disable alerting, delete the secret. To do so, use the following command:

bash
kubectl delete secret yb-monitoring-secret -n monitoring
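
To confirm that the secret has been removed and alerting is disabled, the following command should report that the secret is not found:

bash
kubectl get secret yb-monitoring-secret -n monitoring

Alerting can be re-enabled at any time by recreating the secret as described in Step 3.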