Alerting
Yellowbrick supports integration with Slack and Opsgenie for alerting. When unexpected issues are detected, an alert will be sent to one or both of these channels.
Alerting endpoints are configured with kubectl commands. A user interface for this will be added to Yellowbrick Manager in a subsequent release.
Yellowbrick currently alerts on the following unexpected exceptional conditions:
- Disk space low or exhausted
- Unexpected crashes or process exits
- Issues with background tasks
- File system consistency issues
- Quota exhaustion
- Row store volume exhaustion
The workload manager can also generate rule-based custom alerts for conditions such as queries running too long or users consuming excessive resources. Such alerts are dispatched through this same mechanism. See the workload manager rule actions for more information.
Step 1: Collect Integration Information
Alerts can be sent to Slack, Opsgenie, or both tools. To configure alerting for Slack, you need to know the URL and channel name. For Opsgenie, you need an API key and, optionally, an API URL.
To find your Slack URL, follow the instructions here. Make sure that a target Slack channel (beginning with a #) has been created in advance. To create an Opsgenie API key, and to determine which API URL applies, follow the instructions here.
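Before configuring Yellowbrick, you can optionally sanity-check the values you have collected. The sketch below is one way to do this, using the placeholder webhook URL and API key from the examples in this section: it posts a test message to the Slack incoming webhook and queries the Opsgenie account endpoint with the API key. A successful response from each indicates the values are usable.
bash
# Post a test message to the Slack incoming webhook (placeholder URL shown).
curl -X POST -H 'Content-Type: application/json' \
  --data '{"text": "Yellowbrick alerting test"}' \
  https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX

# Verify the Opsgenie API key (placeholder key shown) against the account endpoint.
curl -H 'Authorization: GenieKey 12345-abcde-67890-fghij-12345' \
  https://api.opsgenie.com/v2/account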
Step 2: Create a JSON Configuration File
To configure alerting, either the Slack configuration, the Opsgenie configuration, or both must be specified in a JSON document and uploaded to a Kubernetes secret. To do so, create a document called alert.json as follows:
bash
echo '{
"slackChannel": "<slackChannelName>",
"slackUrl": "<slackURL>",
"opsGenieKey": "<opsGenieKey>",
"opsGenieUrl": "<opsGenieURL>"
}' > alert.json
For Slack configuration, both slackChannel and slackUrl must be specified. The channel must be prefixed by a # character and must be created in advance.
The opsGenieUrl is an optional parameter, defaulting to the global https://api.opsgenie.com if omitted.
A fully formed example JSON file might look something like:
bash
echo '{
"slackChannel": "#alerts",
"slackUrl": "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX",
"opsGenieKey": "12345-abcde-67890-fghij-12345"
}' > alert.json
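If you use only one of the two tools, include only that tool's keys. For example, an Opsgenie-only configuration (with a placeholder key, relying on the default API URL) might look like this:
bash
echo '{
"opsGenieKey": "12345-abcde-67890-fghij-12345"
}' > alert.json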
Step 3: Install the JSON Configuration File
The JSON document must be installed into a Kubernetes secret called yb-monitoring-secret. To do so, use the following kubectl command:
bash
kubectl create secret generic yb-monitoring-secret --from-file=state=alert.json -n monitoring
The JSON document must be well formed and the endpoints specified correctly, with egress to them permitted if running in a fully private configuration.
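To confirm the secret was created, or to update it after editing alert.json, something like the following should work (the dry-run/apply pattern replaces the secret contents in place):
bash
# Confirm the secret exists and inspect the stored configuration.
kubectl get secret yb-monitoring-secret -n monitoring -o jsonpath='{.data.state}' | base64 -d

# Update the secret in place after editing alert.json.
kubectl create secret generic yb-monitoring-secret --from-file=state=alert.json -n monitoring \
  --dry-run=client -o yaml | kubectl apply -f -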
Step 4: Generate a Test Alert
Alerts are generated from log messages. The easiest way to generate a test alert is to inject a fake log message. Here is an example of injecting a crash error into the PostgreSQL log:
bash
kubectl exec -it ybinst-<instance-name>-0 -n <namespace> -c ybinst-pg -- bash -c "echo 'ALERT: PG exited' > /proc/1/fd/1"
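To confirm that the test alert reached Alertmanager before it was forwarded to Slack or Opsgenie, one option is to port-forward to the Alertmanager pod and query its alerts API. The sketch below assumes the standard Alertmanager port 9093 and uses the same pod label shown under Diagnosing Problems:
bash
# Forward the Alertmanager API port (9093 assumed) from the pod selected by label.
kubectl port-forward -n monitoring $(kubectl get pod -l component=alertmanager -n monitoring -o name) 9093:9093 &

# List currently firing alerts; the injected test alert should appear here.
curl -s http://localhost:9093/api/v2/alerts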
Diagnosing Problems
In the case of a malformed JSON document, or missing or malformed keys in the document, errors will be posted to the Yellowbrick Operator logs. To inspect the Operator logs, use the following command:
bash
kubectl logs -l app=yb-operator -n <operator_namespace> -f | grep yb-monitoring-secret
An example of an error due to malformed JSON looks something like this:
txt
2024-01-02T03:04:05Z ERROR Secret.Monitoring Invalid alterting configuration: unable to deserialize the configuration, it might not be a valid json {"namespace": "monitoring", "name": "yb-monitoring-secret", "error": "unexpected end of JSON input"}
In the case of issues sending an alert, errors will be posted to the Alertmanager logs. To inspect the Alertmanager logs, first find the pod name and then retrieve the logs as follows:
bash
kubectl get pod -l component=alertmanager -n monitoring
kubectl logs <pod_name_from_above_command> -c prometheus-alertmanager -n monitoring -f
Examples of errors sending alerts to Slack and Opsgenie, respectively, look something like this:
txt
ts=2024-08-22T03:43:18.613Z caller=dispatch.go:353 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="default-slack-receiver/slack[0]: notify retry canceled due to unrecoverable error after 1 attempts: channel \"#alerts\": unexpected status code 404: no_team"
ts=2024-08-22T03:43:18.692Z caller=dispatch.go:353 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="team-pager/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Key format is not valid!\",\"took\":0.001,\"requestId\":\"d36de5df-7e94-40cb-b09a-d274e14aad48\"}"
Disabling Alerting
To completely disable alerting, delete the secret. To do so, use the following command:
bash
kubectl delete secret yb-monitoring-secret -n monitoring
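You can confirm the secret is gone as shown below; the command should report that the secret was not found. To re-enable alerting later, recreate the secret as described in Step 3.
bash
kubectl get secret yb-monitoring-secret -n monitoring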