Alert Types
This section describes the specific system-defined and user-defined alerts you can configure on your cluster.
Alerts return two messages: one reflecting an OPEN status and one reflecting a CLOSED status for the event that triggered the alert. OPEN alerts are considered active until they are CLOSED. An alert is closed when the alert event is resolved, such as when the database or a compute blade comes back online. Test alerts also have OPEN and CLOSED messages, which follow each other within about 10 seconds.
Alert messages provide specific details about the error conditions that triggered the alert. For example, for a Database State alert you may see one of the following messages:
The database stopped running: too many missing compute nodes
The database is degraded due to: rebuilding. Performance may be affected.
System-Defined Alerts
The following system-defined alerts are supported. (You cannot create new alert types.)
Alert Name | Rule | Resource ID in Alert Messages |
---|---|---|
Cluster Quiesce | Alert when the database is quiesced or an attempt to quiesce it fails. When the database is quiesced, active queries are cancelled. Queries that were queued start running when the database comes back online. | database:event |
Compute Blade | Alert when a compute blade changes state. For example, a blade may be powered off, causing an alert. | Chassis number and blade number. For example: chassis0:blade10 |
Compute Blade Reset | Alert when a compute blade restarts. | Chassis number and blade number. For example: chassis2:blade14 |
Database State | Alert when the database changes state. For example, the database may be degraded because a compute node is offline. | database:state |
Database Row Store | Alert when the database row store changes state. | database:rowstore |
Fan | Alert when a fan changes state. For example, a fan may have failed or have been removed from the appliance. | Chassis number and fan number. For example: chassis0:fan2 |
LDAP | Alert when LDAP synchronization fails. | database:LDAPSynchronizer |
Manager Node Drive Not Detected | Alert when a manager node drive is not detected. For example, a specific drive may not be installed. | Manager node number and drive ID. For example: manager1:drive:nvme4n1 |
Manager Node HA State | Alert when a manager node changes state. For example, one of the manager nodes may be offline, and failover is temporarily not supported. | database:ha_state |
Network Switch | Alert when a network switch changes state. | Chassis number and switch number. For example: chassis0:switch2 |
Power Supply | Alert when a power supply changes state. | Chassis number and power supply number. For example: chassis0:power2 |
Temperature | Alert when the inlet temperature for the system exceeds 35°C. | database:temperature |
Test | Alert when Test Alert is requested by the SMC user. | database:test |
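The resource IDs in the table above follow a colon-separated pattern. If you forward alerts into your own tooling, a rough sketch like the following could split them into parts. The function name parse_resource_id and the returned dictionary layout are hypothetical; the regular expression covers only chassis-level IDs of the form chassisN:componentN, and every other ID is simply split on colons.

```python
import re

# Matches chassis-level hardware IDs such as "chassis0:blade10",
# "chassis0:fan2", "chassis0:switch2", or "chassis0:power2".
_CHASSIS_ID = re.compile(r"^chassis(\d+):([a-z]+)(\d+)$")


def parse_resource_id(resource_id: str) -> dict:
    """Split an alert resource ID into its parts (illustrative sketch)."""
    m = _CHASSIS_ID.match(resource_id)
    if m:
        chassis, component, number = m.groups()
        return {"chassis": int(chassis), "component": component, "number": int(number)}
    # Everything else (e.g. "database:state", "manager1:drive:nvme4n1")
    # is treated as a scope followed by one or more colon-separated fields.
    scope, *rest = resource_id.split(":")
    return {"scope": scope, "fields": rest}


print(parse_resource_id("chassis0:blade10"))
print(parse_resource_id("manager1:drive:nvme4n1"))
```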
Test Alerts
Test alerts are system-defined, but you can trigger them in two different ways:
- After you create a new endpoint, you can send a test alert to that specific endpoint, whether the endpoint is enabled or disabled. Click Test Alert on the summary screen for the endpoint.
- You can send a test alert to all enabled endpoints from Configure > Alerting > Test Alert. Disabled endpoints do not receive this alert.
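The two ways of triggering a test alert reach different sets of endpoints. The sketch below models that rule only; the Endpoint class and the recipients_for_test_alert function are hypothetical and are not part of any product API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    enabled: bool


def recipients_for_test_alert(endpoints: list[Endpoint], target: Optional[str] = None) -> list[str]:
    """Return the names of the endpoints that receive a test alert.

    target=None models Configure > Alerting > Test Alert: only enabled
    endpoints receive it. A target name models Test Alert on a single
    endpoint's summary screen, which is sent whether or not that
    endpoint is enabled.
    """
    if target is not None:
        return [e.name for e in endpoints if e.name == target]
    return [e.name for e in endpoints if e.enabled]


endpoints = [Endpoint("email-ops", True), Endpoint("webhook-dev", False)]
print(recipients_for_test_alert(endpoints))                 # ['email-ops']
print(recipients_for_test_alert(endpoints, "webhook-dev"))  # ['webhook-dev']
```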
User-Defined Alerts
The following user-defined alerts are supported. By default, they are all disabled. You can enable all of them or any subset.
You cannot create new alert types.
The alerts with numeric thresholds have default values for Major and Critical severity alerts. You can define additional thresholds for Informational and Minor severity alerts.
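To illustrate how layered thresholds map a measured value to a severity, here is a minimal sketch that uses the default percentage thresholds listed in the table below (Major when the value is greater than 85, Critical when greater than 95). The function name classify and the Informational and Minor values in the usage example are hypothetical, and the rule that the most severe exceeded threshold wins is an assumption.

```python
from typing import Optional

# Default thresholds for percentage-based alerts such as Compute Blade Disk Used.
DEFAULT_THRESHOLDS = {"Major": 85.0, "Critical": 95.0}

# Severities from lowest to highest.
SEVERITY_ORDER = ["Informational", "Minor", "Major", "Critical"]


def classify(value: float, thresholds: Optional[dict[str, float]] = None) -> Optional[str]:
    """Return the most severe level whose threshold the value exceeds, or None.

    Assumption: when several thresholds are exceeded, the highest severity wins.
    """
    if thresholds is None:
        thresholds = DEFAULT_THRESHOLDS
    for severity in reversed(SEVERITY_ORDER):
        limit = thresholds.get(severity)
        if limit is not None and value > limit:
            return severity
    return None


# Hypothetical user-defined Informational and Minor thresholds added to the defaults.
custom = {**DEFAULT_THRESHOLDS, "Informational": 50.0, "Minor": 70.0}
print(classify(90))          # Major
print(classify(75, custom))  # Minor
```

Note that the comparison is strictly "greater than", so a value of exactly 95 is still Major under the defaults.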
Alert Name | Rule | Resource ID in Messages |
---|---|---|
Backup Chain Age | Alert when there are backup chains older than a configurable threshold (controlled by the configuration parameter old_chain_threshold_days). Default threshold: 30 days | database:old_backup_chains |
Compute Blade Disk Used | Alert when compute blade disk usage exceeds the specified percentage. One alert is triggered per cluster when any one drive exceeds the threshold. Default thresholds: - Major severity when value is greater than 85 - Critical severity when value is greater than 95 | Chassis number, blade number, drive number, then usage. For example: chassis0:blade9:drive3:usage |
Compute Blade Disk Wear | Alert when compute blade disk wear exceeds the specified percentage. Default thresholds: - Major severity when value is greater than 85 - Critical severity when value is greater than 95 | Chassis number, blade number, drive number, then wear. For example: chassis0:blade9:drive3:wear |
Database Connections Used | Alert when the number of database connections exceeds the specified percentage. Default thresholds: - Major severity when value is greater than 85 - Critical severity when value is greater than 95 | database:connections |
Encryption Keystore | Alert when the encryption keystore is locked. | mgmt0:vault or mgmt1:vault, depending on which manager node is the primary. |
Manager Node Disk Wear | Alert when manager node disk wear exceeds the specified percentage. Default thresholds: - Major severity when value is greater than 85 - Critical severity when value is greater than 95 | Manager node number, drive name, then wear. For example: manager2:mgmt2-/dev/nvme2n1:wear |
Network Status (External) | Alert when the external network status changes. | manager#:external_bond |
WLM Rule | Alert when a WLM rule is triggered with the action Log ERROR or Log WARN. See Rule Actions. WLM alerts are based on workload management rules rather than alerting rules. The message for a WLM alert contains, in parentheses, the ID of the query that triggered the WLM rule. | For example: database:wlm:SELECT * rule, where SELECT * rule is the name of the WLM rule that was triggered and in turn triggered the alert. |
Yellowbrick Row Store Data Files | Alert when the number of data files exceeds one of the configurable thresholds. - Minor severity when the number exceeds the value of configuration parameter yrs_data_files_count_minor_threshold - Major when the number exceeds the value of yrs_data_files_count_major_threshold - Critical when the number exceeds the value of yrs_data_files_count_critical_threshold | database:yrs |
Yellowbrick Row Store Commit Records | Alert when the number of commit records exceeds one of the configurable thresholds. - Minor severity when the number exceeds the value of configuration parameter yrs_commit_records_count_minor_threshold - Major when the number exceeds the value of yrs_commit_records_count_major_threshold - Critical when the number exceeds the value of yrs_commit_records_count_critical_threshold | database:yrs |
Yellowbrick Row Store Delete Records | Alert when the number of delete records exceeds one of the configurable thresholds. - Minor severity when the number exceeds the value of configuration parameter yrs_delete_records_count_minor_threshold - Major when the number exceeds the value of yrs_delete_records_count_major_threshold - Critical when the number exceeds the value of yrs_delete_records_count_critical_threshold | database:yrs |
Yellowbrick Row Store Unused Files | Alert when the number of unused files exceeds one of the configurable thresholds. - Minor severity when the number exceeds the value of configuration parameter yrs_unused_files_count_minor_threshold - Major when the number exceeds the value of yrs_unused_files_count_major_threshold - Critical when the number exceeds the value of yrs_unused_files_count_critical_threshold | database:yrs |
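Because a WLM alert message carries the triggering query ID in parentheses, a script that receives these messages can pull the ID out for follow-up. The exact message wording is not documented here, so this sketch assumes the query ID is numeric and is the last parenthesized integer in the message; the sample message and the function name extract_query_id are hypothetical.

```python
import re
from typing import Optional

# Assumption: the query ID is a number enclosed in parentheses, and if the
# message contains several such numbers, the last one is the query ID.
_QUERY_ID = re.compile(r"\((\d+)\)")


def extract_query_id(message: str) -> Optional[int]:
    """Return the query ID found in a WLM alert message, or None."""
    matches = _QUERY_ID.findall(message)
    return int(matches[-1]) if matches else None


# Hypothetical message text; the real format may differ.
print(extract_query_id("WLM rule 'SELECT * rule' logged a warning (48213)"))  # 48213
```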
Query Alerts
To see active and logged query alerts, go to Manage > Query Alerts. WLM alerts appear under Query Alerts by default; if they are enabled in Configure > Alerting, WLM alerts also appear under Cluster Alerts.