Alert Types

This section describes the specific system-defined and user-defined alerts you can configure on your cluster.

Each alert produces two messages, one reflecting an OPEN status and one reflecting a CLOSED status for the event that triggered the alert. An OPEN alert is considered active until it is CLOSED, which happens when the alert event is resolved, such as when the database or a compute blade comes back online. Test alerts also have OPEN and CLOSED messages, which follow each other within about 10 seconds.

Alert messages provide specific details about the error conditions that triggered the alert. For example, for a Database State alert you may see one of the following messages:

The database stopped running: too many missing compute nodes
The database is degraded due to: rebuilding. Performance may be affected.
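
The OPEN/CLOSED pairing described above can be sketched as a small piece of bookkeeping on the receiving side. This is a minimal sketch, not the product's message format: the `AlertMessage` class and its field names are hypothetical, chosen only to illustrate how a consumer might track which alerts are still active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertMessage:
    """Hypothetical shape of an alert message; the field names are assumptions."""
    alert_name: str   # for example "Database State"
    resource_id: str  # for example "database:state"
    status: str       # "OPEN" or "CLOSED"
    detail: str       # description of the condition that triggered the alert


def active_alerts(messages):
    """Replay messages in order and return the alerts that are still OPEN."""
    active = {}
    for msg in messages:
        key = (msg.alert_name, msg.resource_id)
        if msg.status == "OPEN":
            active[key] = msg
        elif msg.status == "CLOSED":
            # A CLOSED message resolves the matching OPEN alert, if there is one.
            active.pop(key, None)
        else:
            raise ValueError(f"unexpected status: {msg.status!r}")
    return active


messages = [
    AlertMessage("Database State", "database:state", "OPEN",
                 "The database is degraded due to: rebuilding. "
                 "Performance may be affected."),
    AlertMessage("Test", "database:test", "OPEN", "test alert"),
    AlertMessage("Test", "database:test", "CLOSED", "test alert"),
]
still_open = active_alerts(messages)
for name, resource in still_open:
    print(f"active: {name} ({resource})")
```

In this sketch the test alert opens and closes, so only the Database State alert remains active.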

System-Defined Alerts

The following system-defined alerts are supported. (You cannot create new alert types.)

| Alert Name | Rule | Resource ID in Alert Messages |
|---|---|---|
| Cluster Quiesce | Alert when the database is quiesced or an attempt to quiesce it fails. When the database is quiesced, active queries are cancelled. Queries that were queued start running when the database comes back online. | database:event |
| Compute Blade | Alert when a compute blade changes state. For example, a blade may be powered off, causing an alert. | Chassis number and blade number. For example: chassis0:blade10 |
| Compute Blade Reset | Alert when a compute blade restarts. | Chassis number and blade number. For example: chassis2:blade14 |
| Database State | Alert when the database changes state. For example, the database may be degraded because a compute node is offline. | database:state |
| Database Row Store | Alert when the database row store changes state. | database:rowstore |
| Fan | Alert when a fan changes state. For example, a fan may have failed or have been removed from the appliance. | Chassis number and fan number. For example: chassis0:fan2 |
| LDAP | Alert when LDAP synchronization fails. | database:LDAPSynchronizer |
| Manager Node Drive Not Detected | Alert when a manager node drive is not detected. For example, a specific drive may not be installed. | Manager node number and drive ID. For example: manager1:drive:nvme4n1 |
| Manager Node HA State | Alert when a manager node changes state. For example, one of the manager nodes may be offline, and failover is temporarily not supported. | database:ha_state |
| Network Switch | Alert when a network switch changes state. | Chassis number and switch number. For example: chassis0:switch2 |
| Power Supply | Alert when a power supply changes state. | Chassis number and power supply number. For example: chassis0:power2 |
| Temperature | Alert when the inlet temperature for the system exceeds 35°C. | database:temperature |
| Test | Alert when Test Alert is requested by the SMC user. | database:test |
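
The resource IDs listed for the system-defined alerts follow a few recognizable shapes, which makes them easy to route in a monitoring script. The following is a minimal sketch that assumes only the example shapes shown above (chassis component, manager node drive, and database-level IDs); the function name `parse_resource_id` and the returned dictionary keys are hypothetical, and real messages may contain forms not covered here.

```python
import re

# Patterns inferred from the documented examples only.
_CHASSIS = re.compile(r"^chassis(\d+):(blade|fan|switch|power)(\d+)$")
_MANAGER_DRIVE = re.compile(r"^manager(\d+):drive:(.+)$")
_DATABASE = re.compile(r"^database:(.+)$")


def parse_resource_id(resource_id):
    """Split a system-defined alert resource ID into its parts."""
    m = _CHASSIS.match(resource_id)
    if m:
        return {
            "scope": "chassis",
            "chassis": int(m.group(1)),
            "component": m.group(2),
            "number": int(m.group(3)),
        }
    m = _MANAGER_DRIVE.match(resource_id)
    if m:
        return {
            "scope": "manager",
            "manager": int(m.group(1)),
            "component": "drive",
            "drive_id": m.group(2),
        }
    m = _DATABASE.match(resource_id)
    if m:
        return {"scope": "database", "event": m.group(1)}
    raise ValueError(f"unrecognized resource ID: {resource_id!r}")


for rid in ("chassis0:blade10", "manager1:drive:nvme4n1", "database:ha_state"):
    print(rid, "->", parse_resource_id(rid))
```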


Test Alerts

Test alerts are system-defined, but you can trigger them in two different ways:

  • After you create a new endpoint, you can send a test alert to that specific endpoint, whether it is enabled or disabled. Click Test Alert on the summary screen for the endpoint.
  • You can send a test alert to all enabled endpoints (Configure > Alerting > Test Alert). Disabled endpoints do not receive the alert.

User-Defined Alerts

The following user-defined alerts are supported. By default, they are all disabled. You can enable all of them or any subset.

You cannot create new alert types.

The alerts with numeric thresholds have default values for Major and Critical severity alerts. You can define additional thresholds for Informational and Minor severity alerts.

| Alert Name | Rule | Resource ID in Messages |
|---|---|---|
| Backup Chain Age | Alert when there are backup chains older than a configurable threshold (controlled by configuration parameter old_chain_threshold_days). Default threshold: 30 days. | database:old_backup_chains |
| Compute Blade Disk Used | Alert when compute blade disk usage exceeds the specified percentage. One alert is triggered per cluster, when any one drive exceeds the threshold. Default thresholds: Major severity when the value is greater than 85; Critical severity when the value is greater than 95. | Chassis number, blade number, drive number, then usage. For example: chassis0:blade9:drive3:usage |
| Compute Blade Disk Wear | Alert when compute blade disk wear exceeds the specified percentage. Default thresholds: Major severity when the value is greater than 85; Critical severity when the value is greater than 95. | Chassis number, blade number, drive number, then wear. For example: chassis0:blade9:drive3:wear |
| Database Connections Used | Alert when the number of database connections exceeds the specified percentage. Default thresholds: Major severity when the value is greater than 85; Critical severity when the value is greater than 95. | database:connections |
| Encryption Keystore | Alert when the encryption keystore is locked. | mgmt0:vault or mgmt1:vault, depending on which manager node is the primary. |
| Manager Node Disk Wear | Alert when manager node disk wear exceeds the specified percentage. Default thresholds: Major severity when the value is greater than 85; Critical severity when the value is greater than 95. | Manager node number, drive name, then wear. For example: manager2:mgmt2-/dev/nvme2n1:wear |
| Network Status (External) | Alert when the external network status changes. | manager#:external_bond |
| WLM Rule | Alert when a WLM rule is triggered with the action Log ERROR or Log WARN. See Rule Actions. WLM alerts are based on workload management rules rather than alerting rules. The message for a WLM alert contains, in parentheses, the ID of the query that triggered the WLM rule. | For example: database:wlm:SELECT * rule, where SELECT * rule is the name of a WLM rule that was triggered and in turn triggered the alert. |
| Yellowbrick Row Store Data Files | Alert when the number of data files exceeds one of the configurable thresholds. Minor severity when the number exceeds the value of configuration parameter yrs_data_files_count_minor_threshold; Major when the number exceeds the value of yrs_data_files_count_major_threshold; Critical when the number exceeds the value of yrs_data_files_count_critical_threshold. | database:yrs |
| Yellowbrick Row Store Commit Records | Alert when the number of commit records exceeds one of the configurable thresholds. Minor severity when the number exceeds the value of configuration parameter yrs_commit_records_count_minor_threshold; Major when the number exceeds the value of yrs_commit_records_count_major_threshold; Critical when the number exceeds the value of yrs_commit_records_count_critical_threshold. | database:yrs |
| Yellowbrick Row Store Delete Records | Alert when the number of delete records exceeds one of the configurable thresholds. Minor severity when the number exceeds the value of configuration parameter yrs_delete_records_count_minor_threshold; Major when the number exceeds the value of yrs_delete_records_count_major_threshold; Critical when the number exceeds the value of yrs_delete_records_count_critical_threshold. | database:yrs |
| Yellowbrick Row Store Unused Files | Alert when the number of unused files exceeds one of the configurable thresholds. Minor severity when the number exceeds the value of configuration parameter yrs_unused_files_count_minor_threshold; Major when the number exceeds the value of yrs_unused_files_count_major_threshold; Critical when the number exceeds the value of yrs_unused_files_count_critical_threshold. | database:yrs |
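
The threshold-based alerts above share one rule: a severity applies when the measured value is strictly greater than that severity's threshold. The following minimal sketch illustrates that rule with the default Major and Critical values. The function name `classify`, the dictionary layout, and the ordering Informational < Minor < Major < Critical are assumptions made for illustration, as is the Minor threshold of 70 in the usage lines.

```python
# Assumed ordering, from least to most severe.
SEVERITY_ORDER = ["Informational", "Minor", "Major", "Critical"]

# Default thresholds documented for the percentage-based alerts.
DEFAULT_THRESHOLDS = {"Major": 85, "Critical": 95}


def classify(value, thresholds=DEFAULT_THRESHOLDS):
    """Return the highest severity whose threshold the value exceeds.

    Returns None when the value does not exceed any configured threshold.
    """
    result = None
    for severity in SEVERITY_ORDER:
        limit = thresholds.get(severity)
        if limit is not None and value > limit:
            result = severity
    return result


print(classify(85))   # equal to the Major threshold, so no alert
print(classify(90))   # above Major, below Critical
print(classify(96))   # above Critical

# Adding a user-defined Minor threshold (the value 70 is hypothetical).
custom = {"Minor": 70, "Major": 85, "Critical": 95}
print(classify(75, custom))
```

Because the comparison is strict, a value exactly equal to a threshold does not raise that severity in this sketch.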

Query Alerts

To see active and logged query alerts, go to Manage > Query Alerts. WLM alerts appear under Query Alerts by default; if they are enabled on the Configure Alerting screen, WLM alerts also appear under Cluster Alerts.