Drive and Filesystem Metrics
This page documents the Prometheus metrics that describe drives and filesystems across the Yellowbrick system. They provide a unified view of filesystem usage and drive health, covering mounted volumes, Yellowbrick data directories, and appliance drives.
Purpose
Disk and drive metrics are critical for maintaining the reliability and performance of the system. They are used to:
- Monitor disk space consumption across mounted volumes and Yellowbrick-specific data directories
- Track filesystem capacity and usage as reported by the operating system
- Detect abnormal growth due to unflushed data, log buildup, or other anomalies
- Assess the health of appliance drives and their filesystems
- Detect early signs of drive failure and track when drives enter a failure state
- Identify and monitor defects, wear, and errors in drives, along with how these errors are corrected
Together, these metrics enable proactive alerting, capacity planning, and detailed monitoring of the storage infrastructure.
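Where these metrics feed alerting or capacity checks, they can be read back through the standard Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, not a definitive implementation: the server address is a hypothetical placeholder, and the 85% threshold is illustrative. It flags nearly full mounted filesystems using the `yb_mount_used_bytes` and `yb_mount_avail_bytes` gauges listed in the table below.

```python
"""Minimal sketch: flag mounts that are nearly full.

Assumptions: a Prometheus server scraping these metrics is reachable at
PROM_URL (hypothetical address), and the `requests` library is installed.
Only the standard Prometheus HTTP API endpoint /api/v1/query is used.
"""
import requests

PROM_URL = "http://prometheus.example.com:9090"  # assumption: adjust to your deployment


def instant_query(expr: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


# Fraction of each mounted filesystem that is used, derived from the
# df-based gauges documented in the table below.
USED_FRACTION = "yb_mount_used_bytes / (yb_mount_used_bytes + yb_mount_avail_bytes)"

for sample in instant_query(USED_FRACTION):
    labels = sample["metric"]   # label set identifying the mount
    _, value = sample["value"]  # [timestamp, value-as-string]
    used = float(value)
    if used > 0.85:             # illustrative alerting threshold
        print(f"WARNING: {labels} is {used:.0%} full")
```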
Metrics
| Name | Type | Freq | Labels | Version Introduced | Version Deprecated | Description |
|---|---|---|---|---|---|---|
| node_disk_usage_bytes | gauge | 10s | path | 7.4.0 | - | Disk usage in bytes per mount |
| yb_drive_bytes_total | gauge | 1m | worker_uuid, drive_id | 7.4.0 | - | Total bytes on the drive |
| yb_drive_bytes_used | gauge | 1m | worker_uuid, drive_id | 7.4.0 | - | Total bytes used on the drive |
| yb_drive_defect_not_remediated | gauge | 1m | worker_uuid | 7.4.0 | - | Total number of drive defects that have not been remediated, per worker |
| yb_drive_defect_otf_corrected | gauge | 1m | worker_uuid | 7.4.0 | - | Total number of on-the-fly corrected drive defects (current and historical), per worker |
| yb_drive_defect_total | gauge | 1m | worker_uuid | 7.4.0 | - | Total number of drive defects (current and historical), per worker |
| yb_drive_failure | gauge | 1m | worker_uuid, drive_id | 7.4.0 | - | Reports whether the drive is in a failure state (1 = failure, 0 = otherwise) |
| yb_drive_wear | gauge | 1m | worker_uuid, drive_id | 7.4.0 | - | Current drive wear (0 to 100) |
| yb_io_scanner_scan_rate_total | counter | 10s | cluster, worker_logical_id, worker_uuid | 7.4.0 | - | Current IO scanner rate |
| yb_io_scanner_starts_total | counter | 10s | cluster, worker_logical_id, worker_uuid | 7.4.0 | - | Number of IO scans started |
| yb_mount_avail_bytes | gauge | 5m | - | 7.4.0 | - | Bytes available to non-root users, as reported by df |
| yb_mount_used_bytes | gauge | 5m | - | 7.4.0 | - | Used bytes, as reported by df |
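As a sketch of how the per-drive metrics above can be combined, the snippet below reports capacity usage, failure state, and wear per drive, keyed by the `worker_uuid` and `drive_id` labels from the table. The Prometheus server address is a hypothetical placeholder and the wear threshold is illustrative only.

```python
"""Sketch: per-drive health summary built from the metrics above.

Assumptions: a Prometheus server scraping these metrics is reachable at
PROM_URL (hypothetical address); metric and label names (worker_uuid,
drive_id) are taken from the table, while the thresholds are illustrative.
"""
import requests

PROM_URL = "http://prometheus.example.com:9090"  # assumption: adjust to your deployment

CHECKS = {
    # Fraction of each drive's capacity in use; PromQL one-to-one matching
    # pairs series with identical label sets, so both gauges for the same
    # drive line up.
    "capacity_used": "yb_drive_bytes_used / yb_drive_bytes_total",
    # Drives currently reporting a failure state.
    "failed": "yb_drive_failure == 1",
    # Drives whose wear indicator has crossed an illustrative threshold.
    "high_wear": "yb_drive_wear > 80",
}

for name, expr in CHECKS.items():
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    for sample in resp.json()["data"]["result"]:
        labels = sample["metric"]
        _, value = sample["value"]
        print(f"[{name}] worker={labels.get('worker_uuid')} "
              f"drive={labels.get('drive_id')} value={float(value):g}")
```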