Compute Cluster Metrics

This page documents Prometheus metrics emitted by the Compute Cluster component, which manages distributed query execution across compute nodes in a Yellowbrick deployment.

Purpose

These metrics are used to monitor the health, performance, and resource usage of compute clusters. They provide visibility into:

  • Query activity and throughput
  • Memory and CPU utilization
  • Query compilation and execution times
  • Cluster degradation or compute node loss

They are critical for performance tuning, detecting resource contention, and ensuring cluster reliability at scale.
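
As a rough illustration of how query throughput could be derived from these metrics, the sketch below computes a per-second rate from two scrapes of a cumulative counter such as yb_lime_queries_completed_total. The sample values, timestamps, and the helper name counter_rate are hypothetical. In a real deployment this is normally left to the rate function in a Prometheus query rather than hand-written code.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: float  # scrape time in seconds
    value: float      # cumulative counter value at scrape time


def counter_rate(earlier: Sample, later: Sample) -> float:
    """Per-second increase between two counter samples.

    A counter only goes up, so a lower later value is treated as a reset
    (for example a process restart) and the later value is taken as the
    whole increase since the reset.
    """
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if later.value >= earlier.value:
        increase = later.value - earlier.value
    else:
        increase = later.value
    return increase / elapsed


# Hypothetical scrapes of yb_lime_queries_completed_total, 10 s apart.
first = Sample(timestamp=1000.0, value=4200.0)
second = Sample(timestamp=1010.0, value=4250.0)
print(counter_rate(first, second))  # 5.0 queries per second
```

Comparing this rate with the same calculation on yb_lime_queries_submitted_total gives a quick indication of whether completions are keeping up with submissions.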

Metrics

| Name | Type | Frequency | Labels | Description |
| --- | --- | --- | --- | --- |
| yb_lime_active_queries | gauge | 10s | cluster, cluster_name, pool | Number of active queries |
| yb_lime_cluster_degraded_status | gauge | 10s | cluster, cluster_name, status, reason | Cluster degraded status |
| yb_lime_cluster_missing_workers | gauge | 10s | cluster, cluster_name | Number of missing workers in each cluster |
| yb_lime_cluster_state | gauge | 10s | cluster, state, reason | Cluster state |
| yb_lime_queries_completed_backend_total | counter | 10s | cluster, cluster_name, pool, state | Total number of queries completed via backend |
| yb_lime_queries_completed_total | counter | 10s | cluster, cluster_name, state | Total number of queries completed |
| yb_lime_queries_submitted_total | counter | 10s | cluster, cluster_name | Total number of queries submitted |
| yb_lime_query_bytes_network | histogram | 10s | cluster, cluster_name | Bytes network |
| yb_lime_query_bytes_read | histogram | 10s | cluster, cluster_name | Bytes read |
| yb_lime_query_bytes_read_spill | histogram | 10s | cluster, cluster_name | Bytes read spill |
| yb_lime_query_bytes_written | histogram | 10s | cluster, cluster_name | Bytes written |
| yb_lime_query_bytes_written_spill | histogram | 10s | cluster, cluster_name | Bytes written spill |
| yb_lime_query_cache_efficiency | histogram | 10s | cluster, cluster_name | Query cache efficiency |
| yb_lime_query_compile_time | histogram | 10s | cluster, cluster_name | Query compile time in seconds |
| yb_lime_query_cpu_usage | gauge | 10s | cluster, cluster_name, pool | Duration weighted average of longest worker CPU query usage as percentage of allocated CPU |
| yb_lime_query_lock_time | histogram | 10s | cluster, cluster_name | Query lock time in seconds |
| yb_lime_query_memory_granted | gauge | 10s | cluster, cluster_name, pool | Total memory granted to queries in bytes |
| yb_lime_query_memory_used | gauge | 10s | cluster, cluster_name, pool | Total memory used by queries in bytes |
| yb_lime_query_run_time | histogram | 10s | cluster, cluster_name | Query run time in seconds |
| yb_lime_query_total_time | histogram | 10s | cluster, cluster_name | Query total time in seconds |
| yb_lime_query_wait_time | histogram | 10s | cluster, cluster_name | Query wait time in seconds |
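
The histogram metrics above are most useful when reduced to percentiles. The following sketch estimates a quantile from cumulative bucket counts by linear interpolation, which is similar in spirit to what the Prometheus histogram_quantile function does. It assumes the histograms are exposed as standard cumulative buckets ending in a +Inf bucket; the bucket boundaries, counts, and the helper name estimate_quantile are hypothetical and not taken from a real deployment.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper
    bound and ending with the +Inf bucket. The value is interpolated
    linearly inside the bucket that contains the target rank.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # Open-ended or empty bucket: fall back to the lower edge.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets for yb_lime_query_run_time (seconds).
run_time_buckets = [
    (1.0, 20),
    (2.0, 60),
    (4.0, 90),
    (8.0, 100),
    (math.inf, 100),
]
print(estimate_quantile(0.5, run_time_buckets))  # 1.75
```

The estimate is only as precise as the bucket boundaries allow, so a percentile that lands in a wide bucket should be read as approximate.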