Observability Overview
This section is a reference for the Prometheus metrics and alerting rules used to monitor the health, performance, and reliability of cloud platform components.
Metrics
The Metrics Documentation lists every Prometheus metric emitted by the platform components. Each metric entry includes its type, collection frequency, labels, and a description. This is useful for:
- Building dashboards
- Analyzing component behavior
- Understanding what instrumentation is available
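As a rough illustration of how a metric entry could be handled programmatically, the sketch below models the four documented fields (type, collection frequency, labels, description) as a small record and filters entries by type. The class name, field names, and both sample metrics are hypothetical and are not taken from the Metrics Documentation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricEntry:
    """One documented metric. All field names here are illustrative assumptions."""
    name: str
    type: str  # a Prometheus metric type: counter, gauge, histogram, or summary
    collection_frequency_seconds: int
    labels: tuple[str, ...]
    description: str


def names_by_type(entries: list[MetricEntry], metric_type: str) -> list[str]:
    """Return the names of all entries with the given metric type."""
    return [entry.name for entry in entries if entry.type == metric_type]


# Hypothetical sample entries, for illustration only.
entries = [
    MetricEntry("example_requests_total", "counter", 30,
                ("component", "status"), "Requests handled (hypothetical)."),
    MetricEntry("example_queue_depth", "gauge", 15,
                ("component",), "Items waiting in a queue (hypothetical)."),
]

counters = names_by_type(entries, "counter")
```

A listing like this can help when choosing dashboard panels: counters are usually graphed as rates, while gauges are graphed directly.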
Alerts
The Alerts Documentation describes all alert rules configured in our Prometheus setup. Alerts are grouped by component and include severity levels, trigger conditions, and human-readable descriptions.
Use this to:
- Understand why an alert fired
- Debug active incidents
- Tune alert sensitivity or thresholds
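The grouping described above can be sketched as follows. This is a minimal example that assumes each alert rule carries a component, a severity, a trigger expression, and a description; the rule names, components, severities, and expressions are invented for illustration and do not come from the Alerts Documentation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """One documented alert rule. Field names are illustrative assumptions."""
    name: str
    component: str
    severity: str
    expr: str  # trigger condition, written as a PromQL expression
    description: str


def group_by_component(rules: list[AlertRule]) -> dict[str, list[str]]:
    """Map each component to the names of its alert rules, in input order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for rule in rules:
        grouped[rule.component].append(rule.name)
    return dict(grouped)


# Hypothetical rules, for illustration only.
rules = [
    AlertRule("ExampleHighErrorRate", "api", "critical",
              "rate(example_errors_total[5m]) > 0.05", "Error rate is high."),
    AlertRule("ExampleHighLatency", "api", "warning",
              "example_latency_seconds > 2", "Requests are slow."),
    AlertRule("ExampleQueueBacklog", "queue", "warning",
              "example_queue_depth > 500", "Queue is backing up."),
]

grouped = group_by_component(rules)
critical = [rule.name for rule in rules if rule.severity == "critical"]
```

During an incident, filtering by severity first and then by component narrows the list of rules worth inspecting.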
Threshold Reference
Some alerts reference templated threshold values that are defined in our Helm charts. These values are documented separately on the Threshold Reference page.
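To show what a templated threshold means in practice, the sketch below substitutes a `{{ .Values.… }}` placeholder in an alert expression with a value from a nested dictionary. It is a deliberately simplified stand-in for Helm's template rendering, not a reimplementation of it, and the values path, metric name, and threshold are hypothetical.

```python
import re

# Matches simple placeholders such as {{ .Values.alerts.queue.maxDepth }}.
# Assumption: only plain dotted paths are used, with no pipes or functions.
_PLACEHOLDER = re.compile(r"\{\{\s*\.Values\.([A-Za-z0-9_.]+)\s*\}\}")


def resolve_thresholds(expr: str, values: dict) -> str:
    """Replace each .Values placeholder in expr with its value from values."""
    def lookup(match: re.Match) -> str:
        node = values
        for key in match.group(1).split("."):
            node = node[key]  # raises KeyError if the path is not defined
        return str(node)

    return _PLACEHOLDER.sub(lookup, expr)


# Hypothetical chart values and alert expression, for illustration only.
values = {"alerts": {"queue": {"maxDepth": 500}}}
expr = "example_queue_depth > {{ .Values.alerts.queue.maxDepth }}"

resolved = resolve_thresholds(expr, values)
```

Reading an alert expression with its thresholds resolved makes it easier to see which chart value to change when tuning sensitivity.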