Appearance
Catalog Maintenance Manager
The Catalog Maintenance Manager helps maintain Yellowbrick system catalogs to keep them healthy and efficient — without relying on external scripts. It manages operations such as VACUUM and REINDEX through simple SQL commands.
Why Catalog Maintenance Matters
System catalogs grow as objects are created, modified, or dropped. Without regular maintenance, catalog tables can accumulate:
- Dead tuples (deleted rows that still occupy space)
- Invalidated indexes
These conditions can slow query planning and increase memory usage. In extreme cases, this can lead to the database going into read-only mode or not accepting new connections. The Catalog Maintenance Manager ensures that catalogs stay lean, accurate, and performant with minimal human intervention.
How It Works
Maintenance operations are coordinated through three components:
- SQL Control Layer — SQL commands that start, stop, and monitor maintenance operations.
- Health Evaluation Function — Identifies which tables or indexes require attention.
- Maintenance Service — Executes the maintenance actions in the background and reports progress.
When maintenance runs, it automatically:
- Pauses background services such as GC, Analyzer, and Flusher.
- Determines which objects need
VACUUMorREINDEX. - Runs those operations in parallel.
- Resumes all services once complete.
All of this happens under the control of a single SQL command.
Checking Catalog Maintenance Recommendations
Before running maintenance, you can preview what would be cleaned or rebuilt on a per database basis using the function:
sql
SELECT * FROM sys.catalog_maintenance_recommendations(target_db_oid [, top [, shared_only [, include_all]]]);Arguments
| Option | Description | Default |
|---|---|---|
target_db_oid | OID of the database you want recommendations. Required. Pass NULL to use the current database. | - |
top | Limit result to the top first recommendations, by order of priority. -1 shows all recommendations, without prioritization. | -1 |
shared_tables_only | Only show maintenance recommendations for shared catalog tables. | false |
include_all | Show maintenance recommendations for both shared and target database catalog tables. | false |
Note:
- Options
shared_tables_onlyandinclude_allcannot be used at the same time shared_tables_onlyandinclude_allcan only be used whentarget_db_oidis the oid of theyellowbrickdatabase (4400).
This function lists the top objects needing maintenance, along with the maintenance action recommended, why and some useful information:
sql
select * from sys.catalog_maintenance_recommendations(4400, 5);
dbname | schemaname | relname | relkind | relid | maintenance_action | reason | dead_tuples | live_tuples | relpages | size_mb | reclaimable_mb | last_vacuum |
-----------------+------------+------------------------------+---------+-------+--------------------+-----------------------------------+-------------+-------------+----------+---------+----------------+------------------------+
yellowbrick | pg_catalog | pg_proc | r | 1255 | VACUUM FULL | severe_bloat_requires_full_vacuum | 20000 | 0 | 20000 | 625 | 625 | 2000-01-01 00:00:00+00 |
yellowbrick | pg_catalog | pg_description | r | 2609 | VACUUM | dead_ratio_exceeds_threshold | 1000 | 1000 | 10000 | 312.5 | 156.25 | 2000-01-01 00:00:00+00 |
yellowbrick | pg_catalog | pg_constraint_contypid_index | i | 2666 | REINDEX | index_not_ready | 0 | 0 | 0 | 0 | 0 | 2000-01-01 00:00:00+00 |
yellowbrick | pg_catalog | pg_description_o_c_o_index | i | 2675 | REINDEX | invalid_index | 0 | 0 | 0 | 0 | 0 | 2000-01-01 00:00:00+00 |
(4 rows)Priority order
When ordering using top option, order follows the below prioritization:
VACUUM FULL— Severe bloatVACUUM— Dead tuples exceed thresholdREINDEX— Index invalid or not ready
Within each group:
- Sort by reclaimable_mb
Starting Maintenance
To begin an automated maintenance run, use:
sql
ALTER SYSTEM MAINTENANCE START [ WITH ( option [, option] )]Options
| Option | Description | Default |
|---|---|---|
max_duration | Maximum runtime for the maintenance. Valid units for this parameter are "ms", "s", "min", "h", and "d". | 1h |
pause_services | List of background services to pause while maintenance is running. | gc,flusher,analyzer,autovacuum |
parallelism | Number of databases to process in parallel. | 4 |
retries | Retry count for transient errors (locks, timeouts) per action. | 3 |
shared_tables_only | Only perform maintenance of shared catalog tables. | false |
include_all | Perform maintenance of both shared and per-database catalog tables. | false |
top | Number of highest-priority objects per database to process. | 100 |
dry_run | When true, reports what would be done without executing. | false |
verbose | Enables logging of each step in the Lime logs. | false |
Maintenance will pause the specified services and run until either all work is complete, the duration limit is reached, or it is stopped manually. Once done, all services are automatically set back to their previous state.
Examples
sql
-- Use all default values
ALTER SYSTEM MAINTENANCE START;
-- Override some options
ALTER SYSTEM MAINTENANCE START
WITH (
max_duration = '15min',
parallelism = 4,
retries = 3,
shared_tables_only = true,
include_all = false,
top = 5,
dry_run
);Stopping Maintenance
To stop a maintenance run currently in progress in the background:
sql
ALTER SYSTEM MAINTENANCE STOP;This command interrupts the current run, ensures all paused services are resumed, and finalizes progress reporting.
Monitoring Progress
Using status function
While maintenance is running — or after it finishes — check the status:
sql
SELECT * FROM sys.catalog_maintenance_status();This will return the status of the last recorded maintenance run on the system, as a JSON object.
The output includes all actions taken in each database and its status (ok or failed).
The overall status of the maintenance run is given in the state field and can be one of :
idle: No run recordedstarting: Starting up a new maintenance runrunning: Maintenance in progresscomplete: Maintenance is done, with no failuresfailed: Maintenance completed with one or more failurestimeout: Maintenance timed out and stoppedstopping: Maintenance is in the middle of stopping, afterALTER SYSTEM MAINTENANCE STOPstopped: Maintenance was stopped usingALTER SYSTEM MAINTENANCE STOP
Note: If Lime is restarted, the status record from current or past maintenance run is lost.
Example:
json
yellowbrick_test=# select sys.catalog_maintenance_status();
catalog_maintenance_status
-------------------------------------------------------------------------------------------------------------------------------------
{ +
"state" : "complete", +
"startedAt" : "2025-11-04T10:25:34.811554461Z", +
"endedAt" : "2025-11-04T10:25:35.062168162Z", +
"elapsed" : 0, +
"totalDbs" : 9, +
"totalRelations" : 6, +
"processed" : 6, +
"failed" : 0, +
"lastAction" : "a: REINDEX pg_catalog.pg_description_o_c_o_index", +
"lastError" : null, +
"opts" : { +
"sharedTablesOnly" : false, +
"includeAllTables" : true, +
"topN" : 100, +
"dryRun" : false, +
"retries" : 3, +
"pauseServices" : "gc,flusher,analyzer,autovacuum", +
"maxDuration" : "3600s", +
"verbose" : true, +
"parallelism" : 4 +
}, +
"databaseActions" : { +
"a" : [ +
{"schema":"pg_catalog","relname":"pg_proc","action":"VACUUM FULL","result":"ok","elapsedMs":67,"error":null}, +
{"schema":"pg_catalog","relname":"pg_constraint_contypid_index","action":"REINDEX","result":"ok","elapsedMs":6,"error":null},+
{"schema":"pg_catalog","relname":"pg_description_o_c_o_index","action":"REINDEX","result":"ok","elapsedMs":10,"error":null} +
], +
"yellowbrick_test" : [ +
{"schema":"pg_catalog","relname":"pg_proc","action":"VACUUM FULL","result":"ok","elapsedMs":68,"error":null}, +
{"schema":"pg_catalog","relname":"pg_constraint_contypid_index","action":"REINDEX","result":"ok","elapsedMs":6,"error":null},+
{"schema":"pg_catalog","relname":"pg_description_o_c_o_index","action":"REINDEX","result":"ok","elapsedMs":11,"error":null} +
] +
}, +
"dbSummaries" : { +
"a" : {"processed":3,"failed":0,"reclaimedMb":535.0340281641912,"state":"complete","error":null}, +
"yellowbrick" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}, +
"_YBPG_stats" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}, +
"yellowbrick_test" : {"processed":3,"failed":0,"reclaimedMb":591.8280384451493,"state":"complete","error":null}, +
"yellowbrick_test_utf8" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}, +
"yellowbrick_unit_tests" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}, +
"maint_test" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}, +
"glinda_test_utf8" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}, +
"glinda_test_latin9" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null} +
} +
}
(1 row)From the Lime logs
When running in verbose mode, the maintenance operation will log each operation it runs in the Lime logs.
It will also log a small progress status every 15 seconds with the total number of relations to be processed, how many are done / failed and the overall status of the maintenance run.
text
Progress: 3/15 processed, 0 failed (state=running)Configuration Parameters
Use ALTER SYSTEM to tune global maintenance thresholds and behavior:
| Parameter | Default | Description |
|---|---|---|
| yb_maintenance_dead_tuple_ratio_threshold | 0.10 | Dead/live tuple ratio that triggers a VACUUM recommendation. |
| yb_maintenance_min_table_mb | 8.0 | Minimum table size considered for maintenance recommendations. |
| yb_maintenance_include_indexes | true | Include indexes in recommendations. |
| yb_maintenance_max_duration_s | 3600s (1 hour) | Maximum runtime for a maintenance run. |
| yb_maintenance_parallelism | 4 | Number of databases to process in parallel during maintenance. |
| yb_maintenance_retry_attempts | 3 | Number of retry attempts per operation during maintenance. |
| yb_maintenance_pause_services | "gc,flusher,analyzer, autovacuum" | Background services to pause during maintenance. |
Note: To make these settings apply to Lime’s maintenance sessions, use ALTER SYSTEM (not SET) so that they persist across processes.
Best Practices
- Run catalog maintenance during low workload periods to avoid lock conflicts and impacting regular workloads.
- Use
dry_run = trueto preview scope before running a full session. - Keep
yb_maintenance_parallelismmodest (2–4) on smaller instances. - Periodically review
sys.catalog_maintenance_status()for failures or skipped objects.
Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
Session stops early with status timeout | max_duration reached | Increase duration limit. |
| Some database actions are marked as failed | Locks or transient errors | Adjust retries or rerun maintenance in isolation. |
Given max_duration parameter not applied during maintenance | Invalid unit was given | Make sure the max_duration argument unit is one of "ms", "s", "min", "h", or "d" |
| Configuration variable settings not applied during maintenance | Session configuration variable settings not visible to Lime | Use ALTER SYSTEM to persist them. |