Skip to content

Catalog Maintenance Manager

The Catalog Maintenance Manager helps maintain Yellowbrick system catalogs to keep them healthy and efficient — without relying on external scripts. It manages operations such as VACUUM and REINDEX through simple SQL commands.

Why Catalog Maintenance Matters

System catalogs grow as objects are created, modified, or dropped. Without regular maintenance, catalog tables can accumulate:

  • Dead tuples (deleted rows that still occupy space)
  • Invalidated indexes

These conditions can slow query planning and increase memory usage. In extreme cases, this can lead to the database going into read-only mode or not accepting new connections. The Catalog Maintenance Manager ensures that catalogs stay lean, accurate, and performant with minimal human intervention.

How It Works

Maintenance operations are coordinated through three components:

  1. SQL Control Layer — SQL commands that start, stop, and monitor maintenance operations.
  2. Health Evaluation Function — Identifies which tables or indexes require attention.
  3. Maintenance Service — Executes the maintenance actions in the background and reports progress.

When maintenance runs, it automatically:

  • Pauses background services such as GC, Analyzer, and Flusher.
  • Determines which objects need VACUUM or REINDEX.
  • Runs those operations in parallel.
  • Resumes all services once complete.

All of this happens under the control of a single SQL command.

Checking Catalog Maintenance Recommendations

Before running maintenance, you can preview what would be cleaned or rebuilt on a per database basis using the function:

sql
SELECT * FROM sys.catalog_maintenance_recommendations(target_db_oid [, top [, shared_only [, include_all]]]);

Arguments

OptionDescriptionDefault
target_db_oidOID of the database you want recommendations. Required. Pass NULL to use the current database.-
topLimit result to the top first recommendations, by order of priority. -1 shows all recommendations, without prioritization.-1
shared_tables_onlyOnly show maintenance recommendations for shared catalog tables.false
include_allShow maintenance recommendations for both shared and target database catalog tables.false

Note:

  • Options shared_tables_only and include_all cannot be used at the same time
  • shared_tables_only and include_all can only be used when target_db_oid is the oid of the yellowbrick database (4400).

This function lists the top objects needing maintenance, along with the maintenance action recommended, why and some useful information:

sql
select * from sys.catalog_maintenance_recommendations(4400, 5);
 dbname          | schemaname |           relname            | relkind | relid | maintenance_action |              reason               | dead_tuples | live_tuples | relpages | size_mb | reclaimable_mb |      last_vacuum       |
-----------------+------------+------------------------------+---------+-------+--------------------+-----------------------------------+-------------+-------------+----------+---------+----------------+------------------------+
 yellowbrick     | pg_catalog | pg_proc                      | r       |  1255 | VACUUM FULL        | severe_bloat_requires_full_vacuum |       20000 |           0 |    20000 |     625 |            625 | 2000-01-01 00:00:00+00 |
 yellowbrick     | pg_catalog | pg_description               | r       |  2609 | VACUUM             | dead_ratio_exceeds_threshold      |        1000 |        1000 |    10000 |   312.5 |         156.25 | 2000-01-01 00:00:00+00 |
 yellowbrick     | pg_catalog | pg_constraint_contypid_index | i       |  2666 | REINDEX            | index_not_ready                   |           0 |           0 |        0 |       0 |              0 | 2000-01-01 00:00:00+00 |
 yellowbrick     | pg_catalog | pg_description_o_c_o_index   | i       |  2675 | REINDEX            | invalid_index                     |           0 |           0 |        0 |       0 |              0 | 2000-01-01 00:00:00+00 |
(4 rows)

Priority order

When ordering using top option, order follows the below prioritization:

  • VACUUM FULL — Severe bloat
  • VACUUM — Dead tuples exceed threshold
  • REINDEX — Index invalid or not ready

Within each group:

  • Sort by reclaimable_mb

Starting Maintenance

To begin an automated maintenance run, use:

sql
ALTER SYSTEM MAINTENANCE START [ WITH ( option [, option] )]

Options

OptionDescriptionDefault
max_durationMaximum runtime for the maintenance. Valid units for this parameter are "ms", "s", "min", "h", and "d".1h
pause_servicesList of background services to pause while maintenance is running.gc,flusher,analyzer,autovacuum
parallelismNumber of databases to process in parallel.4
retriesRetry count for transient errors (locks, timeouts) per action.3
shared_tables_onlyOnly perform maintenance of shared catalog tables.false
include_allPerform maintenance of both shared and per-database catalog tables.false
topNumber of highest-priority objects per database to process.100
dry_runWhen true, reports what would be done without executing.false
verboseEnables logging of each step in the Lime logs.false

Maintenance will pause the specified services and run until either all work is complete, the duration limit is reached, or it is stopped manually. Once done, all services are automatically set back to their previous state.

Examples

sql
-- Use all default values
ALTER SYSTEM MAINTENANCE START;

-- Override some options
ALTER SYSTEM MAINTENANCE START
WITH (
    max_duration = '15min',
    parallelism = 4,
    retries = 3,
    shared_tables_only = true,
    include_all = false,
    top = 5,
    dry_run
);

Stopping Maintenance

To stop a maintenance run currently in progress in the background:

sql
ALTER SYSTEM MAINTENANCE STOP;

This command interrupts the current run, ensures all paused services are resumed, and finalizes progress reporting.

Monitoring Progress

Using status function

While maintenance is running — or after it finishes — check the status:

sql
SELECT * FROM sys.catalog_maintenance_status();

This will return the status of the last recorded maintenance run on the system, as a JSON object.

The output includes all actions taken in each database and its status (ok or failed).

The overall status of the maintenance run is given in the state field and can be one of :

  • idle : No run recorded
  • starting: Starting up a new maintenance run
  • running: Maintenance in progress
  • complete: Maintenance is done, with no failures
  • failed: Maintenance completed with one or more failures
  • timeout: Maintenance timed out and stopped
  • stopping: Maintenance is in the middle of stopping, after ALTER SYSTEM MAINTENANCE STOP
  • stopped: Maintenance was stopped using ALTER SYSTEM MAINTENANCE STOP

Note: If Lime is restarted, the status record from current or past maintenance run is lost.

Example:

json
yellowbrick_test=# select sys.catalog_maintenance_status();
                                                     catalog_maintenance_status
-------------------------------------------------------------------------------------------------------------------------------------
 {                                                                                                                                  +
   "state" : "complete",                                                                                                            +
   "startedAt" : "2025-11-04T10:25:34.811554461Z",                                                                                  +
   "endedAt" : "2025-11-04T10:25:35.062168162Z",                                                                                    +
   "elapsed" : 0,                                                                                                                   +
   "totalDbs" : 9,                                                                                                                  +
   "totalRelations" : 6,                                                                                                            +
   "processed" : 6,                                                                                                                 +
   "failed" : 0,                                                                                                                    +
   "lastAction" : "a: REINDEX pg_catalog.pg_description_o_c_o_index",                                                               +
   "lastError" : null,                                                                                                              +
   "opts" : {                                                                                                                       +
     "sharedTablesOnly" : false,                                                                                                    +
     "includeAllTables" : true,                                                                                                     +
     "topN" : 100,                                                                                                                  +
     "dryRun" : false,                                                                                                              +
     "retries" : 3,                                                                                                                 +
     "pauseServices" : "gc,flusher,analyzer,autovacuum",                                                                            +
     "maxDuration" : "3600s",                                                                                                       +
     "verbose" : true,                                                                                                              +
     "parallelism" : 4                                                                                                              +
   },                                                                                                                               +
   "databaseActions" : {                                                                                                            +
     "a" : [                                                                                                                        +
       {"schema":"pg_catalog","relname":"pg_proc","action":"VACUUM FULL","result":"ok","elapsedMs":67,"error":null},                +
       {"schema":"pg_catalog","relname":"pg_constraint_contypid_index","action":"REINDEX","result":"ok","elapsedMs":6,"error":null},+
       {"schema":"pg_catalog","relname":"pg_description_o_c_o_index","action":"REINDEX","result":"ok","elapsedMs":10,"error":null}  +
     ],                                                                                                                             +
     "yellowbrick_test" : [                                                                                                         +
       {"schema":"pg_catalog","relname":"pg_proc","action":"VACUUM FULL","result":"ok","elapsedMs":68,"error":null},                +
       {"schema":"pg_catalog","relname":"pg_constraint_contypid_index","action":"REINDEX","result":"ok","elapsedMs":6,"error":null},+
       {"schema":"pg_catalog","relname":"pg_description_o_c_o_index","action":"REINDEX","result":"ok","elapsedMs":11,"error":null}  +
     ]                                                                                                                              +
   },                                                                                                                               +
   "dbSummaries" : {                                                                                                                +
     "a" : {"processed":3,"failed":0,"reclaimedMb":535.0340281641912,"state":"complete","error":null},                              +
     "yellowbrick" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null},                                  +
     "_YBPG_stats" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null},                                  +
     "yellowbrick_test" : {"processed":3,"failed":0,"reclaimedMb":591.8280384451493,"state":"complete","error":null},               +
     "yellowbrick_test_utf8" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null},                        +
     "yellowbrick_unit_tests" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null},                       +
     "maint_test" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null},                                   +
     "glinda_test_utf8" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null},                             +
     "glinda_test_latin9" : {"processed":0,"failed":0,"reclaimedMb":0.0,"state":"complete","error":null}                            +
   }                                                                                                                                +
 }
(1 row)

From the Lime logs

When running in verbose mode, the maintenance operation will log each operation it runs in the Lime logs.

It will also log a small progress status every 15 seconds with the total number of relations to be processed, how many are done / failed and the overall status of the maintenance run.

text
Progress: 3/15 processed, 0 failed (state=running)

Configuration Parameters

Use ALTER SYSTEM to tune global maintenance thresholds and behavior:

ParameterDefaultDescription
yb_maintenance_dead_tuple_ratio_threshold0.10Dead/live tuple ratio that triggers a VACUUM recommendation.
yb_maintenance_min_table_mb8.0Minimum table size considered for maintenance recommendations.
yb_maintenance_include_indexestrueInclude indexes in recommendations.
yb_maintenance_max_duration_s3600s (1 hour)Maximum runtime for a maintenance run.
yb_maintenance_parallelism4Number of databases to process in parallel during maintenance.
yb_maintenance_retry_attempts3Number of retry attempts per operation during maintenance.
yb_maintenance_pause_services"gc,flusher,analyzer, autovacuum"Background services to pause during maintenance.

Note: To make these settings apply to Lime’s maintenance sessions, use ALTER SYSTEM (not SET) so that they persist across processes.

Best Practices

  • Run catalog maintenance during low workload periods to avoid lock conflicts and impacting regular workloads.
  • Use dry_run = true to preview scope before running a full session.
  • Keep yb_maintenance_parallelism modest (2–4) on smaller instances.
  • Periodically review sys.catalog_maintenance_status() for failures or skipped objects.

Troubleshooting

SymptomLikely CauseResolution
Session stops early with status timeoutmax_duration reachedIncrease duration limit.
Some database actions are marked as failedLocks or transient errorsAdjust retries or rerun maintenance in isolation.
Given max_duration parameter not applied during maintenanceInvalid unit was givenMake sure the max_duration argument unit is one of "ms", "s", "min", "h", or "d"
Configuration variable settings not applied during maintenanceSession configuration variable settings not visible to LimeUse ALTER SYSTEM to persist them.