Managing the Row Store

Each data warehouse instance you create contains its own predefined persistent volume that serves as its row store. The row store is an intermediate storage system for data that is not loaded directly to the column store. By default, tables that are loaded via INSERT INTO...VALUES statements and ybsql \copy operations write rows to the row store, while tables that are loaded via ybload operations and INSERT INTO...SELECT statements write rows directly to the column store (object storage accessed by the worker nodes).

The row store also serves as temporary holding area for deleted rows that are periodically "garbage-collected" (or vacuumed) by a background "GC" operation.

Default Flushing Behavior

A background service periodically flushes rows from the row store to the column store. This service scans database tables and queues them for flushing. The service issues the appropriate commands automatically, taking into account the expected number of workers where the flush operation will run. For efficient storage in the column store, data is flushed automatically when a table accrues more than 30MB of data in the row store (or when a single transaction writing to the row store exceeds 30MB of data). You do not need to flush tables manually.

To make sure that smaller batches of data do not remain in the row store for too long before being flushed (for example, updates to small dimension tables), a time limit of 6 hours also applies. By default, data does not remain in the row store any longer than 6 hours, regardless of the amount of data ready to be flushed for a table. Both of these limits (30MB and 6 hours) are set globally and apply to all tables.

Depending on your workload, the row store volume may become full or run the risk of becoming full. To mitigate these concerns, you can manage the row store at the table level, as described in the following sections. However, note that most workloads should run efficiently without any changes to the default behavior.

Default Action Taken When the Row Store Becomes Full

When the row store becomes full for a table (or completely full), the default behavior is to insert rows directly into the column store. This behavior ensures that data continues to load; however, performance will not be optimal. When rows start being loaded into the column store instead of the row store, you will see a warning message. For example:
premdb=# \copy match from '/home/brumsby/match.csv' with delimiter ',' null '';
WARNING:  file limit 1 reached for oid 16437 performing slow insert
CONTEXT:  COPY match, line 6503: "17,2008-12-13 00:00:00,25,67,0-1,0-0"
COPY 8606

Internally, the \copy or INSERT operation that you were running is translated to a YCOPY operation, which is an internally generated plan for sending rows directly to the worker nodes. The threshold for switching from updating the row store to updating the column store is 30MB of data; when a specific \copy or INSERT transaction exceeds that limit, the remaining rows are loaded via YCOPY. (This threshold may be overridden by a row store size limit that you set for a table; see Modifying the Row Store Size for Tables.)

Although load performance is slower when inserts are promoted to YCOPY operations, the related flushing and space management behavior is optimized. Therefore, the default behavior is recommended for most workloads.

The row store full action also applies to UPDATE statements because an UPDATE statement is executed as a DELETE followed by an INSERT. For example:
premdb_match=# update match set seasonid=29;
ERROR:  file limit 1 reached for oid 16489...

Independent DELETE statements do not respect the action setting; however, if the entire row store drive becomes full, DELETE statements may fail.

You cannot explicitly run a YCOPY operation, but you can monitor the YCOPY queries that the system generates in the context of workload management (WLM), and you can track YCOPY activity in the sys.query and sys.log_query views.

Alternative Actions: Block or Cancel

Two other actions can be taken when the row store becomes full:
  • Block the insert until a flush is possible, then resume. The current transaction will wait until sufficient space is available in the row store, then continue.
  • Cancel the insert and return an error.
To modify the default behavior and enforce the block or cancel option for specific tables, use an ALTER TABLE command. You cannot modify the behavior at the database or system level. For example, the following command sets the Cancel action ('C') for table match:
premdb=# alter table match set rowstore_full_action 'C';

To see the current action setting for tables, query the sys.table view.

Modifying the Row Store Size for Tables

In addition to setting the action that is taken when the row store becomes full, you can limit the size of the row store for specific tables. In this way, you can mitigate the effects of the following problems:
  • Slow performance of queries on the table
  • Inability of the system to accept writes because the table occupies too much space in the row store
  • Failure of INSERT, UPDATE, and DELETE queries when the row store becomes full

Using an ALTER TABLE command, you can set the size limit to 32MB or greater, which is the default size of a single file in the row store. For example, setting the size limit to 32GB translates to a maximum of 1024 32MB files.

Monitoring Row Store Operations

You can use the following methods to monitor the behavior of writes to the row store:
  • Create WLM rules: For example, you can create a rule that sends a log message or aborts YCOPY queries. You may want to manage how many YCOPY operations are occurring and not allow too many to run. For example, the following rule condition aborts queries that have the type ycopy, then logs a message.
    if (w.type === 'ycopy') {
    	w.abort('YCOPY operation aborted');
    Alternatively, a rule condition can filter on the SQL text. For example:
    if ((String(w.SQLText).indexOf('ycopy') >= 0)) {
    	w.abort('YCOPY operation aborted');
    For more details, see Creating WLM Rules.
  • Query system views: sys.table, sys.query, and sys.log_query. The sys.table view tracks row counts and table-level settings.