Managing the Row Store
Each data warehouse instance you create contains its own predefined persistent volume that serves as its row store. The row store is an intermediate storage system for data that is not loaded directly to the column store. By default, tables that are loaded via INSERT INTO...VALUES
statements and ybsql \copy
operations write rows to the row store, while tables that are loaded via ybload
operations and INSERT INTO...SELECT
statements write rows directly to the column store (object storage accessed by the worker nodes).
The row store also serves as temporary holding area for deleted rows that are periodically "garbage-collected" (or vacuumed) by a background "GC" operation.
Default Flushing Behavior
A background service periodically flushes rows from the row store to the column store. This service scans database tables and queues them for flushing. The service issues the appropriate commands automatically, taking into account the expected number of workers where the flush operation will run. For efficient storage in the column store, data is flushed automatically when a table accrues more than 30MB of data in the row store (or when a single transaction writing to the row store exceeds 30MB of data). You do not need to flush tables manually.
To make sure that smaller batches of data do not remain in the row store for too long before being flushed (for example, updates to small dimension tables), a time limit of 6 hours also applies. By default, data does not remain in the row store any longer than 6 hours, regardless of the amount of data ready to be flushed for a table. Both of these limits (30MB and 6 hours) are set globally and apply to all tables.
Depending on your workload, the row store volume may become full or run the risk of becoming full. To mitigate these concerns, you can manage the row store at the table level, as described in the following sections. However, note that most workloads should run efficiently without any changes to the default behavior.
Default Action Taken When the Row Store Becomes Full
When the row store becomes full for a table (or completely full), the default behavior is to insert rows directly into the column store. This behavior ensures that data continues to load; however, performance will not be optimal. When rows start being loaded into the column store instead of the row store, you will see a warning message. For example:
premdb=# \copy match from '/home/brumsby/match.csv' with delimiter ',' null '';
WARNING: file limit 1 reached for oid 16437 performing slow insert
CONTEXT: COPY match, line 6503: "17,2008-12-13 00:00:00,25,67,0-1,0-0"
COPY 8606
Internally, the \copy
or INSERT
operation that you were running is translated to a YCOPY
operation, which is an internally generated plan for sending rows directly to the worker nodes. The threshold for switching from updating the row store to updating the column store is 30MB of data; when a specific \copy
or INSERT
transaction exceeds that limit, the remaining rows are loaded via YCOPY
. (This threshold may be overridden by a row store size limit that you set for a table; see Modifying the Row Store Size for Tables.)
Although load performance is slower when inserts are promoted to YCOPY
operations, the related flushing and space management behavior is optimized. Therefore, the default behavior is recommended for most workloads.
The row store full action also applies to UPDATE
statements because an UPDATE
statement is executed as a DELETE
followed by an INSERT
. For example:
premdb_match=# update match set seasonid=29;
ERROR: file limit 1 reached for oid 16489...
Independent DELETE
statements do not respect the action setting; however, if the entire row store drive becomes full, DELETE
statements may fail.
You cannot explicitly run a YCOPY
operation, but you can monitor the YCOPY
queries that the system generates in the context of workload management (WLM), and you can track YCOPY
activity in the sys.query
and sys.log_query
views.
Alternative Actions: Block or Cancel
Two other actions can be taken when the row store becomes full:
- Block the insert until a flush is possible, then resume. The current transaction will wait until sufficient space is available in the row store, then continue.
- Cancel the insert and return an error.
To modify the default behavior and enforce the block or cancel option for specific tables, use an ALTER TABLE command. You cannot modify the behavior at the database or system level. For example, the following command sets the Cancel action ('C'
) for table match
:
premdb=# alter table match set rowstore_full_action 'C';
ALTER TABLE
To see the current action setting for tables, query the sys.table view.
Modifying the Row Store Size for Tables
In addition to setting the action that is taken when the row store becomes full, you can limit the size of the row store for specific tables. In this way, you can mitigate the effects of the following problems:
- Slow performance of queries on the table
- Inability of the system to accept writes because the table occupies too much space in the row store
- Failure of
INSERT
,UPDATE
, andDELETE
queries when the row store becomes full
Using an ALTER TABLE command, you can set the size limit to 32MB
or greater, which is the default size of a single file in the row store. For example, setting the size limit to 32GB
translates to a maximum of 1024 32MB files.
Monitoring Row Store Operations
You can use the following methods to monitor the behavior of writes to the row store:
- Create WLM rules: For example, you can create a rule that sends a log message or aborts
YCOPY
queries. You may want to manage how manyYCOPY
operations are occurring and not allow too many to run. For example, the following rule condition aborts queries that have the typeycopy
, then logs a message.
if (w.type === 'ycopy') {
w.abort('YCOPY operation aborted');
}
Alternatively, a rule condition can filter on the SQL text. For example:
if ((String(w.SQLText).indexOf('ycopy') >= 0)) {
w.abort('YCOPY operation aborted');
}
For more details, see Creating WLM Rules.
- Query system views: sys.table, sys.query, and sys.log_query. The
sys.table
view tracks row counts and table-level settings.
Parent topic:Database Administration