Setting the Commit Interval
Bulk load operations commit rows to the database based on the settings of the following two options. You can override these thresholds in the ybload command:
--rows-per-transaction
defaults to 2^63 - 1 rows (9223372036854775807), which is effectively unlimited.
--bytes-per-transaction
defaults to 1TB (1099511627776 bytes).
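The default values can be checked with a little arithmetic. The following Python sketch reproduces both numbers and assembles the two options into an argument list. The helper name commit_interval_args is hypothetical, and the space-separated "--option value" form with plain integer values is an assumption for illustration, not documented ybload syntax.

```python
# Default thresholds as stated in this topic.
DEFAULT_ROWS_PER_TRANSACTION = 2**63 - 1   # 9223372036854775807
DEFAULT_BYTES_PER_TRANSACTION = 2**40      # 1TB = 1099511627776 bytes


def commit_interval_args(rows=None, nbytes=None):
    """Return command-line arguments for the thresholds being overridden.

    Hypothetical helper: only thresholds passed explicitly are emitted,
    so the defaults stay in effect for anything left as None.
    """
    args = []
    if rows is not None:
        args += ["--rows-per-transaction", str(rows)]
    if nbytes is not None:
        args += ["--bytes-per-transaction", str(nbytes)]
    return args


if __name__ == "__main__":
    print(DEFAULT_ROWS_PER_TRANSACTION)
    print(DEFAULT_BYTES_PER_TRANSACTION)
    # Example override: a 100GB byte threshold (binary gigabytes assumed).
    print(commit_interval_args(nbytes=100 * 2**30))
```

The resulting list could be appended to whatever ybload invocation you already use; the rest of that command line is outside the scope of this sketch.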
The threshold that is met first is applied. Given these defaults, a 1TB source file would be committed all at once at the end of a load. However, these settings are guidelines for ybload, not hard rules. When ybload detects that one of the thresholds has been reached, the load will commit soon after that. Do not expect individual commits (transactions) to process exactly the number of rows or bytes that the settings specify.
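To see why commits rarely match the configured numbers exactly, consider a toy model in Python. It is not ybload's implementation: how ybload batches data internally is not described here, so the batch sizes, the function name simulate_commits, and the rule of checking thresholds only between batches are all assumptions made for illustration.

```python
def simulate_commits(batches, rows_limit, bytes_limit):
    """Toy model: commit once either running total reaches its limit.

    `batches` is an iterable of (row_count, byte_count) pairs. Thresholds
    are checked only after a whole batch has been added, so a commit can
    overshoot the configured limit.
    """
    commits = []
    rows = nbytes = 0
    for batch_rows, batch_bytes in batches:
        rows += batch_rows
        nbytes += batch_bytes
        # Whichever threshold is reached first triggers the commit.
        if rows >= rows_limit or nbytes >= bytes_limit:
            commits.append((rows, nbytes))
            rows = nbytes = 0
    if rows or nbytes:
        commits.append((rows, nbytes))  # final commit at the end of the load
    return commits


if __name__ == "__main__":
    # Ten batches of 300 rows / 40,000 bytes each (made-up numbers).
    batches = [(300, 40_000)] * 10
    for commit in simulate_commits(batches, rows_limit=1_000, bytes_limit=100_000):
        print(commit)
```

In this model the byte threshold is reached first, and each full commit carries 120,000 bytes rather than exactly 100,000, because the check happens only after a complete batch has been added.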
Although other databases may perform faster with a smaller commit interval and a larger number of commits, ybload performs best with a larger interval and a smaller number of commits. The only real purpose for setting a smaller commit interval is to improve "data visibility," not performance. For example, if you want incoming queries to see the latest data as it commits, and you are not concerned if the load has not completed when those queries are run, you could use a much smaller commit interval, such as 10GB or 100GB.
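As a rough guide to the trade-off, the Python sketch below estimates how many commits a 1TB load would produce at different byte thresholds. It assumes binary units (1GB = 2^30 bytes, consistent with 1TB = 1099511627776 bytes) and that only the byte threshold triggers commits. Because the thresholds are guidelines, the real count may differ; the function name approx_commit_count is hypothetical.

```python
GB = 2**30  # binary gigabyte (assumption, consistent with 1TB = 2**40 bytes)
TB = 2**40


def approx_commit_count(load_bytes, bytes_per_transaction):
    """Rough number of commits if only the byte threshold triggers them.

    Uses integer ceiling division; the last commit covers the remainder.
    """
    return (load_bytes + bytes_per_transaction - 1) // bytes_per_transaction


if __name__ == "__main__":
    for interval in (10 * GB, 100 * GB, TB):
        print(interval, approx_commit_count(TB, interval))
```

Under these assumptions, a 10GB interval yields roughly 103 commits for a 1TB load, a 100GB interval roughly 11, and the default 1TB interval a single commit.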