Parquet Processing Options
This section covers ybunload options that are specific to unloads in parquet format. These options do not apply to unloads in csv and text format.
- --opt-tmp-buffer-size NUMBER
Set the buffer size (in bytes) to be used for string operations such as transcoding. The default size is 3145728, which is large enough, in most cases, to hold the maximum width of an unloaded row, given that the size of the row is multiplied for transcoding purposes. In some cases, the default buffer size may not be big enough. If ybunload returns a prompt to adjust this buffer, you can increase it; otherwise, use the default value. This option is similar to the --pre-parse-buf-size option in ybload.
- --parquet-disable-dictionary-compression, --no-parquet-disable-dictionary-compression
Disable dictionary compression. Dictionary compression is enabled by default for parquet unloads.
- --parquet-rowgroup-size NUMBER
Set the size, in bytes, of a parquet row group:
- Default: 52428800
- Maximum size: 1073741824
- Minimum size: 1048576
A parquet row group is a segment of a parquet file (similar to a shard of data stored for a Yellowbrick table). A row group contains a set of rows and associated statistics, such as minimum and maximum values. You may need to limit the row group size for the following reasons:
- The row group size directly affects unload performance. Memory usage on the ybunload client will be proportionally higher with an increase in row group size.
- Skipping during reloads of unloaded parquet files will be limited if the range in maximum and minimum values for a row group is too large. For example, if all the rows for an unloaded table are written to the same row group in a parquet file, a subsequent load from that file cannot benefit from skipping. Other query planning optimizations, such as predicate pushdown, may be inhibited as well.
- ybunload may buffer up to one row group per blade. Increasing the row group size has a direct impact on ybunload memory usage, and for large systems, may cause ybunload to run out of memory.
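The per-blade buffering described above can be turned into a rough upper bound on client memory. The following Python sketch is illustrative only: it assumes the worst case is simply one full row group held per blade, ignores every other source of ybunload memory usage, and uses a hypothetical blade count of 15. The function name is not part of any Yellowbrick tool.

```python
# Limits for --parquet-rowgroup-size, taken from the option description.
MIN_ROWGROUP_SIZE = 1048576
MAX_ROWGROUP_SIZE = 1073741824
DEFAULT_ROWGROUP_SIZE = 52428800


def worst_case_rowgroup_buffer(rowgroup_size: int, blades: int) -> int:
    """Return an upper bound, in bytes, on row group data buffered at once.

    Assumption: at most one full row group is buffered per blade.
    """
    if not MIN_ROWGROUP_SIZE <= rowgroup_size <= MAX_ROWGROUP_SIZE:
        raise ValueError("row group size is outside the documented range")
    if blades < 1:
        raise ValueError("blades must be a positive integer")
    return rowgroup_size * blades


if __name__ == "__main__":
    blades = 15  # hypothetical cluster size
    for size in (DEFAULT_ROWGROUP_SIZE, MAX_ROWGROUP_SIZE):
        total = worst_case_rowgroup_buffer(size, blades)
        print(f"row group {size} bytes x {blades} blades = {total} bytes")
```

Under these assumptions, the gap between the default and the maximum row group size is roughly a factor of 20 in buffered data, which is why large systems are the ones most likely to run out of memory.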
- --parquet-write-buffer-size NUMBER
Set the size, in bytes, of the parquet write buffer. Default: 8192
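If you script your unloads, it can help to check these values before calling ybunload. The following Python sketch builds only the parquet-specific arguments described in this section; the helper name is hypothetical, and the rule that the two buffer sizes must be positive integers is an assumption, since only the row group size has documented limits here. Append the returned list to the rest of your own ybunload command line.

```python
from typing import List, Optional

# Documented limits for --parquet-rowgroup-size.
MIN_ROWGROUP_SIZE = 1048576
MAX_ROWGROUP_SIZE = 1073741824


def parquet_option_args(
    rowgroup_size: Optional[int] = None,
    write_buffer_size: Optional[int] = None,
    tmp_buffer_size: Optional[int] = None,
    disable_dictionary: bool = False,
) -> List[str]:
    """Build the parquet-specific ybunload arguments as a list of strings.

    Options left as None are omitted, so ybunload applies its defaults.
    """
    args: List[str] = []
    if tmp_buffer_size is not None:
        if tmp_buffer_size <= 0:  # assumed rule, not a documented limit
            raise ValueError("tmp buffer size must be positive")
        args += ["--opt-tmp-buffer-size", str(tmp_buffer_size)]
    if disable_dictionary:
        args.append("--parquet-disable-dictionary-compression")
    if rowgroup_size is not None:
        if not MIN_ROWGROUP_SIZE <= rowgroup_size <= MAX_ROWGROUP_SIZE:
            raise ValueError("row group size is outside the documented range")
        args += ["--parquet-rowgroup-size", str(rowgroup_size)]
    if write_buffer_size is not None:
        if write_buffer_size <= 0:  # assumed rule, not a documented limit
            raise ValueError("write buffer size must be positive")
        args += ["--parquet-write-buffer-size", str(write_buffer_size)]
    return args


if __name__ == "__main__":
    # Example: halve the default row group size and keep other defaults.
    print(parquet_option_args(rowgroup_size=26214400))
```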