Skip to content

External Format Unload Options

This section contains a list of the bulk unload options that are supported in the WITH clause for the CREATE EXTERNAL FORMAT command:

File Processing

compress NONE | GZIP_FAST | GZIP | GZIP_MORE | GZIP_BEST | GZIP_STREAM_FAST | GZIP_STREAM | GZIP_STREAM_MORE | GZIP_STREAM_BEST
The default is NONE. The GZIP_ options write data as GZIP compressed files in two different compression modes: "block mode" and "stream mode." The GZIP* options run in block mode, and the GZIP_STREAM* run in stream mode. See ybunload Output Files.

GZIP and GZIP_FAST are synonyms. GZIP_MORE provides better compression, but slower unload performance, and GZIP_BEST provides the best compression but much slower performance.

GZIP_STREAM and GZIP_STREAM_FAST are synonyms. GZIP_STREAM_MORE provides better compression, but slower unload performance, and GZIP_STREAM_BEST provides the best compression but much slower performance. The GZIP_STREAM_* options are intended to be used only if your downstream workflow tools cannot handle gzip files containing multiple compression blocks. Additionally, the GZIP_STREAM_* options consume significantly more network connections than the GZIP* options, meaning many network routers won't be able to handle the increased number of connections reliably.

file_extension STRING

Specify an extension to append to each unload file name. The defaults are:

  • CSV type: csv
  • TEXT type: txt
  • compressed file: gz
file_prefix STRING

Specify a prefix to prepend to each unload file name. The default is unload. For example, if you use the prefix 04-30-18, the files are numbered consecutively in the following format:

04-30-18_1_1.csv
04-30-18_1_2.csv
04-30-18_1_3.csv
04-30-18_1_4.csv
04-30-18_1_5.csv
...

The convention for naming files is as follows:

<file_prefix>_<worker_id>_<partnumber>.<file_extension>
  • file_prefix: As defined, or unload by default.
  • worker_id: A logical number assigned to each compute node.
  • partnumber: An incrementing number starting from 1 for each compute node output file.
  • file_extension: File type, such as .csv or .gz.
max_file_size
Specify the maximum size of each output file. For example: 100MB or 100Mib. Do not use fractional values. The default maximum is 100Mib. The minimum file size is 5Mib.
parallel
Enable parallel processing on all compute nodes for the final sort operation that occurs when an unload query contains an ORDER BY clause. By default, the final ORDER BY sort runs on a single compute node. If the unload query does not have an explicit ORDER BY clause, this option has no effect.

When you use the parallel option, files are unloaded in streams from each compute node, and the rows are guaranteed to be sorted within each file and within each stream. However, the complete set of files from all the compute nodes will not be unloaded in order.

If you intend to reload data that was unloaded with the --parallel option, it is recommended that you create the target table with a SORT ON column. The presence of a SORT ON column causes ybload source files to be loaded in fully sorted order. If you need to stitch the unload files together for loading or for use in other applications, you may need to write a script that checks the first and last lines of each unloaded file.

CSV Format

delimiter CHARACTER
Define the field delimiter that will separate columns of data in output files. The delimiter cannot be the null byte (0x00).

The default is a tab character \t in text format, and a \n in csv format.

escape CHARACTER
Specify the character to use for escaping quotes.

The default is the same as quote in csv format, and not allowed in other format.

header
Include a header row with column names for each output file. Only applies to csv and text format.

The default is false.

linesep CHARACTER
Specify a row separator.

The default is \n for csv and text formats.

nullmarker STRING

String to use as a null marker in the unloaded files. For example:

nullmarker '[NULL]'

For text type, the default null marker is \N.

For other formats, no default null marker is used. In the output, NULL values will appear as adjacent delimiters with no text between them.

quote CHARACTER
Specify the character to use for quoting.

The default is " in CSV, and not allowed in other format.