External Format Unload Options
This section contains a list of the bulk unload options that are supported in the WITH
clause for the CREATE EXTERNAL FORMAT
command:
File Processing
- compress NONE | GZIP_FAST | GZIP | GZIP_MORE | GZIP_BEST | GZIP_STREAM_FAST | GZIP_STREAM | GZIP_STREAM_MORE | GZIP_STREAM_BEST
- The default is
NONE
. TheGZIP_
options write data as GZIP compressed files in two different compression modes: "block mode" and "stream mode." TheGZIP*
options run in block mode, and theGZIP_STREAM*
run in stream mode. See ybunload Output Files.
GZIP
and GZIP_FAST
are synonyms. GZIP_MORE
provides better compression, but slower unload performance, and GZIP_BEST
provides the best compression but much slower performance.
GZIP_STREAM
and GZIP_STREAM_FAST
are synonyms. GZIP_STREAM_MORE
provides better compression, but slower unload performance, and GZIP_STREAM_BEST
provides the best compression but much slower performance. The GZIP_STREAM_*
options are intended to be used only if your downstream workflow tools cannot handle gzip
files containing multiple compression blocks. Additionally, the GZIP_STREAM_*
options consume significantly more network connections than the GZIP*
options, meaning many network routers won't be able to handle the increased number of connections reliably.
- file_extension STRING
Specify an extension to append to each unload file name. The defaults are:
- CSV type:
csv
- TEXT type:
txt
- compressed file:
gz
- CSV type:
- file_prefix STRING
Specify a prefix to prepend to each unload file name. The default is
unload
. For example, if you use the prefix04-30-18
, the files are numbered consecutively in the following format:04-30-18_1_1.csv 04-30-18_1_2.csv 04-30-18_1_3.csv 04-30-18_1_4.csv 04-30-18_1_5.csv ...
The convention for naming files is as follows:
<file_prefix>_<worker_id>_<partnumber>.<file_extension>
file_prefix
: As defined, orunload
by default.worker_id
: A logical number assigned to each compute node.partnumber
: An incrementing number starting from1
for each compute node output file.file_extension
: File type, such as.csv
or.gz
.
- max_file_size
- Specify the maximum size of each output file. For example:
100MB
or100Mib
. Do not use fractional values. The default maximum is100Mib
. The minimum file size is5Mib
. - parallel
- Enable parallel processing on all compute nodes for the final sort operation that occurs when an unload query contains an
ORDER BY
clause. By default, the finalORDER BY
sort runs on a single compute node. If the unload query does not have an explicitORDER BY
clause, this option has no effect.
When you use the parallel
option, files are unloaded in streams from each compute node, and the rows are guaranteed to be sorted within each file and within each stream. However, the complete set of files from all the compute nodes will not be unloaded in order.
If you intend to reload data that was unloaded with the --parallel
option, it is recommended that you create the target table with a SORT ON
column. The presence of a SORT ON
column causes ybload
source files to be loaded in fully sorted order. If you need to stitch the unload files together for loading or for use in other applications, you may need to write a script that checks the first and last lines of each unloaded file.
CSV Format
- delimiter CHARACTER
- Define the field delimiter that will separate columns of data in output files. The delimiter cannot be the null byte (
0x00
).
The default is a tab character \t
in text
format, and a \n
in csv
format.
- escape CHARACTER
- Specify the character to use for escaping quotes.
The default is the same as quote
in csv
format, and not allowed in other format.
- header
- Include a header row with column names for each output file. Only applies to
csv
andtext
format.
The default is false
.
- linesep CHARACTER
- Specify a row separator.
The default is \n
for csv
and text
formats.
- nullmarker STRING
String to use as a null marker in the unloaded files. For example:
nullmarker '[NULL]'
For text
type, the default null marker is \N
.
For other formats, no default null marker is used. In the output, NULL
values will appear as adjacent delimiters with no text between them.
- quote CHARACTER
- Specify the character to use for quoting.
The default is "
in CSV, and not allowed in other format.