ybunload Options
This section contains detailed descriptions of the options you can use in a `ybunload` command. Note the following points about the format of these options and their values:

- Options are listed in alphabetical order for quick reference.
- Option names are shown in lowercase; they are case-sensitive.
- Specific valid option values (such as `true` and `false`) are shown in lowercase. Variables for option values, such as `STRING`, are shown in uppercase. Option values are not case-sensitive.
- The requirements for quoting option strings vary by client platform. Values are shown without quotes, but quotes are sometimes required. For example, if you specify the `#` character in a Linux shell, it must be enclosed by single or double quotes. If you are using a Windows client, see also Escaping Quotes in Windows Clients.

See also Common Options in ybtools and Parquet Processing Options.
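To illustrate the shell-quoting point above: in a Linux shell, an unquoted `#` begins a comment, so the character never reaches the command. This is a generic shell sketch (using `echo` as a stand-in), not ybunload-specific behavior:

```shell
# An unquoted # starts a shell comment, so the value after it is dropped:
echo --delimiter #this-never-reaches-the-command

# Quoting the character passes it through to the command intact:
echo --delimiter '#'
```

The same caution applies to other shell metacharacters, such as `|`, when you pass them as option values.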
- --cancel-hung-uploads

  Attempt to cancel any multi-part uploads to S3 that were left in a hung state because of a previous failure. You must use this option independently; you cannot unload data and cancel uploads in the same `ybunload` command. In this case, the `-o` option specifies the destination for the uploads that were failing. For example:

  $ ybunload -d premdb --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads --cancel-hung-uploads

  Warning: Any other uploads to the specified bucket may be cancelled if the files have the same prefix.

  Note: Files that are uploaded to S3 are first written to temporary storage. After an interrupted unload, these files remain in this temporary location, but the `--cancel-hung-uploads` operation clears them. This operation does not clean up any files in the final unload destination (the actual path to the S3 bucket and its folders).

  See also Unloading Data to an S3 Bucket and --max-file-size.
- --compress, -c NONE | GZIP_FAST | GZIP | GZIP_MORE | GZIP_BEST | GZIP_STREAM_FAST | GZIP_STREAM | GZIP_STREAM_MORE | GZIP_STREAM_BEST | SNAPPY | LZO | ZSTD

  The default is `NONE`. The `GZIP` options write data as GZIP-compressed files in two different compression modes: "block mode" and "stream mode." The `GZIP*` options run in block mode, and the `GZIP_STREAM*` options run in stream mode. See ybunload Output Files.

  `GZIP` and `GZIP_FAST` are synonyms. `GZIP_MORE` provides better compression but slower unload performance, and `GZIP_BEST` provides the best compression but much slower performance.

  `GZIP_STREAM` and `GZIP_STREAM_FAST` are synonyms. `GZIP_STREAM_MORE` provides better compression but slower unload performance, and `GZIP_STREAM_BEST` provides the best compression but much slower performance.

  The `GZIP_STREAM_*` options are intended for use only when your downstream workflow tools cannot handle `gzip` files that contain multiple compression blocks. The `GZIP_STREAM_*` options also consume significantly more network connections than the `GZIP*` options, and many network routers cannot reliably handle the increased number of connections.

  The `SNAPPY`, `LZO`, and `ZSTD` options are only for use with `parquet` format unloads.

- --delimiter STRING | UNICODE_CHARACTER
  Define the field delimiter that will separate columns of data in output files. The default is a tab character (`'\t'`) in `text` format, and a comma in `csv` format.

  Valid delimiters include special characters such as `'|'`, Unicode characters, and other supported escape sequences.

  Note: The delimiter cannot be the null byte (`0x00`).

- --force-quote 'column,column,…' | '*'
  Specify a list of columns to quote in the output files, or specify `'*'` for all columns. This option is allowed only when you are using `csv` format. `NULL` values are not quoted, regardless of this option.

- --format CSV | TEXT | PARQUET
  Select the output format to use: `text` (tab-delimited by default), `csv` (comma-delimited by default), or `parquet`. The default format is `csv` and creates files with a `.csv` extension. Text format creates files with a `.txt` extension.

  Note: You cannot unload data to Azure object storage in `CSV` or `TEXT` format. Only `parquet` format is supported.

- --linesep LF | CR | CRLF | RS
  Specify a row separator. The default is `LF`.

  `RS`: the ASCII Record Separator (ASCII code `30`, Unicode `INFORMATION SEPARATOR TWO`, hex `0x001e`).

- --max-file-size
  Specify the maximum size of a file that `ybunload` can export, for example `10GB` or `1TB`. Do not use fractional values. The default maximum is `50GB`. The minimum file size is `1GB`.

  Note: AWS S3 uploads are made up of a series of "parts." Each part must be larger than 5MB, and no multi-part upload may exceed 10,000 parts. The default part size is 6MB, which means that each file that is uploaded to S3 cannot exceed 60GB. This restriction does not limit the total amount of data that `ybunload` can upload to S3, only the size of individual files. Increasing the default part size is not recommended.

- --nullmarker
  Specify the string to use as a null marker in the unloaded files. For example:

  --nullmarker '[NULL]'

  For `--format text` unloads, the default null marker is `\N`. For other formats, no default null marker is used; in the output, `NULL` values appear as adjacent delimiters with no text between them.

- -o
  Name a local file directory where the unload directories and files will be placed. If you do not specify a location, your current working directory is used. The default prefix for unload files is `unload`, but you can modify it by setting the `--prefix` option.

  Note: If an output file with a given prefix already exists in the location where you are unloading data, `ybunload` returns an error. For example:

  18:56:51.166 [ERROR] Unable to create output file: ./unload_1_1_.csv File already exists. Remove existing file or use --truncate-existing

  You can work around this error by removing the file manually, by using the `--truncate-existing` option, or by setting the `--prefix` option.

- --object-storage-*
See Object Storage Options. These options apply to loads and unloads from and to Azure, AWS S3, and S3-compatible systems.
- --parallel, --no-parallel
Enable parallel processing on all workers for the final sort operation that occurs when an unload query contains an ORDER BY clause. By default, the final ORDER BY sort runs on a single worker. If the unload query does not have an explicit ORDER BY clause, this option has no effect.
  When you use the `--parallel` option, files are unloaded in streams from each worker, and the rows are guaranteed to be sorted within each file and within each stream. However, the complete set of files from all of the workers will not be unloaded in order.

  If you intend to reload data that was unloaded with the `--parallel` option, it is recommended that you create the target table with a `SORT ON` column. The presence of a `SORT ON` column causes `ybload` source files to be loaded in fully sorted order. If you need to stitch the unload files together for loading or for use in other applications, you may need to write a script that checks the first and last lines of each unloaded file.

  See also --select, -s and ybunload Output Files.
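If you do need one fully sorted file from per-stream output, files that are each already sorted internally can be combined with `sort -m`, which merges without re-sorting. This is a sketch using stand-in files: the file names only mimic the unload naming convention, and it assumes plain-text output whose sort order is compatible with `sort`'s collation.

```shell
# Create two stand-in files that mimic internally sorted unload streams.
printf 'apple\ncherry\n' > unload_1_1_.csv
printf 'banana\ndate\n'  > unload_2_1_.csv

# sort -m merges files that are each already sorted, without re-sorting.
sort -m unload_1_1_.csv unload_2_1_.csv > merged.csv
cat merged.csv
```

A merge like this avoids the cost of a full re-sort, which matters when the unloaded files are large.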
- --prefix

  Specify a prefix to attach to each unload file name. The default is `unload`. For example, if you use the prefix `04-30-18`, the files are numbered consecutively in the following format:

  04-30-18_1_1_.csv
  04-30-18_1_2_.csv
  04-30-18_1_3_.csv
  04-30-18_1_4_.csv
  04-30-18_1_5_.csv
  ...

  The convention for naming files is as follows:

  <prefix_><streamID_><partnumber_>.<extension>

  - `prefix_`: as defined, or `unload` by default.
  - `streamID_`: a number assigned to each data stream from the workers. The streams are not in any particular order relative to a specific worker, and a single worker may provide multiple streams.
  - `partnumber_`: an incrementing number starting from `1` for each stream.
  - `.extension`: the file type, such as `.csv` or `.gz`.
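The naming convention above can be sketched as a simple loop. The prefix, stream ID, and part numbers below are just the example values from this section:

```shell
# Generate names in the form <prefix>_<streamID>_<partnumber>_.<extension>
prefix=04-30-18
stream=1
for part in 1 2 3; do
  printf '%s_%s_%s_.csv\n' "$prefix" "$stream" "$part"
done
```

This prints `04-30-18_1_1_.csv`, `04-30-18_1_2_.csv`, and `04-30-18_1_3_.csv`, matching the example file listing.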
- --quote UNICODE_CHARACTER

  Specify the character to use for quoting.

- --quote-escape UNICODE_CHARACTER

  Specify the character to use for escaping quotes.
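To illustrate how a quote character and a quote-escape character typically interact, the generic sketch below wraps a field in `"` and escapes embedded quotes with `\`. This is an illustration of the general CSV quoting pattern only; it does not reproduce ybunload's exact output:

```shell
# Escape embedded double quotes with a backslash, then wrap the field
# in the quote character:
printf '%s\n' 'He said "hi"' | sed 's/"/\\"/g; s/^/"/; s/$/"/'
```

Here `"` plays the role of the `--quote` character and `\` the role of the `--quote-escape` character.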
- --select, -s

  Run a SQL `SELECT` statement to unload data; any valid query is allowed. You cannot unload the results of other SQL statements, such as `INSERT` or `CREATE TABLE AS` (`CTAS`).

  If any column in the `SELECT` list produces an `INTERVAL` data type, the unload operation will fail.

  Note: If you want to unload data to a single file, use the `--select` option and include an `ORDER BY` clause in the query. Sorting the unloaded data in this way requires the entire result set of the query to fit into memory (or memory plus spill space).

- --select-file, -f
  Specify a file that contains the query you want to run in the `ybunload` operation (instead of entering the statement directly on the command line with the `--select` option). You can use this option and the `@file` option in the same `ybunload` command.

- --stdout
  Unload data to `stdout` instead of a destination directory. When you use this option, also specify `-q`, `--logfile`, and a `--logfile-log-level` value other than `OFF`.
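Putting those requirements together, an invocation might look like the following. This is an illustrative sketch: the database, user, table, and log file names are hypothetical, and the output is redirected to a file with shell redirection.

```
$ ybunload -d premdb --username bobr -W -t match --stdout -q \
    --logfile unload.log --logfile-log-level INFO > match.csv
```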
- --table, -t

  This option supports `TABLENAME`, `SCHEMA.TABLENAME`, or `DATABASE.SCHEMA.TABLENAME` formats. If you do not specify the schema name, the table is assumed to be in the `public` schema. You do not have to quote table names that have mixed case. For example, all of the following table entries are valid for a table named `AwayTeam`:

  -t public."AwayTeam"
  -t 'public."AwayTeam"'
  -t "AwayTeam"
  -t AwayTeam
- --truncate-existing

  Remove all the existing output files in the destination directory that have the same prefix as the files being unloaded. You cannot use this option when unloading data to AWS S3 buckets.
Parent topic: ybunload Command