Skip to content

ybunload Options

This section contains detailed descriptions of the options you can use in a ybunload command. Note the following points about the format of these options and their values:

  • Options are listed in alphabetical order for quick reference.
  • Option names are shown in lowercase; they are case-sensitive.
  • Specific valid option values (such as true and false) are shown in lowercase. Variables for option values, such as STRING, are shown in uppercase. Option values are not case-sensitive.
  • The requirements for quoting option strings vary by client platform. Values are shown without quotes, but quotes are sometimes required. For example, if you specify the # character in a Linux shell, it must be enclosed by single or double quotes. If you are using a Windows client, see also Escaping Quotes in Windows Clients.

See also Common Options in ybtools and Parquet Processing Options.

--cancel-hung-uploads

Attempt to cancel any multi-part uploads to S3 that were left in a hung state because of a previous failure. You must use this option independently; you cannot unload data and cancel uploads in the same ybunload command. In this case, the -o option specifies the destination for the uploads that were failing.

For example:

$ ybunload -d premdb --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads --cancel-hung-uploads

Warning: Any other uploads to the specified bucket may be cancelled if the files have the same prefix.

Note: Files that are uploaded to S3 are first written to temporary storage. After an interrupted unload, these files remain in this temporary location, but the --cancel-hung-uploads operation clears them. This operation does not clean up any files in the final unload destination (the actual path to the S3 bucket and its folders).

See also Unloading Data to an S3 Bucket and --max-file-size.

--compress, -c NONE | GZIP_FAST | GZIP | GZIP_MORE | GZIP_BEST | GZIP_STREAM_FAST | GZIP_STREAM | GZIP_STREAM_MORE | GZIP_STREAM_BEST | SNAPPY | LZO | ZSTD

The default is NONE. The GZIP_ options write data as GZIP compressed files in two different compression modes: "block mode" and "stream mode." The GZIP* options run in block mode, and the GZIP_STREAM* run in stream mode. See ybunload Output Files.

GZIP and GZIP_FAST are synonyms. GZIP_MORE provides better compression, but slower unload performance, and GZIP_BEST provides the best compression but much slower performance.

GZIP_STREAM and GZIP_STREAM_FAST are synonyms. GZIP_STREAM_MORE provides better compression, but slower unload performance, and GZIP_STREAM_BEST provides the best compression but much slower performance. The GZIP_STREAM_* options are intended to be used only if your downstream workflow tools cannot handle gzip files containing multiple compression blocks. Additionally, the GZIP_STREAM_* options consume significantly more network connections than the GZIP* options, meaning many network routers won't be able to handle the increased number of connections reliably.

The SNAPPY, LZO, and ZSTD options are only for use with parquet format unloads.

--delimiter STRING | UNICODE_CHARACTER

Define the field delimiter that will separate columns of data in output files. The default is a tab character ('\t') in text format, and a comma in csv format.

Valid delimiters include special characters such as '|', Unicode characters, and other supported escape sequences.

Note: The delimiter cannot be the null byte (0x00).

--force-quote 'column,column,…' | '*'

Specify a list of columns to quote in the output files, or specify '*' for all columns. This option is allowed only when you are using csv format. NULL values are not quoted, regardless of this option.

--format CSV | TEXT | PARQUET

Select the output format to use: text (tab-delimited by default), csv (comma-delimited by default), or parquet. The default format is csv and creates files with a .csv extension. Text format creates files with a .txt extension.

Note: You cannot unload data to Azure object storage in CSV or TEXT format. Only parquet format is supported.

--linesep LF | CR | CRLF | RS

Specify a row separator. The default is LF.

RS: The ASCII Record Separator (ASCII code 30, Unicode INFORMATIONAL SEPARATOR TWO, hex 0x001e).

--max-file-size

Specify the maximum size of a file that ybunload can export. For example: 10GB or 1TB. Do not use fractional values. The default maximum is 50GB. The minimum file size is 1GB.

Note: AWS S3 uploads are made up of a series of "parts." Each part must be >5MB, and no multi-part upload may exceed 10,000 parts. The default part size is 6MB, which means that each file that is uploaded to S3 cannot exceed 60GB. This restriction does not limit the total amount of data that ybunload can upload to S3, only the size of individual files. Increasing the default part size is not recommended.

--nullmarker

String to use as a null marker in the unloaded files. For example:

--nullmarker '[NULL]'

For --format text loads, the default null marker is \N.

For other formats, no default null marker is used. In the output, NULL values will appear as adjacent delimiters with no text between them.

-o

Name a local file directory where the unload directories and files will be placed. If you do not specify a location, your current working directory will be used. The default prefix for unload files is unload, but you can modify it by setting the --prefix option.

Note: If an output file with a given prefix already exists in the location where you are unloading data, ybunload returns an error. For example:

18:56:51.166 [ERROR] Unable to create output file: ./unload_1_1_.csv File already exists. 
Remove existing file or use --truncate-existing

You can work around this error by removing the file manually, by using the --truncate-existing option, or by setting the --prefix option.

--object-storage-*

See Object Storage Options. These options apply to loads and unloads from and to Azure, AWS S3, and S3-compatible systems.

--parallel, --no-parallel

Enable parallel processing on all workers for the final sort operation that occurs when an unload query contains an ORDER BY clause. By default, the final ORDER BY sort runs on a single worker. If the unload query does not have an explicit ORDER BY clause, this option has no effect.

When you use the --parallel option, files are unloaded in streams from each worker, and the rows are guaranteed to be sorted within each file and within each stream. However, the complete set of files from all of the workers will not be unloaded in order.

If you intend to reload data that was unloaded with the --parallel option, it is recommended that you create the target table with a SORT ON column. The presence of a SORT ON column causes ybload source files to be loaded in fully sorted order. If you need to stitch the unload files together for loading or for use in other applications, you may need to write a script that checks the first and last lines of each unloaded file.

See also --select, -s and ybunload Output Files.

--prefix

Specify a prefix to attach to each unload file name. The default is unload. For example, if you use the prefix 04-30-18, the files are numbered consecutively in the following format:

04-30-18_1_1_.csv
04-30-18_1_2_.csv
04-30-18_1_3_.csv
04-30-18_1_4_.csv
04-30-18_1_5_.csv
...

The convention for naming files is as follows:

<prefix_><streamID_><partnumber_>.<extension>
  • prefix_: As defined, or unload by default.
  • stream id_: A number assigned to each data stream from the workers. The streams are not in any particular order relative to a specific worker, and a single worker may provide multiple streams.
  • partnumber_: An incrementing number starting from 1 for each stream.
  • .extension: file type, such as .csv or .gz.
--quote UNICODE_CHARACTER

Specify the character to use for quoting.

--quote-escape UNICODE_CHARACTER

Specify the character to use for escaping quotes.

--select, -s

Run a SQL SELECT statement to unload data; any valid query is allowed. You cannot unload the results of other SQL statements, such as INSERT or CREATE TABLE AS (CTAS).

If any column in the SELECT list produces an INTERVAL data type, the unload operation will fail.

Note: If you want to unload data to a single file, use the --select option and include an ORDER BY clause in the query. Sorting the unloaded data in this way requires the entire result set of the query to fit into memory (or memory plus spill space).

--select-file, -f

Specify a file that contains the query you want to run in the ybunload operation (instead of entering the statement directly on the command line with the --select option). You can use this option and the @file option in the same ybunload command.

--stdout

Unload data to stdout instead of a destination directory. When you use this option, also specify -q, --logfile, and a --logfile-log-level value other than OFF.

--table, -t

This option supports TABLENAME, SCHEMA.TABLENAME, or DATABASE.SCHEMA.TABLENAME formats. If you do not specify the schema name, the table is assumed to be in the public schema. You do not have to quote table names that have mixed case. For example, all of the following table entries are valid for a table named AwayTeam:

-t public."AwayTeam" 
-t 'public."AwayTeam"' 
-t "AwayTeam" 
-t AwayTeam
--truncate-existing

Remove all the existing output files in the destination directory that have the same prefix as the files being unloaded. You cannot use this option when unloading data to AWS S3 buckets.

Parent topic:ybunload Command