ybunload Options

This section describes the options you can use in a ybunload command.

Specify a file that includes a set of options and values to use for the unload. See Saving Unload Options to a File. You can use this option and the --select-file option in the same ybunload command.
--cacert STRING
Customize trust with secured communication; use this option in combination with the --secured option. Enter the file name of a custom PEM-encoded certificate or the file name and password for a Java KeyStore (JKS).
For PEM format, the file must be named with a .pem, .cert, .cer, .crt, or .key extension. For example:
--cacert cacert.pem
For JKS format, files are always password-protected. Use the following format:
--cacert yellowbrick.jks:changeit

where the : character separates the file name from the password of the keystore.

See also Enabling and Verifying SSL/TLS Encryption.

Attempt to cancel any multi-part uploads to S3 that were left in a hung state because of a previous failure. You must use this option independently; you cannot unload data and cancel uploads in the same ybunload command. In this case, the -o option specifies the destination for the uploads that were failing.
For example:
$ ybunload -d premdb --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads --cancel-hung-uploads
Warning: Any other uploads to the specified bucket may be cancelled if the files have the same prefix.
Note: Files that are uploaded to S3 are first written to temporary storage. After an interrupted unload, these files remain in this temporary location, but the --cancel-hung-uploads operation clears them. This operation does not clean up any files in the final unload destination (the actual path to the S3 bucket and its folders).

See also Unloading Data to an S3 Bucket and --max-file-size.

The default is NONE. The other options write data as GZIP compressed files in two different compression modes: "block mode" and "stream mode." The GZIP* options run in block mode, and the GZIP_STREAM* run in stream mode. See ybunload Output Files.

GZIP and GZIP_FAST are synonyms. GZIP_MORE provides better compression, but slower unload performance, and GZIP_BEST provides the best compression but much slower performance.

GZIP_STREAM and GZIP_STREAM_FAST are synonyms. GZIP_STREAM_MORE provides better compression, but slower unload performance, and GZIP_STREAM_BEST provides the best compression but much slower performance. The GZIP_STREAM_* options are intended to be used only if your downstream workflow tools cannot handle gzip files containing multiple compression blocks. Additionally, the GZIP_STREAM_* options consume significantly more network connections than the GZIP* options, meaning many network routers won't be able to handle the increased number of connections reliably.

--dbname, -d
Name of the source database. The default is yellowbrick. See also Setting up a Database Connection.
Define the field delimiter that will separate columns of data in output files. The default is a tab character ('\t') in text format, and a comma in csv format.

Valid delimiters include special characters such as '|', Unicode characters, and other supported escape sequences.

Note: The delimiter cannot be the null byte (0x00).
--disable-trust, -k
Disable SSL/TLS trust when using secured communications. Trust is enabled by default. See also Enabling and Verifying SSL/TLS Encryption.
Important: This option is not supported for use on production systems and is only recommended for testing purposes. It may be useful to disable trust during testing, then enable it when a formal signed certificate is installed on the appliance.
--force-quote 'column,column,…' | '*'
Specify a list of columns to quote in the output files, or specify '*' for all columns. This option is allowed only when you are using csv format. NULL values are not quoted, regardless of this option.
--format csv | text
Select the output format to use: text (tab-delimited by default) or csv (comma-delimited by default). The default format is csv and creates files with a .csv extension. Text format creates files with a .txt extension.
-?, --help
Return basic usage information for the ybunload command and its options.
Return more advanced usage information for the ybunload command and its options.
-h, --host
Host name. See Setting up a Database Connection.
--initial-connection-timeout NUMBER
Specify the number of seconds to wait for initial connections to the database. The minimum value is 0, which means wait forever. The default is 120.
--linesep LF | CR | CRLF | RS
Specify a row separator. The default is LF.

RS: The ASCII Record Separator (ASCII code 30, Unicode INFORMATIONAL SEPARATOR TWO, hex 0x001e).

Return the Java version that is running on the client system. The client tools require the 64-bit version of Java 8 (also known as Java 1.8). Java 9 and 10 are not supported.
--log-level OFF | ERROR | WARN | INFO | DEBUG | TRACE
Specify the logging level for the default console output. The default level is INFO. (Use the --logfile-log-level option to specify the logging level for a named log file.)
--logfile STRING
Specify the name and location of a log file for the unload operation. If the specified file already exists, it will be truncated. If this option is not specified, no log file is written. When you specify this --logfile option, also specify a --logfile-log-level value other than OFF.
Note: When object storage is used for loading or unloading data, logs must be written to the local file system. Specifying a log file in an object storage location, such as an S3 bucket, is not supported.
--logfile-log-level OFF | ERROR | WARN | INFO | DEBUG | TRACE
Specify the logging level for a given log file (as defined with the --logfile option). If the level is not specified, it defaults to the --log-level value. You must specify a --logfile-log-level value other than OFF when you specify the --logfile option.
Specify the maximum size of a file that ybunload can export. For example: 10GB or 1TB. Do not use fractional values. The default maximum is 50GB. The minimum file size is 1GB.
Note: AWS S3 uploads are made up of a series of "parts." Each part must be >5MB, and no multi-part upload may exceed 10,000 parts. The default part size is 6MB, which means that each file that is uploaded to S3 cannot exceed 60GB. This restriction does not limit the total amount of data that ybunload can upload to S3, only the size of individual files. Increasing the default part size is not recommended.
String to use as a null marker in the unloaded files. For example:
--nullmarker '[NULL]'

For --format text loads, the default null marker is \N.

For other formats, no default null marker is used. In the output, NULL values will appear as adjacent delimiters with no text between them.

Name a local file directory where the unload directories and files will be placed. If you do not specify a location, your current working directory will be used. The default prefix for unload files is unload, but you can modify it by setting the --prefix option.
Note: If an output file with a given prefix already exists in the location where you are unloading data, ybunload returns an error. For example:
18:56:51.166 [ERROR] Unable to create output file: ./unload_1_1_.csv File already exists. 
Remove existing file or use --truncate-existing

You can work around this error by removing the file manually, by using the --truncate-existing option, or by setting the --prefix option.

--parallel, --no-parallel
Enable parallel processing on all workers for the final sort operation that occurs when an unload query contains an ORDER BY clause. By default, the final ORDER BY sort runs on a single worker. If the unload query does not have an explicit ORDER BY clause, this option has no effect.

When you use the --parallel option, files are unloaded in streams from each worker, and the rows are guaranteed to be sorted within each file and within each stream. However, the complete set of files from all of the workers will not be unloaded in order.

If you intend to reload data that was unloaded with the --parallel option, it is recommended that you create the target table with a SORT ON column. The presence of a SORT ON column causes ybload source files to be loaded in fully sorted order. If you need to stitch the unload files together for loading or for use in other applications, you may need to write a script that checks the first and last lines of each unloaded file.

See also --select, -s and ybunload Output Files.

-p, --port
Port number. See Setting up a Database Connection.
Specify a prefix to attach to each unload file name. The default is unload. For example, if you use the prefix 04-30-18, the files are numbered consecutively in the following format:
The convention for naming files is as follows:
  • prefix_: As defined, or unload by default.
  • stream id_: A number assigned to each data stream from the workers. The streams are not in any particular order relative to a specific worker, and a single worker may provide multiple streams.
  • partnumber_: An incrementing number starting from 1 for each stream.
  • .extension: file type, such as .csv or .gz.
-q, --quiet
Do not write any output to console. This option is suitable for cron invocations. If --quiet is specified, you must also specify --logfile.
Specify the character to use for quoting.
--quote-escape UNICODE_CHARACTER
Specify the character to use for escaping quotes.
Use SSL/TLS to secure all communications. The default is not secured. See also Enabling and Verifying SSL/TLS Encryption.
--select, -s
Run a SQL SELECT statement to unload data; any valid query is allowed. You cannot unload the results of other SQL statements, such as INSERT or CREATE TABLE AS (CTAS).
Note: If you want to unload data to a single file, use the --select option and include an ORDER BY clause in the query. Sorting the unloaded data in this way requires the entire result set of the query to fit into memory (or memory plus spill space).
--select-file, -f
Specify a file that contains the query you want to run in the ybunload operation (instead of entering the statement directly on the command line with the --select option). You can use this option and the @file option in the same ybunload command.
Unload data to stdout instead of a destination directory. When you use this option, also specify -q, --logfile, and a --logfile-log-level value other than OFF.
--table, -t
This option supports TABLENAME, SCHEMA.TABLENAME, or DATABASE.SCHEMA.TABLENAME formats. If you do not specify the schema name, the table is assumed to be in the public schema. You do not have to quote table names that have mixed case. For example, all of the following table entries are valid for a table named AwayTeam:
-t public."AwayTeam" 
-t 'public."AwayTeam"' 
-t "AwayTeam" 
-t AwayTeam
Remove all the existing output files in the destination directory that have the same prefix as the files being unloaded. You cannot use this option when unloading data to AWS S3 buckets.
-U, --username
Database username. See Setting up a Database Connection.
Display the version of ybunload you are running (as part of ybtools). This option is not intended to be combined with other options. For example:
$ ybunload --version
ybunload version 1.2.2-5653
-W, --password
Database user's password. See Setting up a Database Connection.