External Format Unload Options

This section contains a list of the bulk unload options that are supported in the WITH clause for the CREATE EXTERNAL FORMAT command:

File Processing

compress NONE | GZIP_FAST | GZIP | GZIP_MORE | GZIP_BEST | GZIP_STREAM_FAST | GZIP_STREAM | GZIP_STREAM_MORE | GZIP_STREAM_BEST: The default is NONE. The GZIP_ options write data as GZIP compressed files in two different compression modes: "block mode" and "stream mode." The GZIP* options run in block mode, and the GZIP_STREAM* run in stream mode. See ybunload Output Files.

GZIP and GZIP_FAST are synonyms. GZIP_MORE provides better compression, but slower unload performance, and GZIP_BEST provides the best compression but much slower performance.

GZIP_STREAM and GZIP_STREAM_FAST are synonyms. GZIP_STREAM_MORE provides better compression, but slower unload performance, and GZIP_STREAM_BEST provides the best compression but much slower performance. The GZIP_STREAM_* options are intended to be used only if your downstream workflow tools cannot handle gzip files containing multiple compression blocks. Additionally, the GZIP_STREAM_* options consume significantly more network connections than the GZIP* options, meaning many network routers won't be able to handle the increased number of connections reliably.

file_extension STRING

Specify an extension to append to each unload file name. The defaults are:

CSV type: csv
TEXT type: txt
compressed file: gz

file_prefix STRING

Specify a prefix to prepend to each unload file name. The default is unload. For example, if you use the prefix 04-30-18, the files are numbered consecutively in the following format:

04-30-18_1_1.csv
04-30-18_1_2.csv
04-30-18_1_3.csv
04-30-18_1_4.csv
04-30-18_1_5.csv
...

The convention for naming files is as follows:

<file_prefix>_<worker_id>_<partnumber>.<file_extension>

file_prefix: As defined, or unload by default.
worker_id: A logical number assigned to each compute node.
partnumber: An incrementing number starting from 1 for each compute node output file.
file_extension: File type, such as .csv or .gz.

max_file_size: Specify the maximum size of each output file. For example: 100MB or 100Mib. Do not use fractional values. The default maximum is 100Mib. The minimum file size is 5Mib.
parallel: Enable parallel processing on all compute nodes for the final sort operation that occurs when an unload query contains an ORDER BY clause. By default, the final ORDER BY sort runs on a single compute node. If the unload query does not have an explicit ORDER BY clause, this option has no effect.

When you use the parallel option, files are unloaded in streams from each compute node, and the rows are guaranteed to be sorted within each file and within each stream. However, the complete set of files from all the compute nodes will not be unloaded in order.

If you intend to reload data that was unloaded with the --parallel option, it is recommended that you create the target table with a SORT ON column. The presence of a SORT ON column causes ybload source files to be loaded in fully sorted order. If you need to stitch the unload files together for loading or for use in other applications, you may need to write a script that checks the first and last lines of each unloaded file.

CSV Format

delimiter CHARACTER: Define the field delimiter that will separate columns of data in output files. The delimiter cannot be the null byte (0x00).

The default is a tab character \t in text format, and a \n in csv format.

escape CHARACTER: Specify the character to use for escaping quotes.

The default is the same as quote in csv format, and not allowed in other format.

header: Include a header row with column names for each output file. Only applies to csv and text format.

The default is false.

linesep CHARACTER: Specify a row separator.

The default is \n for csv and text formats.

nullmarker STRING

String to use as a null marker in the unloaded files. For example:

nullmarker '[NULL]'

For text type, the default null marker is \N.

For other formats, no default null marker is used. In the output, NULL values will appear as adjacent delimiters with no text between them.

quote CHARACTER: Specify the character to use for quoting.

The default is " in CSV, and not allowed in other format.

Workload Management

Distributing Data

Bulk Loading Tables

Bulk Load Examples

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Load Data with SQL

Loading Data from Object Storage

Loading from Amazon S3

Loading from Azure Blob Storage

Loading Tables with Spark

Setting up and Running a Spark Job

Setting Up the ybrelay Service

Trickle Loading Data via JDBC

Unloading Data to Object Storage

Unloading Data to Parquet Files

ybunload Command

Installing ybtools

Setting Up a Database Connection

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

Appliance

Appliance: Disk Encryption

Setting Up Encrypted Drives

Remote Diagnostics

System Alerts

Creating an Alert Endpoint

Using the System Management Console

ybcli Reference

ybcli: config

Cloud

Configuring

Vanity DNS

Yellowbrick Manager

Installing

CLI Install Instructions

Public Install Instructions

Private Install Instructions

Self-Managed Install Instructions

Permissions

Kubernetes Guides

Observability

Observability Alerts

Observability Metrics

Databases

Backup & Restore

Overview

ybbackup Commands

ybbackupctl Commands

ybrestore Commands

Database Replication

Managing Replication

Setting Up Replication

Encrypting Sensitive Data

LDAP Integration

LDAP Authentication

Synchronizing Users and Groups

Metering

System Views

sys.lock

Workload Management

Creating WLM Resource Pools

Creating WLM Rules

Compatibility Parameters

Data Processing and Formatting

Feature Enablement

General

Tuning

Yellowbrick Row Store (YRS) Alerting Parameters

ybsql \copy Command

ybsql Properties and Variables

SQL Commands

CREATE EXTERNAL FORMAT

CREATE EXTERNAL TABLE

CREATE TABLE

GRANT

Plan Hinting

SELECT

FROM Clause

External Format Unload Options

File Processing

CSV Format