
ybload Command

Each ybload command defines:

  • The target table, which must exist in the database
  • An absolute or relative path to one or more input files (local or S3)
  • Database connection options
  • Options for processing the load, returning output, and parsing the input data

Note: Do not run bulk loads as a superuser. The user must have INSERT privileges on the target table but does not have to own the table.

Basic Syntax

ybload [options] -t [SCHEMA.]TABLENAME SOURCE [SOURCE]...

The -t option accepts either TABLENAME or SCHEMA.TABLENAME. If you do not specify a schema name, ybload assumes that the table is in the public schema. The user's search_path setting is never used to resolve the schema of the target table.

DATABASE.SCHEMA.TABLENAME is not supported. If you want to specify the database name for a table, use the -d option.
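The naming rules above can be sketched as a small shell helper. This is a minimal illustration, not part of ybtools: the function name build_ybload_cmd and the sample names sales_db, orders, and /data/orders.csv are hypothetical. The helper only prints a command line and does not run ybload, and it uses only the -d and -t options described above. It does not handle quoted identifiers that contain dots.

```shell
#!/bin/sh
# Hypothetical helper: print a ybload command line for a table reference.
# Accepts TABLENAME or SCHEMA.TABLENAME; rejects DATABASE.SCHEMA.TABLENAME,
# because the database must be given separately with -d.
build_ybload_cmd() {
    db=$1
    table=$2
    source_file=$3

    case $table in
        *.*.*)
            echo "error: use -d for the database, not a three-part name" >&2
            return 1
            ;;
        *.*)
            # Schema given explicitly; pass it through unchanged.
            ;;
        *)
            # No schema given: ybload assumes public, so make that explicit.
            table="public.$table"
            ;;
    esac

    echo "ybload -d $db -t $table $source_file"
}

build_ybload_cmd sales_db orders /data/orders.csv
build_ybload_cmd sales_db archive.orders /data/orders.csv
```

Spelling out public in the printed command makes the default visible to anyone reading a load script later, which helps when a user's search_path would suggest a different schema.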

Note: If you used a quoted case-sensitive identifier to create the target table, you must quote the table name and escape the quotes in the ybload command line. For example, if your table is named PremLeagueStats:

-t \"PremLeagueStats\"
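To see why the escaping matters, the following sketch shows what a POSIX shell actually passes to a program for each spelling of the table name. Here printf stands in for ybload so that the argument can be inspected. The single-quoted form is standard shell behavior rather than something stated in the ybload documentation, and quoting rules in the Windows command prompt differ.

```shell
#!/bin/sh
# Show what the shell passes to a program for each spelling of the -t value.
# printf stands in for ybload here so the argument can be inspected.

# Unescaped quotes: the shell strips them, so the program sees PremLeagueStats.
unescaped=$(printf '%s' "PremLeagueStats")

# Backslash-escaped quotes: the quotes survive as part of the argument.
escaped=$(printf '%s' \"PremLeagueStats\")

# Single quotes around the double quotes have the same effect.
single=$(printf '%s' '"PremLeagueStats"')

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
echo "single:    $single"
```

Only the second and third forms deliver the double quotes to the program, which is what a case-sensitive identifier requires.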

The ybload client tool assumes that the character encoding of the source file (LATIN9 or UTF-8, for example) matches the encoding of the target database. If this is not the case, see the --encoding option.

Calling the ybload Utility

ybload is a Java client application that connects to the database via JDBC. After you have downloaded and installed the bulk loader client (as part of the ybtools package), you can run the executable program for your platform:

  • ybload on Linux
  • ybload.exe on Windows

Getting Online Help

Use the following ybload options to return online help text:

  • -? or --help for a usage summary and coverage of the basic options.
  • --help-advanced for a usage summary and a list of advanced options only. To see all options, specify both --help and --help-advanced:

    $ ./ybload --help --help-advanced

The following command runs the bulk loader and returns basic help text for load options.

$ ./ybload --help
ybload version 5.4.0-20220314113018

ybload loads files into an existing table in a Yellowbrick Database.

All files must have the same field layout.
...

Bulk Load Manager and Data Transfer Ports

In addition to the YBPORT setting for the database connection, the bulk loader uses the following default ports:

  • Bulk Load Manager port: 11111 for managing load transactions and progress.
  • Data transfer ports: 31000-41000. On a cluster with 15 active compute nodes, 30 ports in this range are used to load the data across the cluster (2 per active node).
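The port arithmetic above can be sketched in a few lines of shell, for example when sizing firewall rules. This is an illustration only: the function name data_ports_needed is hypothetical, the 2-ports-per-node ratio is taken from the example above, and treating the range 31000-41000 as inclusive at both ends is an assumption.

```shell
#!/bin/sh
# Estimate how many data transfer ports a load uses, assuming 2 ports per
# active compute node (the ratio given in the example above).
PORTS_PER_NODE=2
RANGE_START=31000
RANGE_END=41000   # assumed inclusive

data_ports_needed() {
    echo $(( $1 * PORTS_PER_NODE ))
}

nodes=15
needed=$(data_ports_needed "$nodes")
available=$(( RANGE_END - RANGE_START + 1 ))
echo "$nodes nodes need $needed of $available ports in $RANGE_START-$RANGE_END"
```

For 15 active nodes this reports 30 ports, matching the figure above. The script does not predict which 30 ports in the range are used.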

Setting Other Options

The bulk loader comes with a large number of options that define ways to process the load, log results, and parse incoming data correctly. See ybload Options and Common Options in ybtools for details. Frequently used settings include null markers and expected formats for dates and times. You can define some of the more advanced parsing options for specific data types and fields very flexibly by using JSON objects.

Note: By default, ybload appends rows to tables (also referred to as inserting rows). However, you can use the --write-op option to set up a load that deletes, updates, or "upserts" rows.
