Skip to content

Specifying NULL Behavior for Different Columns

This section explains how you can use the --nullmarker option in different ways in the same bulk load. You may run into a situation where different columns in the same table must be loaded with NULL values, but the source file for the load defines NULL in different ways.

By default, ybload treats adjacent delimiters without text between them as NULL values and adjacent quotes within delimiters as empty strings. For example, assume that a line in a comma-delimited source file looks like this:

123,,456,"",789

This line represents 5 fields or columns:

  • Column 1: 123
  • Column 2: NULL (If this column is not nullable, ybload returns an error.)
  • Column 3: 456
  • Column 4: empty string (If this column cannot contain an empty string, ybload returns an error; only CHAR and VARCHAR columns can contain empty strings.)
  • Column 5: 789

If the fields in your source files and their corresponding table columns conform with this pattern with respect to NULL values and empty strings, ybload will load the data correctly by default. However, if your data contains special values that are intended to represent NULL, read the rest of this section to understand how to load those values correctly (or discard them as appropriate). See also the description of the --emptymarker option under ybload Options.

Consider the DDL for the following table:

create table team_with_nulls(
teamid smallint not null,
htid smallint not null,
atid smallint not null,
name varchar(30),
nickname varchar(20),
city varchar(20),
stadium varchar(50),
debut date)
distribute replicate;

Note that the four VARCHAR columns are nullable, and the final DATE column is also nullable. Now consider the following 5 lines in the source file:

46,47,96,Wimbledon,Dons,London,Selhurst Park,1991/08/01
47,48,97,Wolverhampton Wanderers,Wolves,Wolverhampton,Molineux Stadium,2003/08/01
48,49,98,,null,,,----/--/--
49,50,99,,null,,,----/--/--
50,51,100,,null,,,----/--/--
  • Columns 4, 6, and 7 have consecutive delimiters, implying NULL values.
  • Column 5 contains an explicit null character string, but this string must be stored as NULL.
  • Column 8 has ----/--/-- in some rows. This value also needs to be loaded as NULL.

Fortunately, ybload has options that work for all fields in the source file, for all fields with a given data type, or for specific named fields. For example, within the same ybload command, you could set the --nullmarker option globally, for all VARCHAR fields, and for one or more specific fields. The per-type and per-field options, if present, override the global settings.

In this case, the per-type option will not solve the issue with the VARCHAR columns because one of them has a different marker for NULL. However, the global default --nullmarker option takes care of the columns where adjacent delimiters indicate NULL, and you can specify a different NULL value for column 5. For column 8, the only DATE column in the table, you can use either a per-field option or a per-type option:

  • Column 5: load null as NULL, using:
--per-field-options '{"5":{"nullmarker":"null"}}'
  • Column 8: load ----/--/-- as NULL, using :
--date-field-options '{"nullmarker": "----/--/--"}'

or:

--per-field-options '{"8":{"nullmarker":"----/--/--"}}'

The complete ybload command looks like this:

$ ybload -d premdb -t team_with_nulls --username bobr -W /home/premdata/team_with_nulls.csv 
--format csv --date-field-options '{"nullmarker": "----/--/--"}' --per-field-options '{"5":{"nullmarker":"null"}}'

The global --nullmarker '' option is not required because consecutive delimiters represent the default null marker.

The resulting rows in the table look like this:

$ ybsql premdb
Null display is "[NULL]"
...
premdb=# select * from team_with_nulls;
 teamid | htid | atid |          name           | nickname |     city      |     stadium      |   debut    
--------+------+------+-------------------------+----------+---------------+------------------+------------
    46 |   47 |   96 | Wimbledon               | Dons     | London        | Selhurst Park    | 1991-08-01
    47 |   48 |   97 | Wolverhampton Wanderers | Wolves   | Wolverhampton | Molineux Stadium | 2003-08-01
    48 |   49 |   98 | [NULL]                  | [NULL]   | [NULL]        | [NULL]           | [NULL]
    49 |   50 |   99 | [NULL]                  | [NULL]   | [NULL]        | [NULL]           | [NULL]
    50 |   51 |  100 | [NULL]                  | [NULL]   | [NULL]        | [NULL]           | [NULL]
(5 rows)

Note that in this ybsql session NULL values are intentionally displayed as [NULL] for readability. (See ybsql Display Properties for information about the \pset command.)

Parent topic:Running a Bulk Load