Loading Data Exported from SQL Server

This section covers best practices for loading data into a Yellowbrick database when the source of the data is a table in a Microsoft SQL Server database. There are specific recommendations for both exporting the data from SQL Server and loading the data with the ybload client.

Best Practices for Exporting Data from SQL Server

Yellowbrick strongly recommends that you prepare a clean csv file before trying to load it, especially if your character strings contain any of the following:
  • The same character that is used as the field separator, such as commas or pipe symbols
  • The same character that is used as the row separator, such as EOL markers
  • Quotes

If you intend to use the SQL Server bcp utility to export data, use the "query" option to select the data out of the table and escape quotes and other characters that may cause problems when you load. By default, the bcp utility does not escape quotes and separators that are found in character strings. If you use a simple "copy table to file" approach and your data has the characteristics listed above, you will need to set some special options in the ybload command to parse the incoming data successfully. In general, it is more efficient to clean up the data on its way out of the source table than to work around data problems when you set up and run the ybload operation.

For example, you can use a bcp command that looks something like this:
bcp "query" queryout <filename> -c  -S <server> -t, 
where -t, exports a comma-delimited file and "query" is a SELECT statement. If column2 in table1 has character strings that contain quotes, your query should look like this:
select column1, '\"' + REPLACE(column2,'\"','\\\"') + '\"' as column2 from table1 ...

The purpose of the REPLACE function is to replace double quotes within strings with escaped double quotes ('\"'). The entire character string will also be enclosed by double quotes.

Note: If you have a pre-2016 version of SQL Server, which does not export data in the UTF-8 encoding, use the -w option in the bcp command (instead of the -c option). The -w option runs the export using Unicode characters.

By using the -w option in the bcp command and the --encoding utf16 option in the ybload command, you will be able to export and load your data. Also make sure that your Yellowbrick database uses the utf8 encoding, not the default latin9 encoding.

For details about bcp syntax and options, see the Microsoft SQL Server documentation.

bcp Example on Windows

The following example works on Windows platforms:
bcp DW.dbo.DimDate out dw.dbo_DimDate.csv -S ABCDE\PROD -T -c -t 0x1F -r 0x0A -a 32768
The following ybload command will work with this bcp command:
ybload -d dw -t dbo.dimdate --delimiter 0x1F --format bcp dw.dbo.dimdate.csv

bcp Example on Linux

A bcp for Linux utility is available for extracting data from SQL Server. Here is an example bcp Linux command:
bcp "USE DW; SELECT * FROM dbo.DimDate(NOLOCK) ;" queryout "/mnt/data/dw.dbo_DimDate.csv" –S dw.abcd.com -c -t "0x1F" -a 32768

The same ybload command that is shown in the Windows example will work in this case.

Workarounds for Loading Exported Data

If you are unable to produce clean exported source files for bulk loading purposes, the ybload program provides some options that can parse the data and work around problems:
Null Bytes
Null-byte strings (0x00) may be exported in your data, either representing complete empty fields or empty strings within fields. You can use the --emptymarker \0 option to replace0x00 with empty strings, and the --ignorenullbytes option to ignore 0x00 within strings.
Hexadecimal Values
Timestamps may be exported as 64-bit hexadecimal values. The ybload program can parse and load these values if you use the inhex option for the appropriate fields.

For details about bulk load syntax and options, see ybload Options.