Bulk Data Loading
Yellowbrick supports loading data through standard SQL INSERT and PostgreSQL \copy commands. This works well for small quantities of data, especially where the data needs to be inserted and available to query immediately. However, as in other columnar databases, loading tens of megabytes through terabytes of data efficiently calls for a different path: a bulk mode that moves data directly to a compute cluster, bypassing the shared services.
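For the small-data path, a minimal sketch of both commands as they might be entered in a psql-style session follows. The table `sales`, its columns, and the file `sales.csv` are hypothetical names chosen for illustration, and the `\copy` options shown are the standard PostgreSQL ones, assumed here to be accepted unchanged.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE sales (
    sale_id  INTEGER,
    region   VARCHAR(32),
    amount   DECIMAL(12,2)
);

-- Small load with standard SQL INSERT.
INSERT INTO sales (sale_id, region, amount)
VALUES (1, 'EMEA', 1250.00),
       (2, 'APAC',  980.50);

-- Client-side load with the \copy meta-command: the client reads the file
-- and streams it to the server. 'sales.csv' is a hypothetical local file.
-- The meta-command must be written on a single line.
\copy sales FROM 'sales.csv' WITH (FORMAT csv, HEADER true)
```

Both statements pass through the regular SQL connection, which is why they suit small, immediately queryable loads rather than large volumes.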
Bulk loads insert data directly into database tables from source files in object storage, on NFS servers, or on local disk. There are several ways to bulk load data, outlined below.
Bulk Loading with ybload
Data can be loaded through a command-line tool called ybload, which is part of the client tools distribution. It supports a wide variety of file formats and protocols, as well as third-party integrations. For more information, see the ybload documentation.
Bulk Loading via SQL
On cloud platforms, a full SQL grammar supports loading data from external object storage. See SQL-Based Loads from External Storage and the LOAD TABLE command.
Using Yellowbrick Manager Load Assistant
Yellowbrick Manager contains a simple load assistant that can be used for importing data sets with minimal SQL knowledge. See Loading a Table via the Load Assistant for a walkthrough.