Loading Tables with Spark
This section describes how to bulk load data from source files that ybload cannot read directly, such as Avro and Parquet files. Wherever the data resides (HDFS, NFS, S3, and so on), you can load it by running Apache Spark jobs that use the Yellowbrick ybrelay client to call ybload.
Follow these steps to bulk load Yellowbrick tables via Spark and ybrelay. Subsequent sections explain these steps in detail and provide examples.
- Install and set up Apache Spark and ybrelay.
- Define the parameters for a spark-submit command:
  - Native Spark options
  - Spark application options:
    - Yellowbrick database connectivity
    - ybrelay connectivity
    - General options
    - ybload options, if needed
- Run the spark-submit command.
- Monitor the resulting ybload operation.
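As a rough illustration of the second and third steps, the sketch below assembles a spark-submit command line from the two groups of parameters listed above. It relies only on the general spark-submit layout (native options, then the application JAR, then application arguments). The JAR path, the class name, and every application option name shown here are hypothetical placeholders, not the actual Yellowbrick option names; take the real names from the detailed sections that follow.

```python
import shlex


def build_spark_submit(native_opts, app_jar, app_opts):
    """Assemble a spark-submit argument list.

    native_opts: native Spark options, placed before the application JAR.
    app_jar:     path to the Spark application JAR.
    app_opts:    application options, placed after the JAR.
    """
    argv = ["spark-submit"]
    for flag, value in native_opts.items():
        argv += [flag, value]
    argv.append(app_jar)
    for flag, value in app_opts.items():
        argv += [flag, value]
    return argv


# Native Spark options (--master and --class are standard spark-submit flags;
# the class name is a hypothetical placeholder).
native = {
    "--master": "local[4]",
    "--class": "com.example.HypotheticalLoadApp",
}

# Application options. ALL names and values below are hypothetical
# placeholders standing in for the four groups in the list above.
app = {
    "--example-db-url": "jdbc:example://dbhost:5432/mydb",  # database connectivity
    "--example-relay-host": "relayhost",                    # ybrelay connectivity
    "--example-format": "parquet",                          # general options
    "--example-load-options": "--max-bad-rows 10",          # ybload options, if needed
}

argv = build_spark_submit(native, "/path/to/hypothetical-load-app.jar", app)
command = shlex.join(argv)  # shell-quoted form, suitable for logging or review
print(command)
```

Keeping the command as an argument list (and quoting it only for display) avoids shell-quoting mistakes when an option value contains spaces, as a pass-through string of ybload options typically would.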