Loading Tables with Spark
This section describes how to bulk load data from source files that ybload cannot read directly, such as Avro and ORC files. Wherever the source data resides (HDFS, NFS, S3, and so on), you can load data in these formats by running Apache Spark jobs that use the Yellowbrick ybrelay service to call ybload. Data exported from the Spark application platform flows to a server running the ybrelay service and is then loaded into Yellowbrick database tables by ybload operations.
Follow these steps to bulk load Yellowbrick tables via Spark and ybrelay. Subsequent sections explain these steps in detail and provide examples.
- Install and set up Apache Spark and the ybrelay service.
- Define the parameters for a spark-submit command:
  - Native Spark options
  - Spark application options:
    - Yellowbrick database connectivity
    - ybrelay connectivity
    - General options
    - ybload options, if needed
- Run the spark-submit command.
- Monitor the resulting ybload operation.
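The way the parameter groups above fit together can be sketched as follows. This is a minimal illustration, not the documented Yellowbrick syntax. It relies only on the general spark-submit layout (native options, then the application JAR, then the application's own arguments). The class name, the JAR path, and every application option name (`--yb-host`, `--yb-relay-host`, and so on) are hypothetical placeholders; take the real names from the detailed sections that follow.

```python
import shlex


def build_spark_submit(native_opts, app_jar, app_opts):
    """Assemble a spark-submit argument list.

    Order follows the general spark-submit layout:
    native Spark options, then the application JAR,
    then the options consumed by the application itself.
    """
    cmd = ["spark-submit"]
    for flag, value in native_opts.items():
        cmd += [flag, value]
    cmd.append(app_jar)
    for flag, value in app_opts.items():
        cmd += [flag, value]
    return cmd


# Native Spark options (--master and --class are standard spark-submit flags;
# the class name is a hypothetical placeholder).
native = {
    "--master": "local[4]",
    "--class": "com.example.ExportApp",
}

# Spark application options. All names and values below are hypothetical
# placeholders standing in for the database connectivity, ybrelay
# connectivity, and general option groups.
app = {
    "--yb-host": "yb.example.com",
    "--yb-table": "public.events",
    "--yb-relay-host": "relay.example.com",
    "--format": "orc",
}

cmd = build_spark_submit(native, "/path/to/app.jar", app)

# Print a shell-quoted command line that could be reviewed before running it.
print(shlex.join(cmd))
```

Building the command as a list rather than one string keeps each option group separate and avoids quoting mistakes when values contain spaces or brackets.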