Spark and ybrelay Glossary
- Apache Spark
- Open-source cluster-computing framework for solving analytics and data-processing problems at large scale.
- Avro
- Apache Avro, a row-based storage format whose schema (data definition) is written in JSON and whose data is stored in binary, which makes it compact and efficient.
- FIFOs
- Named pipes, as produced by the Linux mkfifo command.
- HDFS
- Apache Hadoop Distributed File System, an open-source file system that stores and manages very large data sets on clusters of commodity hardware.
- Parquet
- Apache Parquet, an open-source column-oriented data storage format commonly used in Hadoop projects.
- Spark application
- A generic application that consumes whatever data Spark feeds it, in row form.
- Spark job
- A job that is submitted to Spark to handle large-scale data export or import.
- ybrelay
- Yellowbrick "relay" client that accepts incoming data in various formats from any external file system and calls ybload to bulk load it into tables.
- ybload
- Yellowbrick bulk load client tool.
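The FIFOs entry can be illustrated with a short Python sketch, assuming a Linux or other POSIX system. `os.mkfifo` creates the same kind of named pipe as the mkfifo command; one thread writes bytes into the pipe and the main thread reads them back. The function name `fifo_roundtrip` and the file name `demo.fifo` are hypothetical, and the sketch does not show how ybrelay or ybload themselves use FIFOs.

```python
import os
import tempfile
import threading


def fifo_roundtrip(payload: bytes) -> bytes:
    """Write payload into a named pipe from one thread and read it back in another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical pipe name
        os.mkfifo(path)  # same effect as running `mkfifo demo.fifo` in a shell

        def writer() -> None:
            # Opening a FIFO for writing blocks until a reader opens the other end.
            with open(path, "wb") as w:
                w.write(payload)
            # Closing the write end signals EOF to the reader.

        t = threading.Thread(target=writer)
        t.start()
        # Opening for reading blocks until the writer opens its end.
        with open(path, "rb") as r:
            data = r.read()  # returns once the writer has closed the pipe
        t.join()
        return data  # the FIFO file is removed with the temporary directory


if __name__ == "__main__":
    print(fifo_roundtrip(b"id,name\n1,alice\n"))
```

Unlike a regular file, the named pipe holds no data on disk: bytes pass straight from writer to reader, which is why both ends must be open at the same time.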
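To make the row-based (Avro) versus column-oriented (Parquet) distinction concrete, here is a toy Python sketch that lays out the same three hypothetical records both ways. It illustrates only the idea of grouping values by record versus by column. It is not the actual Avro or Parquet encoding, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sample records, all with the same fields.
records: List[Dict[str, Any]] = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.0},
    {"id": 3, "name": "carol", "score": 8.25},
]


def to_row_layout(rows: List[Dict[str, Any]]) -> List[Tuple[Any, ...]]:
    """Row-oriented idea (as in Avro): all fields of one record are stored together."""
    fields = list(rows[0].keys())
    return [tuple(r[f] for f in fields) for r in rows]


def to_column_layout(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Column-oriented idea (as in Parquet): all values of one field are stored together."""
    fields = list(rows[0].keys())
    return {f: [r[f] for r in rows] for f in fields}


if __name__ == "__main__":
    print(to_row_layout(records))
    cols = to_column_layout(records)
    # An aggregate over one field only needs to touch that field's list.
    print(sum(cols["score"]))
```

Reading only the score values touches a single list in the column layout, whereas the row layout has to visit every record; this is the usual reason columnar formats suit analytic scans, while row formats suit writing or reading whole records.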
Parent topic: Loading Tables with Spark