Spark and ybrelay Glossary

Apache Spark: Open-source framework for cluster computing, used to solve analytics problems at large scale.
Avro: Apache Avro, a row-based storage format that defines its schema in JSON and stores the data itself in a compact, efficient binary encoding.
FIFOs: Named pipes, as produced by the Linux mkfifo command.
HDFS: Hadoop Distributed File System, the open-source Apache Hadoop file system; manages very large data sets across clusters of commodity hardware.
Parquet: Apache Parquet, an open-source column-oriented data storage format commonly used in Hadoop projects.
Spark application: An application that generically consumes any data Spark feeds it (in row form).
Spark job: A job that is submitted to Spark to handle large-scale data export or import.
ybrelay: Yellowbrick "relay" client that accepts incoming data in various formats from any external file system and calls ybload to bulk load it into tables.
ybload: Yellowbrick bulk load client tool.
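
To make the FIFOs entry concrete, the following is a minimal sketch of how a named pipe passes data between a writer and a reader on Linux. It uses Python's os.mkfifo, which creates the same kind of pipe as the mkfifo command. The function name fifo_round_trip and the file name demo.fifo are hypothetical, chosen only for illustration; this sketch does not describe how ybrelay or ybload use pipes internally.

```python
import os
import tempfile
import threading


def fifo_round_trip(payload: bytes) -> bytes:
    """Send payload through a named pipe and return what the reader receives."""
    # Hypothetical scratch location for the pipe (illustration only).
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "demo.fifo")
    os.mkfifo(fifo_path)  # Same effect as running: mkfifo demo.fifo

    def writer() -> None:
        # Opening a FIFO for writing blocks until a reader opens it.
        with open(fifo_path, "wb") as pipe:
            pipe.write(payload)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        # Opening for reading blocks until the writer opens the pipe;
        # read() returns once the writer closes its end.
        with open(fifo_path, "rb") as pipe:
            data = pipe.read()
    finally:
        thread.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return data


if __name__ == "__main__":
    print(fifo_round_trip(b"1,alice\n2,bob\n"))
```

The data never lands in a regular file: the reader consumes the bytes as the writer produces them, which is why named pipes suit streaming data from one process to another.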
