Spark and ybrelay Glossary

Apache Spark
Open-source platform for distributed cluster computing; a framework for solving analytics problems at large scale.
Avro
Apache Avro, a row-based storage format that defines its schema in JSON and stores the data itself in binary, making it compact and efficient.
FIFOs
Named pipes, as created by the Linux mkfifo command.
HDFS
Hadoop Distributed File System, the open-source file system of Apache Hadoop; stores and manages very large data sets on clusters of commodity hardware.
keystore
A Java keystore (JKS) file that contains certificate and public/private key information required to run Spark jobs with TLS enabled.
Parquet
Apache Parquet, an open-source column-oriented data storage format commonly used in Hadoop projects.
Spark application
An application that generically consumes any data Spark feeds it (in row form).
Spark job
A job that is submitted to Spark to handle large-scale data export or import.
TLS
Transport Layer Security, a communications protocol for authenticated and encrypted connections over a network. The terms TLS and SSL tend to be used interchangeably; the combined form SSL/TLS is also used.
ybrelay
Yellowbrick "relay" client tool that accepts incoming data in various formats from any external file system and calls ybload to bulk load it into tables.
ybload
Yellowbrick bulk load client tool.
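
As an illustration of the FIFOs entry, the following is a minimal Python sketch of how a named pipe passes data from a writer to a reader on Linux. It is not taken from ybrelay or ybload; the function name fifo_round_trip and the pipe name demo.fifo are hypothetical, chosen only for this example.

```python
import os
import tempfile
import threading


def fifo_round_trip(payload: bytes) -> bytes:
    """Send payload through a named pipe and return what the reader received."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical pipe name
        os.mkfifo(path)  # equivalent to running `mkfifo demo.fifo` in a shell

        def writer() -> None:
            # Opening a FIFO for writing blocks until a reader opens the other end.
            with open(path, "wb") as pipe:
                pipe.write(payload)

        thread = threading.Thread(target=writer)
        thread.start()

        # The reader sees end-of-file once the writer closes its end of the pipe.
        with open(path, "rb") as pipe:
            received = pipe.read()

        thread.join()
        return received


if __name__ == "__main__":
    print(fifo_round_trip(b"1,alice\n2,bob\n"))
```

The pipe appears as a file on disk, but the data moves directly from the writer to the reader rather than being stored in that file.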
