External Tables

This section covers user-defined external tables that are stored in, and accessed from, an NFS-mounted file system outside the database. For information about loading tables from object storage via SQL commands, see the documentation for the Yellowbrick Test Drive.

NFS-mounted external tables support use cases for both exporting data and importing it. You can create external tables solely for the purpose of writing data out to external files. You can also read from external tables as part of standard SQL queries, and you can use them to load other tables via INSERT and CTAS statements.

In the background, external tables use ybload and ybunload operations to read and write data.
Important: Yellowbrick Data does not recommend the use of external table loads as a general replacement for standard ybload operations.

External table operations run ybload and ybunload processes on the manager node and must share limited compute resources with other critical processes. In addition, external table loads and unloads access source and target files via NFS file systems, which are typically much slower than local file systems.

For these reasons, ybload and ybunload performance for external table operations may be significantly slower than performance for standard loads and unloads. For larger files, the performance difference may be severe.

NFS Mount Points for External Tables

Regardless of their intended use, external tables must be defined in the context of a mount point on a system that is running an NFS server. Before you can create external tables, a Yellowbrick database superuser must create at least one mount point by using the CREATE EXTERNAL MOUNT command. This command defines a short reference name for the mount point and a complete path to the NFS-mounted directory where flat files will be read or written. For example:
create external mount '/qumulo/yb100' as 'nfs://qumulo:/data/yb100/' with (gid 0, uid 0);

When an external mount is in place, you can read from files that already exist in that location or write to files that will be created in that location. Mounts created for external tables are logged to the sys.mount view.

Definition and Use of External Tables

You can create an external table with a CREATE EXTERNAL TABLE SQL command. Instead of a regular table name, you provide the external file location. The table is not registered in the system catalog, and its schema is derived from a query. (This type of table is also known as a transient external table.) A special USING clause defines format choices for writing the data out to the file.

To query an external table, you use special SELECT...FROM EXTERNAL syntax and specify the schema of the table in terms of a column list. You can query external tables continuously as long as their storage files remain in the specified location and the associated NFS mount is active. A single query can process a combination of regular table data and external table data. For example, you can join an external table to a regular table, and you can return the union of an external table and a regular table.

You can also write CTAS statements that select from external tables, and you can insert data into other tables by selecting from external tables (using INSERT INTO SELECT or SELECT INTO).

When external tables are created or queried, background ybload and ybunload operations generate detailed log files and write them to an NFS directory relative to the mount point you created. See External Table Examples.

For details about external table command syntax and options, see the SQL Commands section (or use the ybsql help command).

Examples of Creating and Selecting From External Tables

  • To write data to an external file and create an external table, use a CREATE EXTERNAL TABLE statement. This statement runs a query against database tables and writes the results to an external file. For example:
    create external table '/local_external_tables/ext_season.csv' 
    using(format text) as select * from season;
  • To read data from an external table, use a SELECT...FROM EXTERNAL statement. This statement runs a query against an external file and returns the data to the client. For example:
    select c5, c6 from external '/local_external_tables/ext_match.csv' 
    (c1 int, c2 date, c3 int, c4 int, c5 char(3), c6 char(3)) using(format text) 
    order by 1 desc limit 10;
  • To create a regular table based on data in an external table, use a CTAS statement. For example:
    create table match_results as 
    select c5, c6 
    from external '/local_external_tables/ext_match.csv' 
    (c1 int, c2 date, c3 int, c4 int, c5 char(3), c6 char(3)) using(format text) 
    order by 1 desc;
  • To read data from an external table and write it into a regular table, use an INSERT INTO SELECT...FROM EXTERNAL statement. You do not have to declare the column list for the file in this case; it is optional. The USING clause is required. For example:
    premdb=# insert into season 
    select * from external 'ext_db/season.csv' 
    using(format csv);
    INSERT 0 25

    The external table in this statement uses the schema of the target table (season). The columns in the external data file must be in the same order as they are in the target table, and every column in the target table must also exist in the external file.

Limitations on External Tables

  • You cannot create views by selecting from external tables.
  • External tables are not backed up or restored.
  • External tables cannot be updated, deleted, or truncated.