
External Table Examples

The following examples show how to create and select from external tables. These examples assume that an external mount exists with the reference name /ext_db/.

Create an external table:

Create an external table by writing (unloading) the results of a query to a file:

premdb=# create external table '/ext_db/ext_season.csv' 
using(format text) as select * from season;
SELECT 25

Query the external table:

Subsequent queries can read from the external table file. Select data from this kind of table by using FROM EXTERNAL syntax:

premdb=# select * from external '/ext_db/season.csv' 
(seasonid int, season_name char(9), numteams int, winners varchar(40)) 
using (format csv);
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
        1 | 1992-1993   |       22 | Manchester United
        2 | 1993-1994   |       22 | Manchester United
        3 | 1994-1995   |       22 | Blackburn Rovers
        4 | 1995-1996   |       20 | Manchester United
        5 | 1996-1997   |       20 | Manchester United
...

Create an external table with additional options:

Create an external table with several options in the USING clause:

  • Format the output as CSV.
  • Compress the output as GZIP (using the GZIP_BEST unload compression type).
  • Specify a writable log directory relative to the mount point. (This is useful for keeping track of log files that are tied to specific uses of external tables. For example, you can direct the logs for load and unload operations to different log directories.)
  • Include a header line in the output file.
  • Use a pipe character as the field delimiter in the file that contains the exported data.
premdb=# create external table '/ext_db/newmatchstats.gz' 
using(format csv, compress 'GZIP_BEST', logdir '/ext_db/logs', includeheader true, delim '|') 
as select * from newmatchstats;
SELECT 774540

You can check that the option settings were respected by unzipping and inspecting the newmatchstats.gz file.
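A quick shell sketch of that check follows. The path and file name mirror the example above but are fabricated here (written under /tmp with sample data) so the commands are self-contained; substitute your actual mount point.

```shell
# Fabricate a small gzipped, pipe-delimited file standing in for the
# newmatchstats.gz unload output from the example above.
mkdir -p /tmp/ext_db
printf 'seasonid|matchday\n20|2011-08-13\n' | gzip > /tmp/ext_db/newmatchstats.gz

# Confirm the header line and the pipe delimiter without fully decompressing:
zcat /tmp/ext_db/newmatchstats.gz | head -2
```

If the includeheader and delim options were respected, the first line is a header and fields are separated by `|`.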

Now select some rows from this external table by using SELECT...FROM EXTERNAL syntax. Note the use of the skiprows option to ignore the header line in the file, the delim option to make sure the columns are read correctly from the file, and the logdir option:

premdb=# select distinct seasonid, matchday 
from external '/ext_db/newmatchstats' 
  (seasonid int, matchday date) 
  using(format csv, delim '|', skiprows 1, logdir '/ext_db/query_logs') 
where seasonid=20 
order by 2;
 seasonid |  matchday  
----------+------------
       20 | 2011-08-13
       20 | 2011-08-14
       20 | 2011-08-15
       20 | 2011-08-20
       20 | 2011-08-21
       20 | 2011-08-22
       20 | 2011-08-27
       20 | 2011-08-28
...

Note: You must specify the correct number and order of columns in the column list when selecting from external tables. For example, the following query fails because it attempts to select three columns but only specifies two in the column list for the external table reference:

premdb=# select distinct seasonid, matchday, winners 
from external '/ext_db/newmatchstats' 
(seasonid int, matchday date) 
using(format csv, skiprows 1) 
where seasonid=20 
order by 2;
ERROR:  column "winners" does not exist
LINE 1: select distinct seasonid, matchday, winners from external 'e...
                                           ^

Check the ybload and ybunload log files:

You can check the log files for ybunload and ybload operations that create and select from external tables.

These logs are useful for troubleshooting problems and as a record of when and how external table operations were performed. If you do not set the logdir option in the USING clause, the logs are written to the same directory as the external files. Log files are named <queryid>-UNLOAD.log or <queryid>-LOAD.log.
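Given that naming convention, one way to locate logs for recent operations is to list the log directory. The directory and query IDs below are mocked to match the examples in this topic; substitute the logdir you set in the USING clause.

```shell
# Mock a query_logs directory with log files named after the examples above.
mkdir -p /tmp/ext_db/query_logs
touch /tmp/ext_db/query_logs/118173-UNLOAD.log /tmp/ext_db/query_logs/118288-LOAD.log

# List the load-side logs, newest first:
ls -t /tmp/ext_db/query_logs/*-LOAD.log
```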

For example, the following ybunload log was written when an external table was created:

$ more 118173-UNLOAD.log
2019-06-05 13:58:49.484 [ INFO] 
...
   app.name_and_version = "external version 3.0.0-13800"
   java.home            = "/usr/lib/jvm/java-8-oracle/jre"
   java.version         = "1.8.0_101"
   jvm.memory           = "875.00 MB (max=6.00 GB)"
   jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
   jvm.options          = "-Xms512m, -Xmx6g, -XX:+UseG1GC, -Dlogfile=/home/brumsby/ybd/external/build/bin/external.log, -Dapp.name=external, -D
app.pid=21892, -Dapp.repo=/home/brumsby/ybd/external/build/lib, -Dapp.home=/home/brumsby/ybd/external/build, -Dbasedir=/home/brumsby/ybd/externa
l/build"
   jvm.vendor           = "Oracle Corporation"
   os.name_and_version  = "Linux 4.4.0-31-generic (amd64)"

2019-06-05 13:58:49.484 [ INFO] <unload-4>  Beginning unload to /home/brumsby/ext_db
2019-06-05 13:58:50.756 [ INFO] <pool-6-thread-1>  state: RUNNING - Network BW: 5.57 MB/s Disk BW: 1.99 MB/s
2019-06-05 13:58:51.147 [ INFO] <unload-4>  Network I/O Complete. Waiting on file I/O
2019-06-05 13:58:51.147 [ INFO] <unload-4>  Finalizing...
2019-06-05 13:58:51.151 [ INFO] <unload-4>  Transfer complete
2019-06-05 13:58:51.151 [ INFO] <unload-4>  Transferred: 2661174.00 B Avg Network BW: 5.89 MB/s Avg Disk write rate: 1.89 MB/s

Here is an example of a ybload log file, which was written when a user selected from an external table:

$ more 118288-LOAD.log
2019-06-05 13:59:10.456 [ INFO] 
...
   app.name_and_version = "external version 3.0.0-13800"
   java.home            = "/usr/lib/jvm/java-8-oracle/jre"
   java.version         = "1.8.0_101"
   jvm.memory           = "875.00 MB (max=6.00 GB)"
   jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
   jvm.options          = "-Xms512m, -Xmx6g, -XX:+UseG1GC, -Dlogfile=/home/brumsby/ybd/external/build/bin/external.log, -Dapp.name=external, -D
app.pid=21892, -Dapp.repo=/home/brumsby/ybd/external/build/lib, -Dapp.home=/home/brumsby/ybd/external/build, -Dbasedir=/home/brumsby/ybd/externa
l/build"
   jvm.vendor           = "Oracle Corporation"
   os.name_and_version  = "Linux 4.4.0-31-generic (amd64)"

2019-06-05 13:59:10.458 [ INFO] <load-5>  Assuming source encoding matches database server encoding: LATIN9
2019-06-05 13:59:10.459 [ INFO] <LFSourceContext>  Gathering metadata on input files
2019-06-05 13:59:10.460 [ INFO] <load-5>  --num-header-lines: 1
2019-06-05 13:59:10.464 [ INFO] <load-5>  Session Key = h7TVdFVK4c0fehPzOk8R4fQUPd9F16Obi1fgeaivU8vwwK61mWqYbByciMXNIpdH
2019-06-05 13:59:10.570 [ INFO] <load-5>  Starting 1 source PreReaders
2019-06-05 13:59:10.571 [ INFO] <load-5>  Using database locale: C
2019-06-05 13:59:10.572 [ INFO] <PreReader #0>  Auto-detected line separator = '\n'
2019-06-05 13:59:10.572 [ INFO] <LFScanContext>  Configuration (record/field separation):
   --format                           : CSV
   --delimiter                        : |
   --linesep                          : \n
   --quote-char                       : "
   --escape-char                      : "
   --trim-white                       : false
   --skip-blank-lines                 : true
   --on-missing-field                 : ERROR
   --on-extra-field                   : REMOVE
   --on-unescaped-embedded-quote      : ERROR
   Internal scanner                   : RecordScannerQuote_1_1
2019-06-05 13:59:10.573 [ INFO] <PreReader #0>  Reading from 75 for /home/brumsby/ext_db/newmatchstats
2019-06-05 13:59:10.596 [ INFO] <load-5>  Configuration (pre-parsing):
2019-06-05 13:59:10.597 [ INFO] <load-5>  Configuration (session):
   tableName       : "yellowbrick"."public"."external_table"
   keepAliveSeconds: 60
   maxBadRows      : 0
2019-06-05 13:59:10.598 [ INFO] <load-5>  Bad rows will be written to file:/home/brumsby/ext_db/query_logs/118288-LOAD.badrow
2019-06-05 13:59:10.598 [ INFO] <load-5>  Starting import of 1 files
2019-06-05 13:59:10.772 [ INFO] <load-5>  Starting 1 segment reader
2019-06-05 13:59:11.135 [ INFO] <load-5>  Flushing last 774540 rows (of 774540 total) for transaction 1
2019-06-05 13:59:11.152 [ INFO] <sender 1.00>  SUCCESSFUL BULK LOAD: Loaded 774540 good rows in   0:00:00 (READ: 27.64MB/s WRITE: 19.60MB/s)

If an external table operation fails, you can use the log files to troubleshoot. For example, a ybload operation that generates bad rows will write a badrow file to the same log directory:

$ more 1882713-LOAD.badrow

#error: lineByteRange(75-99)
#reason: Too few fields (1 < 2)
1|1992-08-01|2|52|01:00
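A badrow file like this points at rows whose field count does not match the column list. Before re-running a load, you can scan the source file for offending rows with a short awk one-liner; the sample file and expected field count below are fabricated for illustration.

```shell
# Fabricate a pipe-delimited file with one short row (line 3 has 1 field).
cat > /tmp/newmatchstats.csv <<'EOF'
seasonid|matchday
1|1992-08-01
2
EOF

# Print the line number and field count of any row without exactly 2 fields:
awk -F'|' 'NF != 2 {print NR": "NF" field(s)"}' /tmp/newmatchstats.csv
# → 3: 1 field(s)
```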

Join an external table to a regular database table:

The following example shows how to select from an external table (newmatchstats.csv aliased as ex) and join the results to a regular database table (match aliased as db):

premdb=# select distinct db.seasonid, db.matchday 
from 
  external '/ext_db/newmatchstats.csv' (seasonid int, matchday date)
  using(format csv, delim '|', skiprows 1, logdir '/ext_db/query_logs') ex 
join
  match db on db.seasonid=ex.seasonid 
where db.seasonid in(10,11,12) 
order by 1,2;
 seasonid |      matchday       
----------+---------------------
       10 | 2001-08-18 00:00:00
       10 | 2001-08-19 00:00:00
       10 | 2001-08-20 00:00:00
       10 | 2001-08-21 00:00:00
       10 | 2001-08-22 00:00:00
       10 | 2001-08-25 00:00:00
       10 | 2001-08-26 00:00:00
       10 | 2001-08-27 00:00:00
       10 | 2001-09-08 00:00:00
       10 | 2001-09-09 00:00:00
       10 | 2001-09-15 00:00:00
...

Join two external tables:

This query joins two external tables (aliased nms and esn). Note that you must specify the EXTERNAL keyword, a column list, and a USING clause each time you reference an external table in a query:

premdb=# select distinct nms.seasonid, nms.matchday 
from 
external '/ext_db/newmatchstats' (seasonid int, matchday date) using(format csv, delim '|', skiprows 1, logdir '/ext_db/logs_may22') 
nms join 
external '/ext_db/ext_season_new' (seasonid int) using(format csv, logdir '/ext_db/logs_may22') 
esn on nms.seasonid=esn.seasonid 
order by nms.seasonid;
 seasonid |  matchday  
----------+------------
        1 | 1992-08-01
        2 | 1993-09-01
        2 | 1993-08-31
        2 | 1993-08-15
        2 | 1994-05-02
...

Insert rows into a table by selecting from an external table:

To read data from an external table and write it into a regular table, use an INSERT INTO SELECT...FROM EXTERNAL statement. The column list for the external file is optional in this case, but the USING clause is required. For example:

premdb=# insert into season 
select * from external '/ext_db/season.csv' 
using(format csv);
INSERT 0 25

The external table in this statement uses the schema of the target table (season). The columns in the external data file must be in the same order as they are in the target table, and every column in the target table must also exist in the external file.
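If the file's columns are not in the target table's order, one option is to rewrite the file before loading. The sketch below swaps two pipe-delimited columns with awk; the file name and column layout are fabricated for illustration, not part of the example above.

```shell
# Fabricate a file whose columns are season_name|seasonid, the reverse of
# the order needed by the hypothetical target table (seasonid first).
cat > /tmp/season_raw.csv <<'EOF'
1992-1993|1
1993-1994|2
EOF

# Emit seasonid first, then season_name, keeping the pipe delimiter:
awk -F'|' -v OFS='|' '{print $2, $1}' /tmp/season_raw.csv > /tmp/season_ordered.csv
cat /tmp/season_ordered.csv
# → 1|1992-1993
#   2|1993-1994
```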

Parent topic: CREATE EXTERNAL TABLE