External Table Examples
The following examples show how to create and select from external tables. These examples assume that an external mount exists with the reference name /ext_db/.
Create an external table:
premdb=# create external table '/ext_db/ext_season.csv'
using(format text) as select * from season;
SELECT 25
Query the external table:
Subsequent queries can read from the external table file. Select data from this kind of table by using FROM EXTERNAL syntax:
premdb=# select * from external '/ext_db/season.csv'
(seasonid int, season_name char(9), numteams int, winners varchar(40))
using (format csv);
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
...
Create an external table with additional options:
This example sets several options in the USING clause:
- Format the output as CSV.
- Compress the output as GZIP (using the GZIP_BEST unload compression type).
- Specify a writable log directory relative to the mount point. (This is useful for keeping track of log files that are tied to specific uses of external tables. For example, you can direct the logs for load and unload operations to different log directories.)
- Include a header line in the output file.
- Use a pipe character as the field delimiter in the file that contains the exported data.
premdb=# create external table '/ext_db/newmatchstats.gz'
using(format csv, compress 'GZIP_BEST', logdir '/ext_db/logs', includeheader true, delim '|')
as select * from newmatchstats;
SELECT 774540
You can check that the option settings were respected by unzipping and inspecting the newmatchstats.gz file.
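A quick way to do this check is from the shell. The sketch below recreates a small pipe-delimited, gzipped file under /tmp (the path and data are placeholders, not the real unloaded file) and inspects it the same way you would inspect newmatchstats.gz:

```shell
# Recreate locally what the unload is expected to produce: a pipe-delimited
# file with a header line, compressed with gzip. Placeholder path and data.
printf 'seasonid|matchday\n20|2011-08-13\n' | gzip > /tmp/newmatchstats.gz

# Decompress and show the first lines to confirm the header and delimiter:
gunzip -c /tmp/newmatchstats.gz | head -2
```

On the real file, the same `gunzip -c ... | head` pipeline confirms the header line, the pipe delimiter, and that the file is in fact gzip-compressed.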
Query the external table with SELECT...FROM EXTERNAL syntax. Note the use of the skiprows option to ignore the header line in the file, the delim option to make sure the columns are read correctly from the file, and the logdir option:
premdb=# select distinct seasonid, matchday
from external '/ext_db/newmatchstats'
(seasonid int, matchday date)
using(format csv, delim '|', skiprows 1, logdir '/ext_db/query_logs')
where seasonid=20
order by 2;
seasonid | matchday
----------+------------
20 | 2011-08-13
20 | 2011-08-14
20 | 2011-08-15
20 | 2011-08-20
20 | 2011-08-21
20 | 2011-08-22
20 | 2011-08-27
20 | 2011-08-28
...
The column list declared for an external table limits which columns the query can reference. For example, selecting a column that is not in the declared list returns an error:
premdb=# select distinct seasonid, matchday, winners
from external '/ext_db/newmatchstats'
(seasonid int, matchday date)
using(format csv, skiprows 1)
where seasonid=20
order by 2;
ERROR: column "winners" does not exist
LINE 1: select distinct seasonid, matchday, winners from external 'e...
^
Check the ybload and ybunload log files:
You can check the log files for ybunload and ybload operations that create and select from external tables. These logs are useful for troubleshooting problems and as a record of when and how external table operations were performed. If you do not set the logdir option in the USING clause, the logs are written to the same directory as the external files. Log files are named <queryid>-UNLOAD.log or <queryid>-LOAD.log.
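Given that naming convention, you can list the logs for each operation type by query ID. A minimal sketch, using a placeholder log directory and file names rather than a real mount:

```shell
# Placeholder log directory and file names that follow the
# <queryid>-UNLOAD.log / <queryid>-LOAD.log naming convention.
mkdir -p /tmp/ext_logs
touch /tmp/ext_logs/118173-UNLOAD.log /tmp/ext_logs/118288-LOAD.log

# List unload logs, then load logs:
ls /tmp/ext_logs/*-UNLOAD.log /tmp/ext_logs/*-LOAD.log
```

The same globs work against a real logdir to separate unload records from load records.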
For example, the following ybunload log was written when an external table was created:
$ more 118173-UNLOAD.log
2019-06-05 13:58:49.484 [ INFO]
...
app.name_and_version = "external version 3.0.0-13800"
java.home = "/usr/lib/jvm/java-8-oracle/jre"
java.version = "1.8.0_101"
jvm.memory = "875.00 MB (max=6.00 GB)"
jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
jvm.options = "-Xms512m, -Xmx6g, -XX:+UseG1GC, -Dlogfile=/home/brumsby/ybd/external/build/bin/external.log, -Dapp.name=external, -D
app.pid=21892, -Dapp.repo=/home/brumsby/ybd/external/build/lib, -Dapp.home=/home/brumsby/ybd/external/build, -Dbasedir=/home/brumsby/ybd/externa
l/build"
jvm.vendor = "Oracle Corporation"
os.name_and_version = "Linux 4.4.0-31-generic (amd64)"
2019-06-05 13:58:49.484 [ INFO] <unload-4> Beginning unload to /home/brumsby/ext_db
2019-06-05 13:58:50.756 [ INFO] <pool-6-thread-1> state: RUNNING - Network BW: 5.57 MB/s Disk BW: 1.99 MB/s
2019-06-05 13:58:51.147 [ INFO] <unload-4> Network I/O Complete. Waiting on file I/O
2019-06-05 13:58:51.147 [ INFO] <unload-4> Finalizing...
2019-06-05 13:58:51.151 [ INFO] <unload-4> Transfer complete
2019-06-05 13:58:51.151 [ INFO] <unload-4> Transferred: 2661174.00 B Avg Network BW: 5.89 MB/s Avg Disk write rate: 1.89 MB/s
The following example shows a ybload log file, which was written when a user selected from an external table:
$ more 118288-LOAD.log
2019-06-05 13:59:10.456 [ INFO]
...
app.name_and_version = "external version 3.0.0-13800"
java.home = "/usr/lib/jvm/java-8-oracle/jre"
java.version = "1.8.0_101"
jvm.memory = "875.00 MB (max=6.00 GB)"
jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
jvm.options = "-Xms512m, -Xmx6g, -XX:+UseG1GC, -Dlogfile=/home/brumsby/ybd/external/build/bin/external.log, -Dapp.name=external, -D
app.pid=21892, -Dapp.repo=/home/brumsby/ybd/external/build/lib, -Dapp.home=/home/brumsby/ybd/external/build, -Dbasedir=/home/brumsby/ybd/externa
l/build"
jvm.vendor = "Oracle Corporation"
os.name_and_version = "Linux 4.4.0-31-generic (amd64)"
2019-06-05 13:59:10.458 [ INFO] <load-5> Assuming source encoding matches database server encoding: LATIN9
2019-06-05 13:59:10.459 [ INFO] <LFSourceContext> Gathering metadata on input files
2019-06-05 13:59:10.460 [ INFO] <load-5> --num-header-lines: 1
2019-06-05 13:59:10.464 [ INFO] <load-5> Session Key = h7TVdFVK4c0fehPzOk8R4fQUPd9F16Obi1fgeaivU8vwwK61mWqYbByciMXNIpdH
2019-06-05 13:59:10.570 [ INFO] <load-5> Starting 1 source PreReaders
2019-06-05 13:59:10.571 [ INFO] <load-5> Using database locale: C
2019-06-05 13:59:10.572 [ INFO] <PreReader #0> Auto-detected line separator = '\n'
2019-06-05 13:59:10.572 [ INFO] <LFScanContext> Configuration (record/field separation):
--format : CSV
--delimiter : |
--linesep : \n
--quote-char : "
--escape-char : "
--trim-white : false
--skip-blank-lines : true
--on-missing-field : ERROR
--on-extra-field : REMOVE
--on-unescaped-embedded-quote : ERROR
Internal scanner : RecordScannerQuote_1_1
2019-06-05 13:59:10.573 [ INFO] <PreReader #0> Reading from 75 for /home/brumsby/ext_db/newmatchstats
2019-06-05 13:59:10.596 [ INFO] <load-5> Configuration (pre-parsing):
2019-06-05 13:59:10.597 [ INFO] <load-5> Configuration (session):
tableName : "yellowbrick"."public"."external_table"
keepAliveSeconds: 60
maxBadRows : 0
2019-06-05 13:59:10.598 [ INFO] <load-5> Bad rows will be written to file:/home/brumsby/ext_db/query_logs/118288-LOAD.badrow
2019-06-05 13:59:10.598 [ INFO] <load-5> Starting import of 1 files
2019-06-05 13:59:10.772 [ INFO] <load-5> Starting 1 segment reader
2019-06-05 13:59:11.135 [ INFO] <load-5> Flushing last 774540 rows (of 774540 total) for transaction 1
2019-06-05 13:59:11.152 [ INFO] <sender 1.00> SUCCESSFUL BULK LOAD: Loaded 774540 good rows in 0:00:00 (READ: 27.64MB/s WRITE: 19.60MB/s)
A ybload operation that generates bad rows writes a badrow file to the same log directory:
$ more 1882713-LOAD.badrow
#error: lineByteRange(75-99)
#reason: Too few fields (1 < 2)
1|1992-08-01|2|52|01:00
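The badrow file above reports a field-count mismatch, which you can also reproduce locally with standard tools. This sketch uses a placeholder file and data (not the real badrow output) to find data rows in a pipe-delimited file whose field count is wrong:

```shell
# A small pipe-delimited sample with one malformed row (too few fields).
printf 'seasonid|matchday\n20|2011-08-13\nonlyonefield\n' > /tmp/sample_load.csv

# Print the line number and content of any data row (after the header)
# that does not have exactly 2 fields, mirroring the "Too few fields"
# reason recorded in the badrow file:
awk -F'|' 'NR > 1 && NF != 2 {print NR ": " $0}' /tmp/sample_load.csv
```

Running the same awk filter over a real source file is a quick way to locate every row the loader would reject for that reason.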
Join an external table to a regular database table:
This example selects from an external table (newmatchstats.csv, aliased as ex) and joins the results to a regular database table (match, aliased as db):
premdb=# select distinct db.seasonid, db.matchday
from
external '/ext_db/newmatchstats.csv' (seasonid int, matchday date)
using(format csv, delim '|', skiprows 1, logdir '/ext_db/query_logs') ex
join
match db on db.seasonid=ex.seasonid
where db.seasonid in(10,11,12)
order by 1,2;
seasonid | matchday
----------+---------------------
10 | 2001-08-18 00:00:00
10 | 2001-08-19 00:00:00
10 | 2001-08-20 00:00:00
10 | 2001-08-21 00:00:00
10 | 2001-08-22 00:00:00
10 | 2001-08-25 00:00:00
10 | 2001-08-26 00:00:00
10 | 2001-08-27 00:00:00
10 | 2001-09-08 00:00:00
10 | 2001-09-09 00:00:00
10 | 2001-09-15 00:00:00
...
Join two external tables:
This example joins two external tables (aliased as nms and esn). Note that you must specify the EXTERNAL keyword, a column list, and a USING clause each time you reference an external table in a query:
premdb=# select distinct nms.seasonid, nms.matchday
from
external '/ext_db/newmatchstats' (seasonid int, matchday date) using(format csv, delim '|', skiprows 1, logdir '/ext_db/logs_may22')
nms join
external '/ext_db/ext_season_new' (seasonid int) using(format csv, logdir '/ext_db/logs_may22')
esn on nms.seasonid=esn.seasonid
order by nms.seasonid;
seasonid | matchday
----------+------------
1 | 1992-08-01
2 | 1993-09-01
2 | 1993-08-31
2 | 1993-08-15
2 | 1994-05-02
...
Insert rows into a table by selecting from an external table:
You can insert rows with an INSERT INTO...SELECT FROM EXTERNAL statement. You do not have to declare the column list for the file in this case; it is optional. The USING clause is required. For example:
premdb=# insert into season
select * from external '/ext_db/season.csv'
using(format csv);
INSERT 0 25
The external table in this statement uses the schema of the target table (season). The columns in the external data file must be in the same order as they are in the target table, and every column in the target table must also exist in the external file.
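Because the statement relies on column order rather than column names, it can help to compare the file's header (when the file has one) against the expected column order before loading. A minimal shell sketch, with the expected list hard-coded to match the season examples above; in practice it would come from the target table's definition:

```shell
# Expected column order of the target table (assumed: the season table
# columns shown earlier in this section).
expected='seasonid,season_name,numteams,winners'

# Placeholder export file with a header line:
printf 'seasonid,season_name,numteams,winners\n1,1992-1993,22,Manchester United\n' \
    > /tmp/season_check.csv

# Compare the file header against the expected column order:
header=$(head -1 /tmp/season_check.csv)
if [ "$header" = "$expected" ]; then
    echo "column order matches"
else
    echo "column order mismatch: $header" >&2
fi
```

This catches reordered or missing columns before the INSERT, rather than discovering them through bad rows or silently transposed data.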