ybunload Examples
The following ybunload examples do not show the ABOUT CLIENT messages that are routinely shown at the top of the output. These messages are logged mainly for troubleshooting purposes. For example:
13:50:28.174 [ INFO] ABOUT CLIENT:
app.cli_args = -d premdb -t newmatchstats --username yb100 -W --format text -o /home/yb100/premdb_unloads/nms --prefix newms --max-file-size 1GB
app.name_and_version = ybunload version 2.0.0-9942
java.home = /usr/lib/jvm/java-8-oracle/jre
java.version = 1.8.0_101
jvm.memory = 512.00 MB (max=6.00 GB)
jvm.name_and_version = Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
jvm.options = -Xms512m, -Xmx6g, -XX:+UseG1GC, -Dapp.name=ybunload, -Dapp.pid=6295, -Dapp.repo=/usr/lib/ybtools/lib, -Dapp.home=/usr/lib/ybtools, -Dbasedir=/usr/lib/ybtools
jvm.vendor = Oracle Corporation
os.name_and_version = Linux 4.4.0-31-generic (amd64)
Unload a table to a folder in an S3 bucket
Unload the match table to a folder inside an AWS S3 bucket. The bucket name is yb-tmp. See also Unloading Data to an S3 Bucket.
$ ybunload -d premdb -t match --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads
Database login password:
...
12:56:37.599 [ INFO] Verifying unload statement...
12:56:37.673 [ INFO] Unload statement verified
12:56:37.674 [ INFO] Beginning unload to s3://yb-tmp/premdb/premdb_unloads
12:56:38.352 [ INFO] Network I/O Complete. Waiting on file I/O
12:56:38.955 [ INFO] Key Name: unload_1_1_.csv Upload ID = SpQLEmQYU.4rbHB3NL7ZLXtdGN4Xs3ZGeWlc_F214HtI7FzBnWLsPZPH2UhVgOhJXKLDknaAn.1aXGMGjRiQdFn6Wozf40BA70IjmCDaH8Y8xRLsJ7jkkWHkqS9GdKwU
12:56:39.357 [ INFO] state: FINALIZING - (Open Sockets: 0 - Open Files 0 Other Output Streams 1)
12:56:39.488 [ INFO] Finalizing...
12:56:39.489 [ INFO] Transfer complete
12:56:39.490 [ INFO] Transferred: 304.00 KB Avg Network BW: 24.75 MB/s Avg Disk write rate: 264.58 KB/s
Unload the results of a query to a text file
This example shows the query itself on the command line. An alternative approach is to use the --select-file option to call a file that contains the SQL statement. The --select-file option is recommended for longer, more complex queries.
$ ybunload -d premdb --username yb100 -W --format text -o /home/brumsby/premdb_unloads --truncate-existing --select
"select *, (total_goals/380.00)::dec(3,2) as goals_per_match
from (
select season_name, numteams,
sum(substr(ftscore,1,1)::int)+sum(substr(ftscore,3,1)::int) total_goals
from season, match
where season.seasonid=match.seasonid and season.seasonid>=4
group by season_name,numteams
) t1
order by 1;"
Database login password:
...
18:31:06.549 [ INFO] Verifying unload statement...
18:31:06.711 [ INFO] Unload statement verified
18:31:06.714 [ INFO] Beginning unload to /home/yb100/premdb_unloads
18:31:10.105 [ INFO] Network I/O Complete. Waiting on file I/O
18:31:10.204 [ INFO] Finalizing...
18:31:10.206 [ INFO] Transfer complete
18:31:10.207 [ INFO] Transferred: 429.00 B Avg Network BW: 40.64 KB/s Avg Disk write rate: 3.71 KB/s
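The --select-file alternative mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the SQL_FILE and OUT_DIR paths are hypothetical placeholders, and the ybunload call is guarded so that the script only saves the query when the client is not installed. The flags are the ones already used in the examples on this page.

```shell
# Hypothetical locations (assumptions); adjust to your environment.
SQL_FILE="${SQL_FILE:-/tmp/goals_per_match.sql}"
OUT_DIR="${OUT_DIR:-/tmp/premdb_unloads}"

# Save the query to a file. The quoted 'EOF' delimiter prevents the shell
# from expanding or unescaping anything inside the SQL text.
cat > "$SQL_FILE" <<'EOF'
select *, (total_goals/380.00)::dec(3,2) as goals_per_match
from (
  select season_name, numteams,
    sum(substr(ftscore,1,1)::int)+sum(substr(ftscore,3,1)::int) total_goals
  from season, match
  where season.seasonid=match.seasonid and season.seasonid>=4
  group by season_name,numteams
) t1
order by 1;
EOF

# Run the unload only if ybunload is on the PATH; otherwise report the file.
if command -v ybunload >/dev/null 2>&1; then
  ybunload -d premdb --username yb100 -W --format text \
    -o "$OUT_DIR" --truncate-existing --select-file "$SQL_FILE"
else
  echo "ybunload not found; query saved to $SQL_FILE"
fi
```

Keeping the query in a file avoids shell quoting problems with parentheses, asterisks, and double quotes in longer statements.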
Unload query results in parquet format
In this example, the results of the same query from the previous example are unloaded in parquet format to a local directory. The query text is passed in with the --select-file option and a prefix is used for the unload file.
$ ybunload -d premdb --username bobr -W --format parquet -o /home/brumsby/premdb_unloads --prefix goals_per_match --truncate-existing --select-file /home/brumsby/goals_per_match.sql
Password for user bobr:
...
15:49:06.357 [ INFO] Removing existing files that match prefix=goals_per_match and extension=.parquet
15:49:06.368 [ INFO] removing goals_per_match_1_0_.parquet
15:49:06.379 [ INFO] Verifying unload statement...
15:49:06.569 [ INFO] Unload statement verified
15:49:06.570 [ INFO] Beginning unload to /home/brumsby/premdb_unloads
15:49:06.587 [ INFO] Session Key = DgNzJrAIz_RIMXw4zhiKDkZf7TVkYNlAmOQFPYwuWUF8dbgbRaXcuBBlh0EUtwI=
15:49:07.670 [ INFO] Network I/O Complete. Waiting on file I/O
15:49:08.405 [ INFO] state: FINALIZING - (Open Sockets: 0 - Open Files 0 Other Output Streams 1)
15:49:08.511 [ INFO] Finalizing...
15:49:08.511 [ INFO] Transfer complete
15:49:08.512 [ INFO] Transferred: 4.00 B Avg Network BW: 2.39 KB/s Avg Disk write rate: 0.00 KB/s
The schema of the resulting file looks like this:
% parquet-tools schema goals_per_match_1_0_.parquet
message schema {
optional binary season_name (STRING);
optional int32 numteams (INTEGER(16,true));
optional int64 total_goals (INTEGER(64,true));
optional fixed_len_byte_array(4) goals_per_match (DECIMAL(3,2));
}
The data in the file looks like this:
% parquet-tools cat goals_per_match_1_0_.parquet
season_name = 1995-1996
numteams = 20
total_goals = 988
goals_per_match = 2.60
season_name = 1996-1997
numteams = 20
total_goals = 970
goals_per_match = 2.55
...
See also Unloading Data to Azure Storage for another example of an unload in parquet format.
Unload a table to multiple 1GB files
Unload a table and set a maximum file size of 1GB per output file:
$ ybunload -d newdb -t newmatchstats -W -o /home/yb100/premdb_unloads --truncate-existing --max-file-size 1GB
Database login password:
...
17:10:41.612 [ INFO] Verifying unload statement...
17:10:41.690 [ INFO] Unload statement verified
17:10:41.693 [ INFO] Truncating existing files that start with: unload
17:10:41.696 [ INFO] Beginning unload to /home/yb100/premdb_unloads
17:10:44.166 [ INFO] state: RUNNING - Network BW: 35.11 MB/s Disk BW: 66.22 MB/s
17:10:45.151 [ INFO] state: RUNNING - Network BW: 34.57 MB/s Disk BW: 67.50 MB/s
17:10:46.149 [ INFO] state: RUNNING - Network BW: 35.16 MB/s Disk BW: 66.36 MB/s
...
17:11:21.161 [ INFO] state: RUNNING - Network BW: 34.56 MB/s Disk BW: 65.70 MB/s
17:11:21.478 [ INFO] Network I/O Complete. Waiting on file I/O
17:11:21.479 [ INFO] Finalizing...
17:11:21.949 [ INFO] Transfer complete
17:11:21.949 [ INFO] Transferred: 2.46 GB Avg Network BW: 34.16 MB/s Avg Disk write rate: 64.86 MB/s
The resulting files look like this:
-rw-r--r-- 1 yb100 users 999805680 May 4 17:10 unload_1_1_.csv
-rw-r--r-- 1 yb100 users 999956727 May 4 17:11 unload_1_2_.csv
-rw-r--r-- 1 yb100 users 11296574 May 4 17:11 unload_1_3_.csv
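A listing like the one above can also be checked with a small script. The following is a sketch only: files_over_limit is a hypothetical helper, not part of ybunload, and the byte limit is supplied by the caller because this page does not state whether 1GB is interpreted as 10^9 or 2^30 bytes.

```shell
# Hypothetical helper (not part of ybunload): print every file named
# PREFIX_* in DIR whose size in bytes is greater than LIMIT_BYTES.
# Usage: files_over_limit DIR PREFIX LIMIT_BYTES
files_over_limit() {
  dir=$1; prefix=$2; limit=$3
  for f in "$dir"/"$prefix"_*; do
    [ -f "$f" ] || continue                  # skip when the glob matches nothing
    size=$(wc -c < "$f" | tr -d ' ')         # portable byte count
    if [ "$size" -gt "$limit" ]; then
      echo "$f $size"
    fi
  done
}
```

For example, `files_over_limit /home/yb100/premdb_unloads unload 1000000000` would print nothing for the three files listed above, because each one is smaller than 1,000,000,000 bytes.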
Unload a table in parallel, in GZIP format, using a prefix for the output file names
$ ybunload -d premdb -t newmatchstats --compress gzip --parallel --prefix May16 -o /home/yb100/premdb_unloads --username bobr -W
Database login password:
...
12:40:40.542 [ INFO] Verifying unload statement...
12:40:40.612 [ INFO] Unload statement verified
12:40:40.615 [ INFO] Beginning unload to /home/yb100/premdb_unloads
12:40:41.409 [ INFO] Network I/O Complete. Waiting on file I/O
12:40:41.415 [ INFO] Finalizing...
12:40:41.415 [ INFO] Transfer complete
12:40:41.415 [ INFO] Transferred: 4.33 MB Avg Network BW: 41.27 MB/s Avg Disk write rate: 39.39 MB/s
The resulting files look like this:
-rw-r--r-- 1 yb100 users 1117723 May 16 12:40 May16_1_1_.gz
-rw-r--r-- 1 yb100 users 1179454 May 16 12:40 May16_2_1_.gz
-rw-r--r-- 1 yb100 users 1226480 May 16 12:40 May16_3_1_.gz
-rw-r--r-- 1 yb100 users 1019678 May 16 12:40 May16_4_1_.gz
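After a compressed, parallel unload such as this one, it can be useful to confirm that every output file is a valid gzip archive and to count the unloaded lines. The sketch below uses only standard gzip options; check_gz_unload is a hypothetical helper, and the line total equals the row count only if no field contains an embedded newline.

```shell
# Hypothetical helper (not part of ybunload): test each PREFIX_*.gz file in
# DIR for integrity and print the total number of lines across all files.
# Usage: check_gz_unload DIR PREFIX
check_gz_unload() {
  dir=$1; prefix=$2; total=0
  for f in "$dir"/"$prefix"_*.gz; do
    [ -f "$f" ] || continue                           # skip when nothing matches
    gzip -t "$f" || { echo "corrupt: $f" >&2; return 1; }
    n=$(gzip -dc "$f" | wc -l | tr -d ' ')            # lines in this file
    total=$((total + n))
  done
  echo "$total"
}
```

For the files above, the call would be `check_gz_unload /home/yb100/premdb_unloads May16`.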