Resuming a Partial Load

If a bulk load consists of multiple transactions and fails after some transactions have been committed, you can resume the load from the last committed position. Use the --resume-partial-load-from-offset option to resume a partially loaded source file at a specific byte offset, as reported in the ybload failure messages. For example:

...
2018-01-17 17:02:59.989 [FATAL] <main>  FAILED BULK LOAD: Last commit occurred after 300 good rows
2018-01-17 17:02:59.989 [ WARN] <main>  At the time of the last commit:
   300 good row(s) had been committed
   3 bad row(s) had been skipped
   1 source(s) had been partially loaded

2018-01-17 17:02:59.989 [ WARN] <main>  
To resume loading from the last committed position, invoke ybload as follows:
   1) ybload <original options> --resume-partial-load-from-offset 150000 /data/tests/tmp1.csv

2018-01-17 17:02:59.989 [ WARN] <main>  BEWARE: Additional bad rows were written to the bad row file after the last commit
2018-01-17 17:02:59.989 [ WARN] <main>          When fixing rows in the bad row file, ignore any bad rows that follow this message:
2018-01-17 17:02:59.990 [ WARN] <main>          "----- successful commit after 3 bad rows -----"

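If you capture the ybload log output, the suggested resume offset and source file can be scraped from it rather than copied by hand. The following is a minimal sketch, not part of ybload itself; it assumes the message format shown in the example above, and the function name and regular expression are illustrative:

```python
import re

# Matches the resume command that ybload prints after a failed bulk load.
# Assumes the message format shown in the example log above.
OFFSET_RE = re.compile(r"--resume-partial-load-from-offset\s+(\d+)\s+(\S+)")

def resume_hint(log_text):
    """Return (byte_offset, source_path) from the suggested command, or None."""
    match = OFFSET_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Applied to the log above, this would recover the offset 150000 and the source path /data/tests/tmp1.csv for use in a scripted resume.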
In the following example, the first three source files were completely loaded before the failure, so the load can restart from the beginning of the fourth file. The --resume-partial-load-from-offset option is therefore not necessary:

...
2018-01-17 17:03:06.949 [FATAL] <main>  FAILED BULK LOAD: Last commit occurred after 300 good rows
2018-01-17 17:03:06.949 [ WARN] <main>  At the time of the last commit:
   300 good row(s) had been committed
   3 bad row(s) had been skipped
   3 source(s) had been completely loaded
   6 source(s) had not started to load

2018-01-17 17:03:06.950 [ WARN] <main>  
To resume loading from the last committed position, invoke ybload as follows:
   1) ybload <original options>  \
   /data/tests/tmp4.csv  \
   /data/tests/tmp5.csv  \
   /data/tests/tmp6.csv  \
   /data/tests/tmp7.csv  \
   /data/tests/tmp8.csv  \
   /data/tests/tmp9.csv

2018-01-17 17:03:06.950 [ WARN] <main>  BEWARE: Additional bad rows were written to the bad row file after the last commit
2018-01-17 17:03:06.950 [ WARN] <main>          When fixing rows in the bad row file, ignore any bad rows that follow this message:
2018-01-17 17:03:06.950 [ WARN] <main>          "----- successful commit after 3 bad rows -----"

In the third example, one source file was partially loaded and another had not started to load, so two separate ybload operations must be run to complete the load:

...
2018-01-17 17:03:12.118 [FATAL] <main>  FAILED BULK LOAD: Last commit occurred after 800 good rows
2018-01-17 17:03:12.118 [ WARN] <main>  At the time of the last commit:
   800 good row(s) had been committed
   8 bad row(s) had been skipped
   1 source(s) had been completely loaded
   1 source(s) had been partially loaded
   1 source(s) had not started to load

2018-01-17 17:03:12.119 [ WARN] <main>  
To resume loading from the last committed position, invoke ybload as follows:
   1) ybload <original options> --resume-partial-load-from-offset 100000 /data/tests/big2.csv
   2) ybload <original options>  \
   /data/tests/big3.csv

2018-01-17 17:03:12.119 [ WARN] <main>  BEWARE: Additional bad rows were written to the bad row file after the last commit
2018-01-17 17:03:12.119 [ WARN] <main>          When fixing rows in the bad row file, ignore any bad rows that follow this message:
2018-01-17 17:03:12.119 [ WARN] <main>          "----- successful commit after 8 bad rows -----"
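The warnings above also apply when repairing the bad-row file: bad rows written after the last commit will be encountered again by the resumed load, so only the rows before the final commit marker need to be fixed. A minimal sketch of that filtering, assuming the commit marker appears in the bad-row file exactly as quoted in the warning (the function name is illustrative):

```python
# The marker line ybload writes into the bad-row file at each commit,
# as quoted in the warning messages above.
MARKER_PREFIX = "----- successful commit after"

def rows_to_fix(lines):
    """Return only the bad rows recorded before the last commit marker."""
    last_marker = -1
    for i, line in enumerate(lines):
        if line.startswith(MARKER_PREFIX):
            last_marker = i
    # No marker found: every bad row predates the last commit, so fix them all.
    kept = lines if last_marker < 0 else lines[:last_marker]
    # Drop any earlier marker lines; they are separators, not data rows.
    return [line for line in kept if not line.startswith(MARKER_PREFIX)]
```

Rows returned by this helper are the ones worth correcting and reloading; anything after the last marker can be ignored, as the warning advises.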