Resuming a Partial Load
If a bulk load consists of multiple transactions and fails after some transactions have been committed, you can resume the load. Use the --resume-partial-load-from-offset option to resume at a specific byte offset, as reported by the load messages. For example:
...
2018-01-17 17:02:59.989 [FATAL] <main> FAILED BULK LOAD: Last commit occurred after 300 good rows
2018-01-17 17:02:59.989 [ WARN] <main> At the time of the last commit:
300 good row(s) had been committed
3 bad row(s) had been skipped
1 source(s) had been partially loaded
2018-01-17 17:02:59.989 [ WARN] <main>
To resume loading from the last committed position, invoke ybload as follows:
1) ybload <original options> --resume-partial-load-from-offset 150000 /data/tests/tmp1.csv
2018-01-17 17:02:59.989 [ WARN] <main> BEWARE: Additional bad rows were written to the bad row file after the last commit
2018-01-17 17:02:59.989 [ WARN] <main> When fixing rows in the bad row file, ignore any bad rows that follow this message:
2018-01-17 17:02:59.990 [ WARN] <main> "----- successful commit after 3 bad rows -----"
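As the warning notes, only the bad rows recorded before the commit marker still need fixing; rows after it will be re-attempted on resume. A minimal sketch of separating them, assuming the marker line shown in the log is written into the bad row file and using a hypothetical file name (load.bad):

```shell
# Hypothetical bad row file; the marker text matches the log output above.
cat > load.bad <<'EOF'
bad,row,one
bad,row,two
bad,row,three
----- successful commit after 3 bad rows -----
bad,row,after-commit
EOF

# Keep only the rows before the commit marker: these were skipped in
# committed transactions and must be fixed and reloaded manually.
awk '/^----- successful commit after [0-9]+ bad rows -----$/ {exit} {print}' \
    load.bad > needs_fixing.csv
cat needs_fixing.csv
```

Rows after the marker can be ignored because the resumed load will process that part of the source again.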
In the following example, the load can restart from the beginning of the fourth file, so the --resume-partial-load-from-offset option is not necessary:
...
2018-01-17 17:03:06.949 [FATAL] <main> FAILED BULK LOAD: Last commit occurred after 300 good rows
2018-01-17 17:03:06.949 [ WARN] <main> At the time of the last commit:
300 good row(s) had been committed
3 bad row(s) had been skipped
3 source(s) had been completely loaded
6 source(s) had not started to load
2018-01-17 17:03:06.950 [ WARN] <main>
To resume loading from the last committed position, invoke ybload as follows:
1) ybload <original options> \
/data/tests/tmp4.csv \
/data/tests/tmp5.csv \
/data/tests/tmp6.csv \
/data/tests/tmp7.csv \
/data/tests/tmp8.csv \
/data/tests/tmp9.csv
2018-01-17 17:03:06.950 [ WARN] <main> BEWARE: Additional bad rows were written to the bad row file after the last commit
2018-01-17 17:03:06.950 [ WARN] <main> When fixing rows in the bad row file, ignore any bad rows that follow this message:
2018-01-17 17:03:06.950 [ WARN] <main> "----- successful commit after 3 bad rows -----"
In the third example, two separate ybload operations need to be run to complete the load:
...
2018-01-17 17:03:12.118 [FATAL] <main> FAILED BULK LOAD: Last commit occurred after 800 good rows
2018-01-17 17:03:12.118 [ WARN] <main> At the time of the last commit:
800 good row(s) had been committed
8 bad row(s) had been skipped
1 source(s) had been completely loaded
1 source(s) had been partially loaded
1 source(s) had not started to load
2018-01-17 17:03:12.119 [ WARN] <main>
To resume loading from the last committed position, invoke ybload as follows:
1) ybload <original options> --resume-partial-load-from-offset 100000 /data/tests/big2.csv
2) ybload <original options> \
/data/tests/big3.csv
2018-01-17 17:03:12.119 [ WARN] <main> BEWARE: Additional bad rows were written to the bad row file after the last commit
2018-01-17 17:03:12.119 [ WARN] <main> When fixing rows in the bad row file, ignore any bad rows that follow this message:
2018-01-17 17:03:12.119 [ WARN] <main> "----- successful commit after 8 bad rows -----"
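Because the byte offset appears in the suggested command inside the log output, resumption can be scripted by pulling it out of a saved log. A sketch, assuming the log was captured to a file (ybload.log is a hypothetical name) and that the message format matches the examples above:

```shell
# Hypothetical captured log line; format taken from the example output above.
cat > ybload.log <<'EOF'
1) ybload <original options> --resume-partial-load-from-offset 100000 /data/tests/big2.csv
EOF

# Extract the suggested resume offset from the saved log.
offset=$(grep -o -- '--resume-partial-load-from-offset [0-9]*' ybload.log | awk '{print $2}')
echo "resume offset: $offset"
```

The extracted value can then be passed back to ybload with the original options and the partially loaded source file.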