This post explores the SQL Server bulk load settings that are exposed to SSIS, and measures their effect on a simple SSIS data flow.
It considers bulk loads into heaps and clustered indexes, as well as showing the effect of minimally logged inserts into a database running with the simple recovery model.
The SSIS package used here is a simple data flow that reads all data from a 5 million row table and writes it to an empty table on the same server. CHECKPOINT; DBCC DROPCLEANBUFFERS; was executed prior to each test, so that the source table was read from disk rather than served from the buffer pool. OLE DB is used for both the source and destination components, and the underlying database uses the simple recovery model. For each test, the package was executed via the SSIS catalog with the Basic logging level, and the time reported is the average of 3 successive runs as recorded by the catalog. The test machine is a 64-bit i5-6200 dual-core Dell laptop, running SQL Server 2014 SP1.
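For reference, the per-test reset amounts to the following T-SQL (the database name here is a placeholder, not the actual test database):

```sql
-- Ensure the test database uses the simple recovery model (run once).
ALTER DATABASE BulkLoadTest SET RECOVERY SIMPLE;

-- Run before each test: flush dirty pages, then drop clean pages,
-- so the 5 million row source table is read from disk rather than from cache.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
```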
To begin with, the effect of using the Fast Load option on the OLE DB destination is measured:
Enabling it allows the OLE DB provider to make use of SQL Server’s BULK INSERT mechanism.
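The difference is easy to see in a server-side trace: without Fast Load the destination sends an individual INSERT for every row, whereas with Fast Load the rows arrive in batches through the bulk load API. Roughly speaking (the table and column names below are illustrative, not the actual test schema):

```sql
-- What a trace shows for the two access modes (illustrative only):
--
--   "Table or view" (no Fast Load): one INSERT statement per row, e.g.
--       INSERT INTO dbo.DestinationTable (Col1, Col2) VALUES (1, 'abc');
--
--   "Table or view - fast load": batched inserts via the bulk load API,
--   appearing in the trace as statements of the form
--       insert bulk dbo.DestinationTable (Col1 int, Col2 varchar(50))
```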
Test cases: Not using Fast Load is compared with using Fast Load; in the latter case all of the Boolean Fast Load options are disabled and the remaining options are left at their defaults.
Results:
| Test case | Average time |
| --- | --- |
| Without Fast Load | 998 secs |
| With Fast Load | 25 secs |
The dramatic effect of the bulk insert mechanism is clear here: roughly a 40x reduction in load time.
Aside from allowing concurrent loads into the same table through the use of BU (bulk update) locks and avoiding the overhead of individual row-level locks, TABLOCK can provide minimally logged inserts when used on databases running with the simple recovery model.
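The same behaviour can be reproduced in plain T-SQL, outside SSIS: under the simple (or bulk-logged) recovery model, an INSERT ... SELECT into a heap can be minimally logged only when a table lock is taken. A sketch, with placeholder table names:

```sql
-- Fully logged: every inserted row is written to the transaction log in full.
INSERT INTO dbo.DestinationHeap (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;

-- Minimally logged (simple/bulk-logged recovery, heap destination):
-- the TABLOCK hint lets SQL Server log page allocations instead of rows.
INSERT INTO dbo.DestinationHeap WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;
```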
Test cases: This test compares the Fast Load from the last test, with and without the TABLOCK option.
Results:
| Test case | Average time |
| --- | --- |
| Without TABLOCK | 25 secs |
| With TABLOCK | 14 secs |
We can see that the use of TABLOCK resulted in a healthy 44% reduction in load time here.
Test cases: This test uses the same destination as previously, but now with a clustered index added. TABLOCK is off, and the result is compared to the previous heap load where TABLOCK was also off.
Results:
| Test case | Average time |
| --- | --- |
| Heap | 25 secs |
| Clustered Index | 28 secs |
So the clustered index (B-tree) insert is a little slower here, as the engine has to maintain the index structure while the rows arrive.
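The destination for these runs is the same table as before but with a clustered index; as a sketch, with placeholder columns, it might look something like this:

```sql
-- Destination for the clustered index tests: the B-tree exists before the
-- load, so rows have to be slotted into key order as they arrive.
CREATE TABLE dbo.DestinationClustered
(
    Id   int         NOT NULL,
    Col1 varchar(50) NULL,
    CONSTRAINT PK_DestinationClustered PRIMARY KEY CLUSTERED (Id)
);
```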
Trace Flag 610 can provide minimal logging for inserts into a table with a clustered index (B-tree).
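The flag can be enabled per session for testing, or server-wide; note that from SQL Server 2016 onwards the engine handles this behaviour itself and the flag is documented as having no effect.

```sql
-- Enable minimally logged inserts into indexed tables for this session only.
DBCC TRACEON (610);

-- Or enable it globally (affects every database and every load on the instance).
DBCC TRACEON (610, -1);

-- Disable it again after testing.
DBCC TRACEOFF (610, -1);
```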
Test cases: This test re-runs the same cluster load as previously, but this time with Trace Flag 610 set. The result is compared to the previous cluster load.
Results:
| Test case | Average time |
| --- | --- |
| Without TF 610 | 28 secs |
| With TF 610 | 23 secs |
So there’s some improvement here, but it still doesn’t compare with the fastest heap load.
Test cases: This test re-runs the same cluster load using TF 610 as previously, but this time informing the bulk loader that the source is already sorted on the cluster key (which it is for this test), by using the FastLoadOptions property of the OLE DB Destination:
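There is no separate checkbox for this; the ORDER hint is typed into the FastLoadOptions property, in the same comma-separated string as options such as TABLOCK. With a hypothetical cluster key column called Id, the property value would look something like:

```
ORDER(Id ASC)
```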
Results:
| Test case | Average time |
| --- | --- |
| Without sort | 23 secs |
| With sort | 20 secs |
This tweak results in a modest improvement to the load time.
The Fast Load option for the OLE DB destination has a parameter called Maximum Insert Commit Size (MICS), which defaults to 2 GB. When MICS is larger than the SSIS buffer size (10 MB by default), there is one commit per buffer. This has been the case for all tests so far.
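A convenient way to see the commit pattern (and whether a load was minimally logged at all) is to inspect the transaction log straight after a test run; this is purely a diagnostic sketch, run against the test database using the undocumented sys.fn_dblog function:

```sql
-- Summarise the active log after a load: the LOP_BEGIN_XACT count gives a
-- rough number of transactions (commits), and the total size shows how much
-- log a fully vs. minimally logged load generated.
SELECT
    COUNT(*)                                                    AS LogRecords,
    SUM(CASE WHEN Operation = 'LOP_BEGIN_XACT' THEN 1 END)      AS Transactions,
    SUM(CAST([Log Record Length] AS bigint)) / 1048576          AS LogMB
FROM sys.fn_dblog(NULL, NULL);
```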
Test cases: This test re-runs the best heap and clustered index loads, but with MICS = 0, which results in one commit for the entire load instead of one per buffer.
Results:
| Test case | Average time |
| --- | --- |
| Heap (with TABLOCK) | 11 secs |
| Clustered Index (with TF 610 and ORDER) | 11 secs |
Whilst both loads are faster, the clustered index load benefited the most, with a 45% reduction in load time (compared with around 21% for the heap).
All of the options used made some difference in the test environment – some being very significant indeed. The fact that they are options is important though – pros and cons exist for practically all bulk load settings, relative to the particular environment in which they are being used.
It must be emphasised also that an SSIS data load never exists in isolation. Decisions such as whether to run a database with the simple recovery model or whether Trace Flag 610 can be set globally, can only be made within the context of the system as a whole.
Finally, the above tests reinforce the fact that benchmarking is key when developing an ETL package. Optimal performance can only be found by benchmarking candidate solutions within the context of the target environment.