There have been a number of articles about restartability in SSIS, particularly with regards how checkpoints and transactions are not always an effective solution.
Jamie Thomson’s article here is one example, and Matt Masson’s here is another. As a consequence, a number of approaches have been devised that avoid or limit the use of both.
Matt Masson summarises the situation well: “Instead of using transactions, consider designing your packages for restartability. This doesn’t necessarily mean the use of Checkpoints (although you can use them if you are so inclined) – it means that your packages take any corrective actions they need to ensure that they can be safely run multiple times.”
So this is a key goal for robust package design – if a package fails, then nobody wants to be untangling incomplete ETLs in order to get the database(s) and file system(s) back into a state where the package can run again. And where ideally, any work that was completed successfully doesn’t have to be repeated, since the package can pick up again from where it left off.
In this post I work through a restartability solution described by Tim Mitchell here. Tim’s a Microsoft MVP and co-author of the SSIS Design Patterns book.
I created this SSIS package to test the approach, which loads from a data file into a staging table, archives the data file, does a set-based update on the table, and then loads it into a production table:
The high-level steps required are described by Tim as follows:
- Create a database to store package restartability metadata
- Create and populate a table to store the names of the tasks to be included in the restartability scheme
- Create a table to store the restartability marker
- Build our package(s) to check this table for restart markers. If such a marker exists for the given package, start at that point rather than at the beginning of the package
- Build our package(s) to write to the restart marker table in the event of a failure
I work through these steps below.
A new database isn’t required if a suitable one already exists, but I created a new one called PackageController.
(
PackageTaskID INT,
PackageName VARCHAR(100),
TaskName VARCHAR(100)
)
(1, ‘RestartDemo’, ‘DFT Load Data File to Staging’),
(2, ‘RestartDemo’, ‘FSYS Archive Data File’),
(3, ‘RestartDemo’, ‘SQL Transform Staging’),
(4, ‘RestartDemo’, ‘DFT Load Production from Staging’)
(
PackageTaskID INT,
ActivityDate DATETIME2 DEFAULT SYSDATETIME()
)
I then updated my package to include the restartability logic:
FROM PackageTasks pt
INNER JOIN packageRestartPoint rp
ON pt.PackageTaskID = rp.PackageTaskID
WHERE pt.PackageName = ?;
TRUNCATE TABLE packageRestartPoint;
The first parameter here is an out parameter that uses a user-defined variable called User::StartTaskName to store any previously failed task name for restarting from, and the second parameter receives the System::PackageName system variable in order to identify the current package (the restartability table can span multiple packages).
I added the TRUNCATE TABLE statement to the end – this wasn’t mentioned in Tim’s post, but without it prior restart points get remembered when they’re no longer required.
The constraints leading out from this new task contain expressions in addition to requiring the Execute SQL Task to succeed. For the first task it’s simply LEN(@[User::StartTaskName]) == 0 which means that if no restart point is found then the control flow will start from the top. Subsequent expressions are of the form @[User::StartTaskName] == “DFT Load Data File to Staging” and which intuitively start the control flow at the point where it last failed.
SELECT PackageTaskID
FROM PackageTasks
WHERE PackageName = ?
AND TaskName = ?;
The parameters passed in are System::PackageName and System::SourceName. So upon any package failure, the restart table gets updated with details of the task that failed.
After checking the package still worked without any restart points, I modified the File System Task so that it would fail. So this gives us:
I then corrected the File System Task and ran the package again:
So we can see that the package did restart correctly from where it left off.
I think a key strength of this approach is its simplicity – it’s easy to understand, set up, wire in, debug and test. It does clearly require modifications to the target package – if this is a problem in some environments (e.g. large scale situations with many complex packages) then restartability may be better achieved by one of the frameworks that provide it in a more transparent way.
The above can easily be extended to include restart logging – it will almost certainly be useful to have a history of restart events (and the errors that caused them, if not logged elsewhere). This could be done by extending the use of the PackageRestartPoint table.
Overall it’s a solid approach that has little overhead if designed into packages from the beginning.