Source control plays an essential part of any software development project, where ETL is no exception.
At the very least it preserves source code such that recovery is possible to a previous point in time, whilst also allowing developers to work on multiple streams of the code such that new developments are isolated from release builds. Importantly it also provides an audit trail for the project, documenting who changed what, when and why. It also supports collaboration, where checked-in software will usually be stored on a server that is available to all developers and to key project stakeholders. This server will also be backed up safely, which provides another level of backup over and above that taking place on developers’ own machines.
This post reviews the use of Subversion (SVN) with SSIS, through the use of the VisualSVN SVN server and Visual Studio Plug-in available at http://www.visualsvn.com. The VisualSVN server has a Standard Edition that’s free of charge and available for commercial use, and the plug-in has a Community Edition that’s also free (i.e. for computers that don’t belong to an active directory domain, else it’s $99 per developer).
I downloaded v3.5.3 of the SVN server, which had a simple installation package that ran quickly and with no problems. I followed this with the download of v5.1.4 of the Visual Studio plug-in. This too was quick and simple, but ended with a warning saying that TortioseSVN should also be installed. (The VisualSVN website says “VisualSVN uses TortoiseSVN for most of the dialogs”.) I downloaded v1.9.4 of TortoiseSVN from here and installed that with no problems.
This is done by opening SSDT (Visual Studio 2013 on my machine) and setting the source control plug-in to VisualSVN:
This results in a (long!) new Visual Studio menu called VISUALSVN, together with SVN options being added to Solution Explorer context menus. A Pending Changes window also appears, that keeps track of any files changed in the local working copy and that are candidates for subsequent commits.
The SVN server has a management application called VisualSVN Server Manager, which provides for creating, updating, deleting & viewing repositories, users and user groups. Intuitive context menus allows these to be created:
This is done by creating or opening a project in SSIS as usual, and then right-clicking the solution name in the Solution Explorer and selecting “Add Solution to Subversion…”. After confirming the solution path, the solution can be added to an existing repository by supplying its URL.
When dealing with individual files, common development life cycle tasks are available via the Solution Explorer context menu:
Update loads the latest version of a file from the repository, Show Differences shows changes made to a file since it was last updated (and requires familiarity with SSIS’s XML file format!), Revert Changes overwrites the local copy with a prior version from the repository, and Commit commits local changes back to the repository.
Commit allows a comment to be entered against the commit, which is essential for maintaining a proper audit trail of the history of a project.
To avoid accidental concurrent updates to the same SSIS package, developers can use SVN’s Get Lock option at the start of their edits. This prevents any other developer from being able to lock the same file for editing. The lock can be released upon the next commit of the package, or held if ongoing edits are required.
Continuing from the last point, if there are multiple developers working on a single ETL solution, then the exclusive unit of work per developer should be a package (or else a whole project where applicable). This has the side effect of helping to keep SSIS packages small and focused, which has many benefits, not least aiding the ease of understanding of large ETL projects and thus lowering the risk of their ongoing development and maintenance.
The main VisualSVN menu has the necessary Branch, Switch and Merge commands:
to allow developers to work on updates in their own branch, away from test or release builds.
The exact use of source control can vary considerably, for example subject to the size of the development team. In the context of SSIS, I think that Phil Austin nails overall best use of it in his blog post here.
Highlighting 3 points from that post: (1) structure the local directories and repository sensibly, (2) branch off for any new developments and (3) take great care in how development streams are merged back to the trunk.
Consistent with Phil’s post is this illustration from Microsoft’s excellent Channel 9 presentation Deep Inside the Microsoft SQL Server Integration Services Server:
This workflow was discussed during the presentation in the context of why the SSIS Project Deployment model insists on whole projects being deployed rather individual packages. Matt Masson discusses the workflow in one of his blog posts where he says:
- Developers are responsible for their own testing on the development branch
- Merges back into the integration branch allow for further testing to find anything missed by developers
- Once QA have signed off the integration build, then it can be merged into the release branch, and where tags are used to identify specific versions
In summary, VisualSVN is a simple tool to install, configure and use, and can be used to support best practice SDLC workflow for SSIS.