Finding the Column Name of an SSIS Error Output Error Column ID

Andrew Smith Posted on

Many SSIS data flow components have error outputs for optionally directing rows through, e.g. for handling data type conversion errors.

These error outputs include the data columns being handled by the component, as well as 2 additional columns – an ErrorCode column and an ErrorColumn column.

Whilst the ErrorCode column can be translated to an error description by using a script component containing a single line that calls the IDTSComponentMetaData100.GetErrorDescription() method from the Microsoft.SqlServer.Dts.Pipeline.Wrapper namespace (see here for a reference to this), there’s been no way to do an equivalent API lookup for the ErrorColumn column, which contains an integer lineage ID representing the problem column.

This has been resolved for SSIS in SQL Server 2016, with the introduction of:

  • the IDTSComponentMetaData130.GetIdentificationStringByID() method for script component ErrorColumn lookups
  • an extra “ErrorColumn – Description” column available in data viewers enabled on error outputs
  • logging of a LineageID-to-column-name map, via SSIS log providers and SSIS catalog logging

See here for more details of the above in a Microsoft blog post describing these improvements for SQL Server 2016.

Whilst a number of creative solutions for finding the column name for an ErrorColumn ID pre-2016 have been documented, the one by Todd McDermid that I implement in this post is possibly the best out there. Todd describes it in detail here and comments that it: “decodes the ErrorColumn value that’s on the Error output with in-the-box stuff only. Without add-on tools. Without custom components. Without pre-populated lookup tables made by utilities.”

My implementation of his technique within a data flow for SSIS in SQL Server 2014 is:

ErrorColumnDataFlow

The essence of it is a flat file that’s being loaded into an Employees database table, and where rows being processed by the flat file source component may result in errors due to data quality problems within the flat file.

I introduced one error into the “LastName” column of a row in the flat file, and another error into the “BaseRate” column of a row in the flat file, and ran the data flow. The resulting contents of sample columns from the error table populated by the Employee Errors OLE DB destination right at the bottom of the data flow is:

ErrorColumnErrorTable

So the names of the offending columns can clearly be seen here.

The core of Todd’s pattern is the script component in the middle of the data flow above. It’s an asynchronous component that contains this single method:

ErrorColumnCSharp

Todd describes it in detail in his post so I won’t repeat it here, but in summary it creates a map from column lineage IDs to column names, which is a process that needs to take place on the regular output of the transform that’s creating errors and not on the error output.

This map is then merge-joined with the ErrorColumn lineage ID(s) coming down the error path, so that the error information can be appended with the column name(s) from the map. I included a second script component that looks up the error description from the error code, so the error table rows that we see above contain both column names and error descriptions.

The remaining component that needs explaining is the conditional split – this exists just to provide metadata to the script component that creates the map. I created an expression (1 == 0) that always evaluates to false for the “No Rows – Metadata Only” path, so no rows ever travel down it.

Whilst this solution does require the insertion of some additional plumbing within the data flow, we get extremely valuable information logged when errors do occur. So especially when the data flow is running unattended in Production – when we don’t have the tools & techniques available at design time to figure out what’s going wrong – the logging that results gives us much more precise information about what went wrong and why, compared to simply giving us the failed data and leaving us to figure out why it was rejected.

And as Todd points out, knowing the offending column name at runtime gives us the possibility of better proactive error handling within the data flow at runtime.