Monday, 14 March 2016

Merging Data into a SharePoint list using SSIS

Applies To: SharePoint, SSIS, SQL

I was recently tasked with creating a one way sync between some data in SQL and a SharePoint list. This can be done through an External Content Type and External Content List and works pretty well. You can find instructions for setting this up all over Google. I found some and set it up pretty quickly. Unfortunately, it wasn’t quite what they or I wanted. The interface was kinda clunky and caused some complications in terms of using the list as a lookup (specifically in a site column).
So, I did some more searching and found the SSIS SharePoint List Source and Destination Adapters available on Codeplex. The install is super simple (Needs to be done on both your development machine and the SSIS Server) and following the readme included you can get up to speed pretty quick. Assuming you have Business Intelligence Studio installed, it’s a simple matter of right-clicking on the toolbox and selecting Choose Items. In the SSIS Data Flow Components tab scroll down and add the SharePoint Source and Destination components (again more details can be found in the readme).
So… How do you make these work? I’m certainly no expert, but I have gotten them to work and it’s solved my problem. I ran into some issues that you can hopefully avoid by following my small example.
My goal was to take a table from SQL (could also be a view or a simple SELECT statement)and to have that be written to a SharePoint list on a daily basis. However, I am using the SharePoint list for lookups. This means that I want to be able to add any new rows, update existing rows and delete missing rows without changing the SharePoint IDs. This can be done, but it’s not as straightforward as you might hope.

Configure SQL Select

I won’t be going into a lot of detail about some of the Integration Services Package configuration and will skip some of those details assuming you either have the experience, problem solving ability, or at least know how to get to Google. However, I will explain each component and why I added it though.
All of this takes place in a Data Flow task (switch to that tab and click create data flow task) and the first object to drag from your toolbox is an OLE DB Source. Double-click it to open the properties and go ahead and configure a new Connection Managerand Data Connection to get to your database. (I am using the OLE DB Source since I’m connecting to SQL for this example, but you could literally choose any of the available data sources – including the SharePoint List Source). Here is the structure of the table I will be using:
I not doing anything fancy like specifying a WHERE clause (Switch your Data access mode to SQL Command to do that) and I’m pulling back both of my columns (Switch to the column tab to specify only specific columns if you don’t want the default of ALL):

Configure SharePoint Source

Since I don’t just want to dump data to a SharePoint list, but want to update existing records and add new ones, I need to retrieve the current contents of the list from SharePoint. The list I am trying to merge with is pretty simple:
You can pull SharePoint items using a SharePoint List Source object. Just drag one from the toolbox. Highlight the new object (click on it) and scroll down in the properties window to the SiteListName property. You’ll need to type in the full address in the SiteUrl and the display name of the list in the SiteListName. The SiteListViewName is optional (The default view is used if this isn’t specified). Also, be sure to set UseConnectManager to False. Here are my settings:
Now when you double-click your new SharePoint List Source you’ll have a Column Mappings tab where you can select the important fields to pull back from SharePoint(If you don’t have this tab, hit the Refresh button and verify your settings from above). Since all we care about is comparing the key fields, switch to the Input and Output Properties tab and select Remove Column for every column in the Ouput Columns except the ID and the SomeID columns (We need the SharePoint ID and the SQL ID). Your Column Mappings will now look like this:
 

Data Conversion

In order to determine which list items already existed, which are new and which ones to delete we need a common ID. The SharePoint ID will be used pretty extensively, but it’s the ID column from the SQL list, SomeID, that we are most interested in. We are going to join the results from SQL and the results from SharePoint into one dataset and then split them back out to do the deletions, additions, and updates needed.
To join them we’ll be using the SQL Identity column, SomeTableID. However, in order to join them correctly we’ll need them to be of the same datatype. Standard INT columns from SQL have their datatype set to four-byte signed integer [DT_I4] (You can find out the datatype by right-clicking the OLE DB Source and choosing Advanced Editor and checking the properties of the Output Columns). We used a number type column in SharePoint, which translates to a double-precision float [DT_R8].
To make things easier, switch to the Input and Output Properties tab of the SharePoint List Source editor window and choose the SomeID column in the Output Columns. Change the name of the Output Column from SomeID to SomeIDAsFloat:
Now add a Data Conversion object from the toolbox and connect the green output from the SharePoint List Source to it. Double-click it and check the box next to the SomeIDAsFloat column and switch the Output Alias to SomeID. Then match the Data Type to the SQL SomeTableID datatype (in our case it’s a four-byte signed integer [DT_I4]):

Required Sorts

In order to use a Merge object, both sources need to have their datasets sorted on the same column. So drag 2 Sort objects and connect the Data Conversion’s green output to one and the OLE DB Source’s green output to another.
In the sort for the SharePoint dataset, uncheck the pass through box for SomeIDAsFloat and check the sort box for the SomeID column:
In the SQL Sort check the box next to the ID column (in this case SomeTableID):

Merge Join

Now we’re ready to join the two datasets together. Drag a Merge Join object from the toolbox and connect the two sort’s green outputs too it. For this merge, it doesn’t matter which sort is the left or right join since we’re going to do a full join outer join.
Double-click the Merge Join object and set the Join type to Full outer join. Check the box next to all the columns from the SQL dataset and the box next to the SharePoint ID in the SharePoint dataset. Ensure the Join Key is set properly (In our case it’s the SomeID from SharePoint joined to the SomeTableID from SQL):

Split

Now that we have one big dataset, we can figure out which rows need to be added, deleted or updated. We can do all of this with a single Conditional Split object. Grab one and drag it from the toolbox and connect the Merge Join’s green output to it.
We know that any row where the SharePoint ID (SomeID) is NULL is a new record that needs to be added to SharePoint. Similarly we know that any row where a value from SQL (SomeColumn) besides the SQL ID is NULL is a record that no longer exists in SQL and should therefore be deleted from SharePoint. If it doesn’t fall into either category above, it is a record that was already added to SharePoint and simply needs to be updated.
Using that logic we create 3 outputs. The first output (Just click in the Output Name column) should be named New Records and the Condition set to ISNULL([ID]). The second output should be named Deleted Records and it’s condition set to ISNULL([SomeColumn]). Finally switch the default output name to Existing Records:

Add To SharePoint

Finally we are ready for our first SharePoint List Destination object. Grab one from the toolbox and drag it over. Connect the green arrow from the Conditional Split object and choose New Records in the Output dropdown. Set the SiteListName,SiteUrl and UseConnectionManager properties to the same as the SharePoint List Source object. Double-click the new SharePoint List Destination object and switch to the Column Mappings tab (If you don’t have this hit the Refresh Button and verify your settings).
With SharePoint List Destination objects, the default action is Modification. This can be used to either add records or update records. This is all dependent on you setting the ID column in the Column Mappings tab. If the ID is mapped then it is automatically an update, if not then it’s an insert.
For new records we want insert, so select <ignore> for the Input Column mapped toID then map everything else accordingly. Especially important is mapping the SQL ID to the column storing that in SharePoint:

Update SharePoint

We’ll now do the exact same as above with the twist that we will be mapping the ID column. So add another SharePoint List Destination object and drag another green connection from the Conditional Split. This time pick the Existing Records in the Output dropdown. Set the SiteListNameSiteUrl and UseConnectionManagerproperties the same as before and double-click to open the advanced editor.
This time in the Column Mappings (again you might have to hit refresh if you’re missing this tab) map everything the same as our Record Addition output but keep ID mapped to ID:

Delete From SharePoint

To remove records from SharePoint, drag another SharePoint List Destinationobject and connect the Conditional Split’s green output arrow to it. You won’t have to select the output since Deleted Records is the only output left. Set theSiteListName, SiteUrl and UseConnectionManager properties the same as before. This time you will also need to change the Batch Type property from Modification to Deletion:
Double-click the Destination object and switch to the Column Mappings tab (again you might have to hit refresh if you’re missing this tab). The only thing you need to map here is the ID to ID:
That’s it. Attach this to a SQL job and schedule it at whatever frequency you need the thing to run and you’ve got a SharePoint list being synchronized with SQL through the power of SSIS. WOWEE!

No comments:

Post a Comment