Tuesday, 6 March 2018

How to Setup, Deploy, Execute & Schedule Packages Azure SSIS


Welcome back to work in 2018! πŸ™‚
Let’s get stuck in with a hot topic. How do we actually use our beloved SQL Server Integration Services (SSIS) packages in Azure with all this new platform as a service (PaaS) stuff? Well, in this post I’m going to go through it end to end.

Post Contents

First, some caveats:
  1. Several of the Azure components required for this are still in public preview and can be considered as ‘not finished’. Meaning this is going to seem a little painful.
  2. The ADFv2 developer UI is still in private preview. But I’ve cheated and used it to generate the JSON to help you guys. Hopefully it’ll be available publicly soon.
  3. I’ve casually used my Microsoft sponsored Azure subscription and not had to worry about the cost of these services. I advise you check with the bill payer.
  4. Everything below has been done in a deliberate order. Especially the service setup.
  5. Everything below has been deployed in the same Azure region to avoid any cross data centre authentication unpleasantness. I suggest doing the same. I used EastUS for this post.
Ok, moving on…

Azure Services Setup

Now, let’s set some expectations. To get our SSIS packages in Azure we new a collection of services. When working on premises this gets neatly wrapped up with a pretty bow into something called SQL Server. Sadly in Azure there is no wrapping, no pretty bow and nothing that neat. Yet!

Azure Data Factory Version 2 (ADFv2)

First up, my friend Azure Data Factory. As you’ll probably already know, now in version 2 it has the ability to create recursive schedules and house the thing we need to execute our SSIS packages called the Integration Runtime (IR). Without ADF we don’t get the IR and can’t execute the SSIS packages. My hope would be that the IR would be a stand alone service, but for now its contained within ADF.
To deploy the service we can simply use the Azure portal blades. Whatever location you choose here make sure you use the same location for everything that follows. Just for ease. Also, it might be worth looking ahead to ensure everything you want is actually available in your preferred Azure region.
Lets park that service and move on.

Azure SQL Server Instance

Next, we need a logical SQL Server instance to house the SSIS database. Typically you deploy one of these when you create a normal Azure SQLDB (without realising), but they can be created on there own without any databases attached. To be clear, this is not an Azure SQL Server Managed Instance. It does not have a SQL Agent and is just the endpoint we connect to and authenticate against with some SQL credentials.
Again to deploy the service we can simply use the Azure portal blades. On this one make sure the box is checked to ‘Allow azure services to access server’ highlighted with the orange arrow below and of course make a note of the user name and password. If you don’t check the box ADF will not be able to create the SSISDB in the logical instance later on.
Once the SQL instance service is deployed. Go into the service blades and update the firewall rules to allow access from your current external IP address. This isn’t anything specifically required for SSIS, you need to do it for any SQLDB connections. Which is something that I always forget, so I’m telling you to help me remember! Thanks.

Azure SSIS IR

Next on the list, we need the shiny new thing, the SSIS IR, it needs creating and then starting up. In my opinion this is a copy of the SQL Server MsDtsSrvr.exe taken from the on premises product and used in the cloud on a VM that we don’t get access to… Under the covers it probably is, but I’m guessing.
Sadly for this we don’t have any nice Azure portal user interface for this yet. It’s going to need some PowerShell. Make sure you have your Azure modules up to date and run the following with the top set of variables assigned as required.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# Azure Data Factory version 2 information:
$SubscriptionId = ""
$ResourceGroupName = ""
$DataFactoryName = "" 
$DataFactoryLocation = ""
 
# Azure-SSIS integration runtime information:
$AzureSSISName = ""
$AzureSSISDescription = ""
 
$AzureSSISNodeSize = "Standard_A4_v2"
$AzureSSISNodeNumber = 2 
$AzureSSISMaxParallelExecutionsPerNode = 2 
$SSISDBPricingTier = "S1" 
 
# Azure Logical SQL instance information:
$SSISDBServerEndpoint = ".database.windows.net"
$SSISDBServerAdminUserName = ""
$SSISDBServerAdminPassword = ""
 
 
<# LEAVE EVERYTHING ELSE BELOW UNCHANGED #>
 
$SSISDBConnectionString = "Data Source=" + $SSISDBServerEndpoint + ";User ID="+ $SSISDBServerAdminUserName +";Password="+ $SSISDBServerAdminPassword
$sqlConnection = New-Object System.Data.SqlClient.SqlConnection $SSISDBConnectionString;
Try
{
    $sqlConnection.Open();
}
Catch [System.Data.SqlClient.SqlException]
{
    Write-Warning "Cannot connect to your Azure SQL DB logical server/Azure SQL MI server, exception: $_"  ;
    Write-Warning "Please make sure the server you specified has already been created. Do you want to proceed? [Y/N]"
    $yn = Read-Host
    if(!($yn -ieq "Y"))
    {
        Return;
    } 
}
 
Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionId $SubscriptionId
 
Set-AzureRmDataFactoryV2 -ResourceGroupName $ResourceGroupName `
                        -Location $DataFactoryLocation `
                        -Name $DataFactoryName
 
$secpasswd = ConvertTo-SecureString $SSISDBServerAdminPassword -AsPlainText -Force
$serverCreds = New-Object System.Management.Automation.PSCredential($SSISDBServerAdminUserName, $secpasswd)
Set-AzureRmDataFactoryV2IntegrationRuntime  -ResourceGroupName $ResourceGroupName `
                                            -DataFactoryName $DataFactoryName `
                                            -Name $AzureSSISName `
                                            -Type Managed `
                                            -CatalogServerEndpoint $SSISDBServerEndpoint `
                                            -CatalogAdminCredential $serverCreds `
                                            -CatalogPricingTier $SSISDBPricingTier `
                                            -Description $AzureSSISDescription `
                                            -Location $DataFactoryLocation `
                                            -NodeSize $AzureSSISNodeSize `
                                            -NodeCount $AzureSSISNodeNumber `
                                            -MaxParallelExecutionsPerNode $AzureSSISMaxParallelExecutionsPerNode
 
write-host("##### Starting your Azure-SSIS integration runtime. This takes 20 to 30 minutes to complete. #####")
Start-AzureRmDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
                                             -DataFactoryName $DataFactoryName `
                                             -Name $AzureSSISName `
                                             -Force
 
write-host("##### Completed #####")
write-host("If any cmdlet is unsuccessful, please consider using -Debug option for diagnostics.")
I confess I’ve stolen this from Microsoft in there documentation here and tweaked it slightly to use the more precise subscription ID parameter as well as a couple of other things that I felt made life easier. While this is running you should get a process bar from the PowerShell ISE for the SSIS IR service starting, which really does take around 30mins. Be patient.
If you’d prefer to do this through the ADF PowerShell deployment cmdlets here is the JSON to use. Again assign values to the attributes as required. The JSON will create the SSIS IR, but it won’t start it.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
{
"name": "",
"properties": {
 "type": "Managed",
 "description": "",
 "typeProperties": {
  "computeProperties": {
   "location": "EastUS",
   "nodeSize": "Standard_A4_v2",
   "numberOfNodes": 2,
   "maxParallelExecutionsPerNode": 2
  },
  "ssisProperties": {
   "catalogInfo": {
    "catalogServerEndpoint": "Your Instance.database.windows.net",
    "catalogAdminUserName": "user",
    "catalogAdminPassword": {
     "type": "SecureString",
     "value": "password"
    },
    "catalogPricingTier": "S1"
}}}}}
For info. The new developer UI gives you a wizard to go through the steps and a nice screen to see that the IR now exists. Until you get public access to this you’ll just have to assume its there.

Anyway, moving on. Once the thing has deployed and started you’ll now have an SSIS IR and also in your logical SQL instance the SSISDB. Exciting!
Open SSMS, making sure your are using version 17.2 or later. In the connection dialogue box as well as the usual bits go to options and explicitly set which database your connecting to. If you don’t the Integration Services branch won’t appear in SSMS Object Explorer. You’ll see the database tables, views, stored procs, but won’t have any of the SSIS options to control packages.
If all goes well you should get a very familiar sight…

Creating & Deploying an SSIS Package

As this is a ‘how do’ guide I’ve done something very simple in my package. It basically copies and pastes a CSV file from one Azure Data Lake Storage (ADLs) folder to another. I’m going to assume we are all familiar with more complex SSIS packages. Plus, the point of this post was getting the services working, not to do any data transformations.

SSIS Azure Feature Pack

What is probably worth pointing out is that if you want to work with Azure services in SSIS SQL Server Data Tools you need to install the Azure Feature Pack. Download and install it from the below link:
https://docs.microsoft.com/en-us/sql/integration-services/azure-feature-pack-for-integration-services-ssis
Once installed in your SSIS Toolbox (Control Flow/Data Flow) and Connection Manager you’ll have Azure services available.
For info, the Azure Data Lake Storage connection manager now offers the option to use a service principal to authenticate.

Package Deployment

Now I’m not going to teach a granny to suck eggs (or whatever the phrase is). To deploy the package you don’t need to do anything special. I simply created the ISPAC file in SSDT and used the project deployment wizard in SSMS. The deployment wizard from the project didn’t work in my version of SSDT running in Visual Studio 2015. Not sure why at this point so I used SSMS.

Package Execution

Similarly I’m going to assume we all know how to execution an SSIS package from management studio. It’s basically the same menu on the right where the deployment wizard gets launched. Granny, eggs, etc.
Or, we can execute a couple of stored procedures using some good old fashioned T-SQL (remember that?). See below.
 
1
2
3
4
5
6
7
8
9
10
11
DECLARE @execution_id bigint;  
 
EXEC [SSISDB].[catalog].[create_execution] 
 @package_name=N'DataLakeCopy.dtsx', 
 @execution_id=@execution_id OUTPUT,
 @folder_name=N'Testing',
 @project_name=N'AzureSSIS',
 @use32bitruntime=False; 
 
EXEC [SSISDB].[catalog].[start_execution] 
 @execution_id;
I mention this because we’ll need it when we schedule the package in ADF later.

Scheduling with ADFv2

Ok, now the fun part. Scheduling the package. Currently we don’t have a SQL Agent on our logical instance and we don’t have Elastic DTU Jobs (coming soon). Meaning we need to use ADF.
Thankfully in ADFv2 this does not involve provisioning time slices! Can I get a hallelujah? πŸ™‚
This is the part where I cheated and used the new developer UI, but I’ll share all the JSON in case you don’t have a template for these bits in ADFv2 yet.

Linked Service to SQLDB

To allow ADF to access and authenticate against our logical SQL instance we need a linked service. We did of course already provide this information when creating the SSIS IR. But ADF needs them again to store and call for activity executions.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
{
    "name": "SSISDB",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "
Integrated Security=False;
Encrypt=True;
Connection Timeout=30;
Data Source=;
Initial Catalog=;
User ID="
            }
        }
    }
}

The Pipeline

Nothing extra here, a very very simple pipeline similar to what you’ve previously seem in ADFv1. Only without the time slice schedule values and other fluff.
1
2
3
4
5
6
{
    "name": "RunSSISPackage",
    "properties": {
        "activities": []
    }
}

Stored Procedure Activity

Next, the main bit of the instruction set, the activity. You’ll know from the T-SQL above that in the SSISDB you need to first create an instance of the execution for the SSIS package. Then pass the execution ID to the start execution stored procedure. ADF still can’t handle this directly with one activity giving its output to the second, meaning we have to wrap up the T-SQL we want into a parameter for the sp_executesql stored procedure. Everything can be solved with more abstraction, right? πŸ™‚
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
            {
                "name": "CreateExecution",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 20
                },
                "typeProperties": {
                    "storedProcedureName": "sp_executesql",
                    "storedProcedureParameters": {
                        "stmt": {
                            "value": "
Declare @execution_id bigint;  
EXEC [SSISDB].[catalog].[create_execution] 
@package_name=N'DataLakeCopy.dtsx', 
@execution_id=@execution_id OUTPUT,
@folder_name=N'Testing',
@project_name=N'AzureSSIS',
@use32bitruntime=False; 
 
EXEC [SSISDB].[catalog].[start_execution] 
@execution_id;"
                        }
                    }
                },
                "linkedServiceName": {
                    "referenceName": "SSISDB",
                    "type": "LinkedServiceReference"
                }
            }

Scheduled Trigger

Last but not least our scheduled trigger. Very similar to what we get in the SQL Agent, but now called ADF! For this post I went for 1:30pm daily as a test.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
    "name": "Daily",
    "properties": {
        "runtimeState": "Stopped", //change to Started
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "RunSSISPackage",
                    "type": "PipelineReference"
                },
                "parameters": {}
            }
        ],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2018-01-05T13:23:16.395Z",
                "timeZone": "UTC",
                "schedule": {
                    "minutes": [
                        30
                    ],
                    "hours": [
                        13
                    ]
                }
            }
        }
    }
}
The new UI gives you a nice agent style screen to create more complex schedules, even allowing triggers every minute if you wish. Here’s a teaser screen shot:
I hope this gave you the an end to end look at how to get your SSIS packages running in Azure and saved you looking through 10 different bits of Microsoft documentation.
Many thanks for reading

No comments:

Post a Comment