Wednesday, 3 February 2016

Working with SSIS Data Types


In order to be able to take data from a variety of sources, manipulate it, and then export it to one of several destinations, SSIS has to use its own data types. Generally, the conversions of data types is handled implicitly, but if you hit problems, then you may need to intervene in one of several ways to ensure an appropriate conversion.
Most SQL Server Integration Services (SSIS) packages are created to extract data from one or more sources, transform that data, and load it into one or more destinations. Throughout this process, SSIS uses its own set of data types to move, manage, and manipulate the data before loading it into its final destination. These data types are specific to SSIS and are not the same as those supported by the SQL Server database engine or by other database systems.
Although SSIS data types are different from the SQL Server ones, many of the SSIS types correspond directly to the Transact-SQL types. However, it’s not a one-to-one mapping. SSIS also includes data types that support other database systems such as Jet, DB2, and Oracle. Plus SSIS types support data from Excel spreadsheets, comma-separated values (CSV) files, text files, directory services, and other sources.
Because SSIS types are independent from the other systems, each SSIS type can map to a variety of types in those systems. For example, SSIS does not include geospatial data types such as those you find in SQL Server. Instead, SSIS uses an image type that is specifically mapped to the geospatial types. Yet the image type is not limited to those geospatial types. It is also mapped to other data types in the various systems.
The data types supported by SSIS can be divided into the following categories:
  • Numeric: Types that support numeric values formatted as currencies, decimals, and signed and unsigned integers. SSIS supports more numeric types than any other kind.
  • String: Types that support ANSI and Unicode character strings.
  • Date/Time: Types that support date values, time values, or both in various formats.
  • Binary: Types that support binary and image values.
  • Boolean: A type to handle Boolean values.
  • Identifier: A type to handle globally unique identifiers (GUIDs).
A complete list of the SSIS types and an explanation of each one are beyond the scope of this article, but you can find details about all the SSIS types in the topic “Integration Services Data Types” in SQL Server Books Online. In the meantime, let’s look at some examples of data types being used in an SSIS package to better understand how they work. The examples retrieve data from the Production.Product table in the AdventureWorks2012 sample database (on a local instance of SQL Server 2012) and insert data into the dbo.AWProducts table, which I created in the same database. If you want to try out these examples yourself, you’ll first need to run the following Transact-SQL code to create the AWProducts table:
USE AdventureWorks2012;GO
IF OBJECT_ID('dbo.AWProducts'IS NOT NULLDROP TABLE dbo.AWProducts;GO
CREATE TABLE dbo.AWProducts(
  
NewProdID NVARCHAR(40) NOT NULL PRIMARY KEY,
  
OldProdID INT NOT NULL,
  
ProdName NVARCHAR(50) NOT NULL,
  
MakeFlag BIT NOT NULL,
  
ReorderPoint SMALLINT NOT NULL,
  
ProdPrice MONEY NOT NULL,
  
Weight DECIMAL(8,2) NULL,
  
ProductLine NCHAR(2) NULL,
  
SellStartDate DATETIME NOT NULL,
  
EndDate DATE NULL,
  
OldProdGUID UNIQUEIDENTIFIER NOT NULL
);
The package I use to demonstrate the SSIS types is very basic and includes only the following components:
  • An OLE DB connection manager to retrieve data from and insert data into the AdventureWorks2012database.
  • Data Flow task that contains the components necessary to extract, transform, and load the product data.
  • An OLE DB source that retrieves data from the Production.Product table in the AdventureWorks2012database. The source uses the OLE DB connection manager to connect to the database.
  • Data Conversion transformation that converts two columns in the data flow.
  • Derived Column transformation that creates a column based on concatenated data from columns in the data flow.
  • An OLE DB destination that inserts data into the dbo.AWProducts table. The destination uses the OLE DBconnection manager to connect to the database.
As we progress through the article, we’ll look at the package’s components in more detail in order to examine how the SSIS data types are being used. Note, however, that the components, for the most part, use the default configuration settings, so it is fairly straightforward to set them up. Figure 1 shows what the data flow should look like once you’ve added and configured all your components.
Setting up your SSIS data flow
Figure 1: Setting up your SSIS data flow
As you can see, there’s a single connection manager, which I’ve named AdventureWorks2012, and four data flow components, all included in a single Data Flow task. Now let’s look at these components in more detail so we can better understand how SSIS data types are used in each of them.

Implicit Conversions from Source Data

When you retrieve data from a data source, the data enters the package’s data flow, at which time it is implicitly converted into SSIS types. Those conversions are defined in a set of XML data type mapping files that are located on the file system where SQL Server and SSIS are installed. In SQL Server 2012, the files are located by default in the C:\Program Files\Microsoft SQL Server\110\DTS\MappingFiles folder. It is well worth reviewing these files to get a sense of how data types are being mapped in order to facilitate data flowing in and out of an SSIS package.
Each mapping file includes a set of XML elements that define the specific mappings between data source types and SSIS types. For example, the XML file that maps SQL Server data types to SSIS data types is MSSQLToSSIS10.XML. The file contains a set of mappings that each map a SQL Server type to an SSIS type. One of these mappings is for the SQL Server XML data type, which is mapped to the SSIS DT_WSTR data type, as shown in the following XML fragment:
<!-- xml -->
<dtm:DataTypeMapping >
  <dtm:SourceDataType>
    <dtm:DataTypeName>xml</dtm:DataTypeName>
  </dtm:SourceDataType>
  <dtm:DestinationDataType>
    <dtm:CharacterStringType>
      <dtm:DataTypeName>DT_WSTR</dtm:DataTypeName>
      <dtm:UseSourceLength/>
    </dtm:CharacterStringType>
  </dtm:DestinationDataType>
</dtm:DataTypeMapping>
When an SSIS package retrieves data from a SQL Server database, it references the MSSQLToSSIS10.XMLmapping file. If the source data includes an XML column, it will use the conversion defined in this XML fragment to convert the XML column to the DT_WSTR type in SSIS, which is a string type that supports Unicode values.
Now let’s look at the OLE DB source shown in Figure 1. The component uses the AdventureWorks2012 connection manager to connect to the AdventureWorks2012 database. In addition, it uses the following SELECT statement to retrieve data from the Production.Product table:
SELECT
  
ProductID,
  
Name,
  
MakeFlag,
  
ReorderPoint,
  
ListPrice,
  
Weight,
  
ProductLine,
  
SellStartDate,
  
CONVERT(DATESellEndDate22AS EndDate,
  
rowguidFROM
  
Production.Product;
After you’ve set up your OLE DB source, you can verify how the SQL Server types will be converted to SSIS types by using the advanced editor associated with that component. You launch the editor by right-clicking the component and then clicking Show Advanced Editor. This opens the Advanced Editor for OLE DB Source dialog box. Go to the Input and Output Properties tab and, in the Inputs and Outputs box, navigate to the Output Columns node, as shown in Figure 2.
Properties for the ProductID column
Figure 2: Properties for the ProductID column
Expand the Output Columns node and select the ProductID column. The properties for that column appear in the grid to the right. These properties are specific to the SSIS data flow. As you can see in Figure 2, one of those properties is DataType, and its value is four-byte signed integer [DT_I4]. The name of the type is DT_I4, which is how it’s usually referenced in SSIS. However, in some cases, as in here, a description is also provided, which makes it handy to understand the exact nature of the type. Keep in mind, however, that the data is being retrieved from the ProductID column in the Production.Product table in SQL Server. That column is configured with the INT data type. That means SSIS is converting the type from the SQL Server INT type to the SSIS DT_I4type.
Now look at the properties in the Name column (shown in Figure 3). Notice that the DataType property now has a value of Unicode string [DT_WSTR]. In this case, the source data comes from the SQL Server Name column, which is configured with the NVARCHAR data type. Again, SSIS has implicitly converted the source data to the SSIS type.
Properties for the Name column
Figure 3: Properties for the Name column
In fact, each column in the data flow has been implicitly converted to an SSIS type. Table 1 shows the data type of the source data and the SSIS type in which it has been converted.
Column
SQL Server data type
SSIS data type
ProductID
INT
DT_I4 (four-byte signed integer)
Name
NVARCHAR(50)
DT_WSTR (Unicode string)
MakeFlag
BIT
DT_BOOL (Boolean)
ReorderPoint
SMALLINT
DT_I2 (two-byte signed integer)
ListPrice
MONEY
DT_CY (currency)
Weight
DECIMAL(8,2)
DT_NUMERIC (numeric)
ProductLine
NCHAR(2)
DT_WSTR (Unicode string)
SellStartDate
DATETIME
DT_DBTIMESTAMP (database timestamp)
EndDate
DATE
DT_DBDATE (database date)
rowguid
UNIQUEIDENTIFIER
DT_GUID (unique identifier)
Table 1: Implicit conversions from SQL Server to SSIS
Notice that each SQL Server type has been converted to an SSIS type and that those types correspond with each other in a way that permits the data values to pass seamlessly between the database and SSIS. If you were to refer to the MSSQLToSSIS10.XML mapping file, you would see that all the mappings have been defined there. Yet even as efficient as this system is, there might be some cases in which you’ll want to configure a column with a different SSIS data type, in which case you can perform an explicit conversion.

Explicit Conversions of SSIS Types

Two common ways to convert data in your data flow is to use the Data Conversion transformation to perform a simple conversion or to use the Derived Column transformation to create a column that is based on converted data. Let’s first look at the Data Conversion transformation.
In our example package, we’ve added a Data Conversion transformation to convert two columns: rowguid andListPrice. If you refer back to Table 1, you’ll see that the rowguid column, as it is rendered in the SSIS data flow, is configured with the DT_GUID data type, which is the only SSIS unique identifier type. For this example, you’ll convert the column to an ANSI string type (DT_STR) and change the column name to OldProdGUID. As for theListPrice column, it’s configured with the currency data type (DT_CY). You’ll convert that column to an eight-byte signed integer (DT_I8) and rename it ProdPrice. Figure 4 shows how your conversions should appear in the Data Conversion Transformation Editor.
Converting columns to different SSIS data types
Figure 4: Converting columns to different SSIS data types
Notice that when you convert the rowguid column to a string data type, you specify a length. Just as in SQL Server, you must specify a length for any of the character data types. That means you must ensure that the length is long enough to accommodate all the values in the table. Otherwise, SSIS will try to truncate the data. (By default, a truncation causes the transformation to generate an error, although you can override this behavior by redirecting your error output.)
As noted above, another way you can convert data is to you use the Derived Column transformation. In this case, you’re converting data as part of the expression you use to define the column. For example, suppose you want to concatenate the ProductID and Name columns. However, the ProductID column is configured with the DT_I4data type, and the Name column is configured with the DT_WSTR data type. To concatenate values from these two columns, you need to first convert the ProductID column to a string, as shown in the following expression:
(DT_WSTR,4)ProductID + SUBSTRING(Name,1,4)
To convert a column in this way, you first specify the new data type and length, in parentheses, and then the column name. You can then concatenate this value with the Name column. In this case, you’re using the SUBSTRING function to specify that only the first four letters of the name be used.
The column expression is one of the values you define when we use the Derived Column transformation to create a column. Figure 5 shows how to configure the other values.
Converting a column when creating a derived column
Figure 5: Converting a column when creating a derived column
As the figure shows, a new column named NewProdID is being added to the data flow. The column is configured with the Unicode string data type (DT_WSTR) and its length is set to 8.

Implicit Conversions to Destination Data

After you’ve transformed the data in the data flow, you’re ready to load it into your target destination. However, you might want to first verify the columns in your data flow. To do so, you can use the Data Flow Path Editor for the data flow path connecting the Derived Column transformation to the OLE DB destination. When you open the editor, go to the Metadata page, as shown in Figure 6.
Available columns in the SSIS data flow
Figure 6: Available columns in the SSIS data flow
As you can see, all columns and their data types are listed in the editor. That includes the two columns that have been converted, both before and after that conversion. Also included in the information is the value’s length, where applicable. If precision and scale are factors in any of your columns, those too are included.
Once you’re satisfied that the data looks correct, you can configure the destination. If you’re inserting data into a SQL Server database, there should be little problem, unless the data itself does not conform to the target type. For instance, an incorrectly formatted GUID might cause the destination component to generate an error when loading the data into a target column configured with the UNIQUEIDENTIFIER type.
Table 2 shows how the data types in the SSIS data flow correspond to the data types in the AWProducts table. As you can see, not all the SSIS types correlate directly to those of the target columns. (The ListPrice and rowguidcolumns are not inserted into the destination, only the converted columns: ProdPrice and OldProdGUID.)
SSIS Column
SSIS data type
Database column
SQL Server data type
ProductID
DT_I4 (four-byte signed integer)
OldProdID
INT
Name
DT_WSTR (Unicode string)
ProdName
NVARCHAR(50)
MakeFlag
DT_BOOL (Boolean)
MakeFlag
BIT
ReorderPoint
DT_I2 (two-byte signed integer)
ReorderPoint
SMALLINT
ListPrice
DT_CY (currency)


Weight
DT_NUMERIC (numeric)
Weight
DECIMAL(8,2)
ProductLine
DT_WSTR (Unicode string)
ProductLine
NCHAR(2)
SellStartDate
DT_DBTIMESTAMP (database timestamp)
SellStartDate
DATETIME
EndDate
DT_DBDATE (database date)
EndDate
DATE
rowguid
DT_GUID (unique identifier)


OldProdGUID
DT_STR (ANSI character string)
OldProdGUID
UNIQUEIDENTIFIER
ProdPrice
DT_I8 (Eight-byte signed integer)
ProdPrice
MONEY
NewProdID
DT_WSTR (Unicode string)
NewProdID
NVARCHAR(40)
Table 2: Implicit conversions from SSIS to SQL Server
For example, the OldProdGUID column in the data flow is configured with the DT_STR data type, but the table column is configured with the UNIQUEIDENTIFIER type. At the same time, the applicable XML mapping file in this case, SSIS10ToMSSQL.XML, does not include a mapping for the DT_STR SSIS type to the UNIQUEIDENTIFIER SQL Server type. In such cases, SQL Server will usually accept the values and convert them to the UNIQUEIDENTIFIERtype, as long as those values conform to requirements of the target type. If your destination won’t accept the values because of type incompatibility, then you might need to convert or transform your data flow in some other way to conform to the targeted data source.

Working with SSIS Data Types

Whenever you work with data in an SSIS package, you’re working with SSIS data types. It doesn’t matter where the data comes from or where it’s going. When you retrieve or load data, SSIS tries to automatically convert it to the correct types. If SSIS can’t implicitly convert the data—and transforming the data within the package doesn’t work—you might need to modify the XML mapping files, stage the data so it’s compatible with both SSIS and the data source, create a custom component that can retrieve or load the data, or implement a solution outside of SSIS to prepare the data. For the most part, however, you should find that the SSIS types provide you with the versatility you need to retrieve data from a variety of sources and load it into a variety of destinations.

No comments:

Post a Comment