Learn how to leverage SSIS on premises to interact with cloud data in ADF v2 in this tutorial by Christian Cote, Michelle Gutzait, and Giuseppe Ciaburro, the authors of Hands-On Data Warehousing with Azure Data Factory.
SQL Server Integration Services (SSIS) has been the Microsoft ETL predilection tool for more than a decade. A lot of enterprises have used SSIS to load their on-premise data warehouses since its inception in SQL Server 2005.
In the last couple of years, IT departments have had to deal with different kinds of data and specific toolsets to process them. SSIS has successfully been able to access cloud data from on-premise ETL servers since 2015 with the Azure Feature Pack (https://docs.microsoft.com/en-us/sql/integration-services/azure-feature-pack-for-integration-services-ssis?view=sql-server-2017). However, issues occur when most of the ETL is in the cloud and SSIS is in a small part of the chain. And, up until now, it was very complex to use ADF v1 as the orchestrator in the cloud, with some SSIS package calls in the pipeline.
The following sections will describe how SSIS on-premises can be successfully leveraged to interact with cloud data in ADF v2.
The first thing that you need to do is set up an SSIS solution. You can use Microsoft wide world importers. The samples can be found at https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0.
From the list, you can download the databases and the SSIS solution:
- Daily.ETL.ispac: This is the SSIS solution that takes data from WideWorldImporters to the WideWorldImportersDW database
- WideWorldImporters-Full.bacpac: This is the transactional database that is used as a source for the data warehouse
- WideWorldImportersDW-Full.bacpac: This is the sample data warehouse database that you’ll use in this article
The first step is to set up sample databases. For this, you’ll need to install SQL Server Management Studio (SSMS). This program is freely available from Microsoft at https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms.
Start SSMS to connect to your server and right-click on the Databases folder in Object Explorer. Select Import Data-tier Application… from the contextual menu as shown in the following screenshot:
The Import Data-tier Application wizard will open. Now click on Next.
In the Import Settings step, make sure that the Import from local disk option is selected and click on Browse…. From the Open window, navigate to the location from where you previously downloaded the .bacpac files and click on Open, as shown in the following screenshot:
Click on Next on the Import Settings screen.
You can now proceed to the Database Settings step. Make sure that the entry in the New database name textbox is WideWorldImporters and click on Next.
The Summary will be displayed as shown in the following screenshot. Click on Finish to start the BACPAC import process:
The database is created as well as all the objects in it. This process may take a few minutes. Click on Close to terminate the process:
Repeat the same process for the other BACPAC WideWorldImportersDW-Full.bacpac. Make sure that you specify WideWorldImportersDW as the database name.
The SSIS setup usually requires an existing installation of SQL Server, with the Integration Services feature selected. With ADF v2, this step is less mandatory since you can deploy the packages in an Azure database.
That being said, it’s always better to test your SSIS solution on premise before deploying it in Azure. The next section will describe how to set up a sample SSIS solution.
Integration Services Catalog setup
This step is necessary for brand new SQL Server installations where you may want to deploy SSIS packages. When logged in to the server with SSMS, simply right-click on the Integration Services Catalogs folder and select Create Catalog…
The Create Catalog window will open up. You’ll need to fill in the properties and click on OK to create the integration services catalog:
Next, you need to create a folder for your project. In Object Explorer, expand the Integration Services Catalogs node, right-click on SSISDB, and select Create Folder… from the contextual menu:
The Create Folder window will open up. Set the properties as shown in the following screenshot and click on OK:
You’re now ready to open the solution in Visual Studio. Here, you can deploy it to your newly created on-premises Integration Services Catalogs folder.
Sample solution in Visual Studio
For this section, you’ll need an existing installation of Visual Studio 2015 or 2017 and SQL Server Data Tools (SSDT) installed. Visual Studio is available at https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-2017.
Once SSDT is opened, click on FILE | New | Project…:
The New Project window will open up. Select Integration Services Import Project Wizard, type WWImportersDaily in the textbox beside Name, and click on OK:
The Integration Services Import Project Wizard window will open; click on Next to proceed to the next step:
In the Select Source window, select the Project deployment file radio button and click on Browse…. This will open up the Open file window. Navigate to the folder where .ispac, the SSIS sample file, has been downloaded and click on Open:
Once back to the wizard, click on Next to proceed to the next step.
In the Review window, click on the Import button.
Once the project has been imported, you can see it in Solution Explorer:
You’ll have to modify the two connection managers if your server is not the default instance. To do so, double-click on WWI_DW_Destination_DW.conmgr and WWI_Source_DB.conmgr, and change their server properties.
If the connections are online, you can run the package and see if everything is working fine.
Deploying the project on premises
You’re now ready to deploy and test your project on your local server:
1. From the Solution Explorer, right-click on the WWImportersDaily project and select Deploy. Integration Services Deployment Wizard will open up as shown in the following screenshot. Click on Next to proceed to the next step:
2. Connect to your SQL Server where the integration services catalog was created before and click on Browse…. The Browse for Folder or Project window will open.
3. Expand SSISDB and select the DWDailyRefresh
4. Click on OK to close the window:
5. On the Select Destination window, click on Next.
6. The next step is to validate. However, the on-premise deployments don’t use it.
7. Click on Next to proceed to the Review window. Here, you can review whether you have selected the right .ispac file and confirm that you’re going to deploy it to the right folder in SSIS:
8. Now click on Deploy to start the deployment of the .ispac file.
9. The last step, Results, shows the progress of your deployment. If everything is successful (as it should be), click on the Close button:
You’re now done with the deployment.
10. The next step is to execute the package via the catalog. You can switch to SQL Server Management Studio for this.
11. In the Object Explorer, expand Integration Services Catalogs and navigate through the DailyETLMain.dtsx package.
12. Right-click on it and select Execute… from the contextual menu to open up the execution window:
13. Go to the Connection Managers tab, check the Connection String property for both connectors, and adjust them if necessary. If everything is okay, click on OK to start the package execution:
14. Click on Yes to see the report. Once the report has opened, you might notice that the package is still running. When you refresh it, though, you’ll see that the package has been successfully executed:
You have successfully leveraged SSIS on-premises. If you found this article to be an interesting read, you can explore Hands-On Data Warehousing with Azure Data Factory to learn how to build your own ETL solutions and address the key challenges faced while building them.
P.S. This post was provided by good people at Packt Publishing. Thank you, Ron!