Dr. Alam, CS 538 ETL Assignment
Title: Extraction, Transformation, Loading (ETL) with SQL Server Integration Services (SSIS)
CS 538 Business Intelligence and Data Mining
Fall 2023
Instructions:
1. Create a new SSIS project.
2. Create a new database called ETL_Data in SQL Server.
3. Create an SSIS package for each of the following tasks:
· PersonBio – This package will export data from the AdventureWorks source tables and columns listed below to the ETL_Data database:
Source Tables: Person.Person, Person.EmailAddress, Person.StateProvince, Person.PersonPhone, Person.BusinessEntityAddress, and Person.Address.
Source fields: FirstName, LastName, AddressLine1, AddressLine2, City, StateProvince.Name, EmailAddress, and PhoneNumber.
The new table will contain every employee’s information from the tables above, regardless of whether they have a phone number or email address.
Name the new table PersonBio in your ETL_Data database.
Name the Source Assistant PersonBioSource and the Destination Assistant PersonBioDestination.
Name the package PersonBio.
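The requirement to keep every person whether or not a phone number or email address exists usually points toward outer joins rather than inner joins. The sketch below illustrates that difference in Python with an in-memory SQLite database. The two simplified tables and the sample rows are hypothetical stand-ins; the actual package would run a T-SQL query against all six AdventureWorks tables.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for Person.Person and Person.PersonPhone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE PersonPhone (BusinessEntityID INTEGER, PhoneNumber TEXT);
    INSERT INTO Person VALUES (1, 'Ana', 'Lopez'), (2, 'Ben', 'Kim');
    INSERT INTO PersonPhone VALUES (1, '555-0100');
""")

# INNER JOIN drops the person who has no phone number.
inner_rows = conn.execute("""
    SELECT p.FirstName, ph.PhoneNumber
    FROM Person AS p
    INNER JOIN PersonPhone AS ph ON ph.BusinessEntityID = p.BusinessEntityID
    ORDER BY p.BusinessEntityID
""").fetchall()

# LEFT JOIN keeps every person and fills the missing phone number with NULL (None).
left_rows = conn.execute("""
    SELECT p.FirstName, ph.PhoneNumber
    FROM Person AS p
    LEFT JOIN PersonPhone AS ph ON ph.BusinessEntityID = p.BusinessEntityID
    ORDER BY p.BusinessEntityID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Comparing the two result sets on a small sample like this is a quick way to check that no person is lost before the query is placed in the source component.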
· SplitByStateName – This package will split the data from the PersonBio table into different tables within the ETL_Data database based on the first letter of the StateProvince name.
You will create 5 new destination tables (StatesWith A, B, C, Null, and Others).
You may have to place the condition for the states with NULL value first, before specifying other conditions.
This is a sample of the output table that will be generated by the ETL process.
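The routing rule for this package can be sketched in Python to show why the order of the conditions matters. This is only an illustration of the logic, not SSIS expression syntax. The destination names used here (StatesWithA, StatesWithB, StatesWithC, StatesWithNull, StatesWithOthers) are an assumed reading of the five table names, and the function name is hypothetical.

```python
from typing import Optional


def route_by_state_name(state_name: Optional[str]) -> str:
    """Return the (assumed) destination table name for one PersonBio row."""
    # Check for a missing value first: taking the first letter of a
    # missing value is not meaningful, so this test must run before the others.
    if state_name is None:
        return "StatesWithNull"
    first_letter = state_name[:1].upper()
    if first_letter in ("A", "B", "C"):
        return "StatesWith" + first_letter
    # Everything else, including an empty string, falls through to Others.
    return "StatesWithOthers"


for name in ["Alabama", "British Columbia", "California", "Texas", None]:
    print(name, "->", route_by_state_name(name))
```

In the package itself, the same ordering would be expressed as the sequence of conditions in the split transformation, with the remaining rows sent to the default output.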
· ProductSalesInfo – This package will calculate the sales amount and sales quarter for each product.
The data source for this package will be a query from the following tables: Production.Product, Production.ProductSubcategory, Production.ProductCategory, Sales.SalesOrderHeader, and Sales.SalesOrderDetail.
The query should show the following fields: Production.Product.Name, Production.ProductCategory.Name AS [CategoryName], Production.Product.ListPrice, Sales.SalesOrderHeader.OrderDate (only the orders after 2004), Sales.SalesOrderDetail.OrderQty.
Create two Derived Columns in the destination table. Name the first derived column SalesAmount. You can calculate the sales amount by multiplying ListPrice and OrderQty.
Name the second derived column SalesQtr. The data for this column should be extracted from the OrderDate field using a month function. You will need to build an “IF” statement around the month function that will check and assign the quarter value. The conditions for the IF statements can be: if the month of the date is > 9, then SalesQtr is 4th Qtr; if the month is > 6, then SalesQtr is 3rd Qtr; if the month is > 3, then SalesQtr is 2nd Qtr; for all other months, SalesQtr is 1st Qtr.
Name the output table ProductSalesInfo.
This is a sample of the output table that will be generated by the ETL process.
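The two derived columns can be sketched in Python to make the cascading conditions concrete. This shows the logic only, not the SSIS expression itself. The function names and the exact label strings ("1st Qtr" and so on) are assumptions based on the wording of the conditions.

```python
from datetime import date
from decimal import Decimal


def sales_qtr(order_date: date) -> str:
    """Map an order date to a quarter label using the cascading month checks."""
    month = order_date.month
    # The checks run from the highest threshold down, so the first match wins.
    if month > 9:
        return "4th Qtr"
    if month > 6:
        return "3rd Qtr"
    if month > 3:
        return "2nd Qtr"
    return "1st Qtr"


def sales_amount(list_price: Decimal, order_qty: int) -> Decimal:
    """SalesAmount is ListPrice multiplied by OrderQty."""
    return list_price * order_qty


print(sales_qtr(date(2004, 7, 15)))
print(sales_amount(Decimal("10.50"), 3))
```

Because each check is only reached when the earlier ones failed, a month of 7 passes the "> 6" test without ever being compared against "> 3". A nested conditional expression in the Derived Column transformation would follow the same order.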
· SalesAggregate – This package will aggregate the data from the ProductSalesInfo table to show the total quantity and total sales amount for each product.
The data source for this package will be the ProductSalesInfo table.
Select these fields for the source query: Production.Product.Name, Production.ProductCategory.Name AS CategoryName, Sales.SalesOrderDetail.OrderQty, Sales.SalesOrderDetail.UnitPrice.
Create a derived column called SalesAmount by multiplying Price by Qty. After adding the derived column task, add an Aggregate task in the package. Aggregate the fields so that the total quantity and total sales amount are shown for each product name.
Name the output table SalesAggregate.
Using a Multicast task, export the data to a flat file and to a SQL Server table. Name the flat file SalesAggregate.txt and the SQL Server table SalesAggregate.
This is a sample of the output table that will be generated by the ETL process.
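The grouping performed by the Aggregate step can be sketched in Python as a group-by on product name with two sums. The sample rows, the function name, and the output column names TotalQty and TotalSalesAmount are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample rows shaped like the input to the Aggregate step.
rows = [
    {"Name": "Road Bike", "OrderQty": 2, "SalesAmount": Decimal("200.00")},
    {"Name": "Helmet", "OrderQty": 1, "SalesAmount": Decimal("35.00")},
    {"Name": "Road Bike", "OrderQty": 3, "SalesAmount": Decimal("300.00")},
]


def aggregate_by_product(input_rows):
    """Group rows by product name and sum quantity and sales amount."""
    totals = defaultdict(lambda: {"TotalQty": 0, "TotalSalesAmount": Decimal("0")})
    for row in input_rows:
        group = totals[row["Name"]]
        group["TotalQty"] += row["OrderQty"]
        group["TotalSalesAmount"] += row["SalesAmount"]
    return dict(totals)


result = aggregate_by_product(rows)
for name, sums in result.items():
    print(name, sums["TotalQty"], sums["TotalSalesAmount"])
```

Each product name appears exactly once in the result, which is the property to check in both the flat file and the SQL Server table after the Multicast step.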
4. Save all of the packages in the same project.
5. Truncate the destination tables in each package before running the package.
6. Name the data flow tasks, data source tasks, destination source tasks, and any other task or transformation module meaningfully.
7. Zip the project folder and name the zipped folder with your last and first name.
8. Upload the zipped folder to Canvas.
Grading:
Your assignment will be graded on the following criteria:
· Correctness of the data in the destination tables.
· Use of meaningful names for data flow tasks, data source tasks, destination source tasks, and any other task or transformation module.
· Use of TRUNCATE SQL commands in every package to empty each destination table and prevent duplicate rows before running any task in your package.
· Creation of the ETL_Data database in SQL Server.