Slowly Changing Dimensions (SCD) with MERGE Statement

I was reading the “The Kimball Group Reader Relentlessly Practical Tools for Data Warehousing and Business Intelligence” book, witch I highly recommend, and found a very interesting technique for dealing with Slowly Changing Dimensions of type 2 using the Sql Server 2008 Merge statement:

INSERT INTO Customer_Master
SELECT Source_Cust_ID, First_Name, Last_Name, Eff_Date, End_Date, Current_Flag
FROM (
MERGE Customer_Master CM
USING Customer_Source CS
ON (CM.Source_Cust_ID = CS.Source_Cust_ID)
WHEN NOT MATCHED THEN
INSERT VALUES (CS.Source_Cust_ID, CS.First_Name, CS.Last_Name, convert(char(10), getdate()-1, 101), ‘12/31/2199’, ‘y’)
WHEN MATCHED AND CM.Current_Flag = ‘y’
AND (CM.Last_Name <> CS.Last_Name ) THEN
UPDATE SET CM.Current_Flag = ‘n’, CM.End_date = convert(char(10), getdate()-2, 101)
OUTPUT $Action Action_Out, CS.Source_Cust_ID,
CS.First_Name, CS.Last_Name,convert(char(10),getdate()-1, 101) Eff_Date,‘12/31/2199’ End_Date, ‘y’Current_Flag) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = ‘UPDATE’;

Explanation:

Lines 1–3 set up a typical  INSERT statement. What we will end up inserting are the new values of the type 2 rows that have changed.

Line 4 is the beginning of the MERGE statement, which ends at line 13. The MERGE statement has an OUTPUT clause that will stream the results of the MERGE out to the calling function. This syntax defines a common table expression, essentially a temporary table in the FROM clause, called MERGE_OUT.

Lines 4–6 instruct the MERGE to load customer source data into the customer master dimension table.

Line 7 says when there is no match on the business key, we must have a new customer, so line 8 does the INSERT. You could parameterize the effective date instead of assuming yesterday’s date.

Line 10 identify a subset of the rows with matching business keys, specifically, where it’s the current row in the customer master and any one of the type 2 columns is different.

Line 12 expires the old current row in the customer master by setting the end date and current row flag to ‘n’.

Line 14 is the OUTPUT clause, which identifies what attributes will be output from the MERGE, if any. This is what will feed into the outer INSERT statement. The $Action is a MERGE function that tells us what part of the merge each row came from. Note that the output can draw from both the source and the master. In this case, we are outputting source attributes because they contain the new type 2 values.

Line 17 limits the output row set to only the rows that were updated in customer master. These correspond to the expired rows in Line 11, but we output the current values from customer source in Line 12.

Advertisements