Export Power BI Desktop data to SQL Server

Have you ever faced a scenario where you needed to load a collection of CSV/text files into SQL Server tables?

What solution did you choose?

  • T-SQL BULK INSERT?
  • An SSIS package (generated from SSMS Tasks -> Import Data, or built manually)?
  • PowerShell "Import-Csv"?

And what if the SQL Server destination tables must be typed (numeric, date, text columns, …), the CSV files have formatting issues (e.g. text columns without quotes, datetimes not in ISO format), and you need to transform the columns into the desired types?

A much quicker way to transform CSV files into the desired shape is a Power BI Desktop query (Power Query). For example, in seconds I can:

  • Load the CSV
  • Replace a value across all the columns (in this case, the text "NULL" with a real null)
  • Auto-detect the data types

PBIDesktopQuery

Loading these queries into a SQL Server database is very easy thanks to the DevScope PowerShell module "PowerBIETL" (also available in the PowerShell Gallery):


Install-Module PowerBIETL
Import-Module PowerBIETL

Export-PBIDesktopToSQL -pbiDesktopWindowName "*sample*" -sqlConnStr "Data Source=.\SQL2014; Initial Catalog=DestinationDB; Integrated Security=SSPI" -sqlSchema "stg" -verbose

The cmdlet “Export-PBIDesktopToSQL” will take care of:

  1. Connecting to the PBI Desktop instance and reading its tables
  2. Automatically creating the tables in the SQL database (if they do not exist)
    • Thanks to the DevScope "SQLHelper" PowerShell module and its "Invoke-SQLBulkCopy" cmdlet
  3. Bulk copying the data from PBI Desktop into the SQL tables

The cmdlet has 4 parameters:

  • -PBIDesktopWindowName (mandatory)
    • A wildcard to find the PowerBI Desktop window
  • -Tables (optional, defaults to all the tables)
    • Array of tables to import
  • -SQLConnStr (mandatory)
    • Connection to a SQL Server database
  • -SQLSchema (optional, defaults to “dbo”)
    • The schema under which the tables will be created
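
For example, to copy only a couple of the PBI Desktop tables into a staging schema, you can combine -Tables and -SQLSchema (the table names below are just for illustration):

# Copy only two of the PBI Desktop tables into the "stg" schema
Export-PBIDesktopToSQL -pbiDesktopWindowName "*sample*" `
    -tables @("Products", "Sales") `
    -sqlConnStr "Data Source=.\SQL2014; Initial Catalog=DestinationDB; Integrated Security=SSPI" `
    -sqlSchema "stg" -verbose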

As a result (when no -Tables filter is given), all the tables from the PBI Desktop file get copied into the SQL Server database:

image

Of course this only works for "one-time-only" or manual scenarios, but I assure you it is much quicker than building a SQL Server Integration Services package. ;)


A quick ETL PowerShell script to copy data between databases

I'll start this post by announcing that we just moved our SQLHelper PowerShell module from CodePlex to GitHub; you can grab it here:

https://github.com/DevScope/sql-powershell-modules

We also added a new sample to demonstrate how simple this module is to use. The objective is to build a very simple ETL script that performs these steps:

  1. Query a collection of tables from a source database (which can be of any type: SQL Server/Oracle/…)
  2. Insert the data into a SQL Server database using bulk loading
    • If a table does not exist in the destination, it must be created automatically

The PowerShell code:

Import-Module ".\SQLHelper.psm1" -Force

$sourceConnStr = "sourceconnstr"

$destinationConnStr = "destinationconnstr"

$tables = @("[dbo].[DimProduct]", "[dbo].[FactInternetSales]")

$steps = $tables.Count
$i = 1;

$tables |% {

    $sourceTableName = $_
    $destinationTableName = $sourceTableName

    Write-Progress -activity "Tables Copy" -CurrentOperation "Executing source query over '$sourceTableName'" -PercentComplete (($i / $steps)  * 100) -Verbose

    $sourceTable = (Invoke-DBCommand -connectionString $sourceConnStr -commandText "select * from $sourceTableName").Tables[0]

    Write-Progress -activity "Tables Copy" -CurrentOperation "Creating destination table '$destinationTableName'" -PercentComplete (($i / $steps)  * 100) -Verbose

    Invoke-SQLCreateTable -connectionString $destinationConnStr -table $sourceTable -tableName $destinationTableName -force

    Write-Progress -activity "Tables Copy" -CurrentOperation "Loading destination table '$destinationTableName'" -PercentComplete (($i / $steps)  * 100) -Verbose

    Invoke-SQLBulkCopy -connectionString $destinationConnStr -data $sourceTable -tableName $destinationTableName                

    $i++;
}

Write-Progress -activity "Tables Copy" -Completed

ETLPowerShellScript

Converting a PowerPivot workbook w/ PowerQuery connections into SSAS Tabular

No, SSAS Tabular mode doesn't support Power Query connections… (yet… but I'm confident it will in the future).

But I was happy to verify that it's very easy to change the connection and point it to another table through a supported provider (e.g. a SQL Server DW).

It's true that you will lose all the transformations made in Power Query, but in a scenario where a self-service BI model built in Excel is being converted into a more enterprise-level BI application, where the transformations need to be handled by a more "professional" ETL tool like SSIS, it's good to know that the connection can be converted very easily.

PS – This is not the same story with Linked Tables in PowerPivot: when converted into a tabular database those tables end up as pasted tables, which are very hard to update, so stay away from Linked Tables… As an alternative, I find it preferable to create a Power Query over the Excel table and then load that into the model.

Start by converting the PowerPivot workbook into an SSAS Tabular project:

image

Then go to Model –> Existing Connections, and you’ll find all your PowerQuery connections:

image

Try to process one of them, and you’ll get this error:

image

Now, for each Power Query connection, you need to change the connection provider to a supported one and point it to an object that has the same name and schema (if it doesn't have the same schema/name, you will have to change the table partition query).
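
For reference, the partition query behind a Tabular table is just a plain SQL statement over the source object, so repointing it is usually a one-line change; for the Employee table used below it would look something like:

select * from [dbo].[Employee]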

In this case I have a table named “Employee” with these columns:

image

Create a table in your data warehouse with the same name and schema:

image

Edit the “PowerQuery – Employees” connection and change the provider:

image

Connect to my SQL Server DW:

image

image

Then simply process the table; it will process successfully, and the new data will be loaded:

image

image

A lightweight SQL Helper Powershell Module (@DevScope)

Recently at DevScope we developed a simple but powerful PowerShell module to query databases, which I want to share because it has been of great use to me lately.

What differentiates this module is its simplicity and light weight; with it you can:

    • Query any .NET data provider with a single cmdlet (SqlClient, OracleClient, OLE DB, ODBC, …)
      • For instance, with the same cmdlet you can query an Excel spreadsheet and insert its data into a SQL Server database
    • Easily bulk copy data into a SQL Server database

You can download the module and some usage samples from GitHub:

https://github.com/DevScope/sql-powershell-modules

Examples of usage:

Query an Excel Spreadsheet


# Assumption: this snippet runs from a script saved in the same folder as ExcelData.xlsx
$currentPath = Split-Path -Parent $MyInvocation.MyCommand.Definition

$dataSet = Invoke-DBCommand -providerName "System.Data.OleDb" `
		-connectionString "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='$currentPath\ExcelData.xlsx';Extended Properties=Excel 12.0" `
		-commandText "select * from [Sheet1$]" -verbose

Insert into SQL Server


$dataSet.Tables[0].Rows |% {	
	$numRows = Invoke-DBCommand -providerName "System.Data.SqlClient" `
		-connectionString "Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=Dummy;Data Source=.\sql2012" `
		-executeType "NonQuery" `
		-commandText "insert into dbo.Products values (@id, @name, @datecreated)" `
		-parameters @{"@id"=$_.ProductKey;"@name"=$_.EnglishProductName;"@datecreated"=[datetime]::Now}					
}

Bulk Copy into SQL Server


Invoke-SqlBulkCopy -connectionString "Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=Dummy;Data Source=.\sql2012" `
		-tableName "dbo.Products" `
		-data $dataSet.Tables[0] `
		-columnMappings @{"ProductKey" = "Id"; "EnglishProductName" = "Name"} -verbose

Concatenate a row set into a SQL column

The other day I made a post on how to concatenate a row set into a T-SQL variable: here

In this post I'll show how to do the concatenation into a column without using any variable:

select (
    select Col + ', '
    from ( select 'Value 1' Col union select 'Value 2' union select 'Value 3' ) aux
    FOR XML PATH('')
) ConcatColumn

The trick is in FOR XML PATH(''), which transforms the rows into an XML string where the node name is passed as a parameter; in this case I pass an empty string, so the result is:

clip_image001
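
A common variation (not part of the query above) puts the separator in front of each value and uses STUFF to remove the leading one, so the result has no dangling separator:

select stuff((
    select ', ' + Col
    from ( select 'Value 1' Col union select 'Value 2' union select 'Value 3' ) aux
    FOR XML PATH('')
), 1, 2, '') ConcatColumn
-- result: Value 1, Value 2, Value 3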

Drop a SQL Server Schema and all related Objects

I love organizing my SQL Server database objects using schemas, and in my current project I need to create a lot of them dynamically, so I also need a way to delete them :).

So I implemented a stored procedure that deletes a schema and all its related objects in a database (both specified by parameter).

Usage:

exec dbo.DropSchema @schemaName = 'schema', @database = '[Database]', @whatif = 1

You can download the procedure script here.
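
For the curious, here is a minimal sketch of the idea (not the downloadable script itself): it only handles the most common object types, ignores the @database parameter, and does not resolve dependencies between objects.

create procedure dbo.DropSchema
    @schemaName sysname,
    @whatif bit = 1
as
begin
    declare @sql nvarchar(max) = N'';

    -- build a DROP statement for each object in the schema (tables, views, procedures, functions)
    select @sql = @sql +
        case o.type
            when 'U' then N'drop table '
            when 'V' then N'drop view '
            when 'P' then N'drop procedure '
            else N'drop function '
        end
        + quotename(s.name) + N'.' + quotename(o.name) + N';' + char(13) + char(10)
    from sys.objects o
        inner join sys.schemas s on s.schema_id = o.schema_id
    where s.name = @schemaName
      and o.type in ('U', 'V', 'P', 'FN', 'IF', 'TF');

    -- finally drop the (now empty) schema itself
    set @sql = @sql + N'drop schema ' + quotename(@schemaName) + N';';

    if (@whatif = 1)
        print @sql;              -- just show what would be executed
    else
        exec sp_executesql @sql;
end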

If you find any bugs or have any suggestion, please feel free to comment.


Slowly Changing Dimensions (SCD) with MERGE Statement

I was reading "The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence", which I highly recommend, and found a very interesting technique for dealing with type 2 Slowly Changing Dimensions using the SQL Server 2008 MERGE statement:

INSERT INTO Customer_Master
SELECT Source_Cust_ID, First_Name, Last_Name, Eff_Date, End_Date, Current_Flag
FROM (
    MERGE Customer_Master CM
    USING Customer_Source CS
    ON (CM.Source_Cust_ID = CS.Source_Cust_ID)
    WHEN NOT MATCHED THEN
        INSERT VALUES (CS.Source_Cust_ID, CS.First_Name, CS.Last_Name,
            convert(char(10), getdate()-1, 101), '12/31/2199', 'y')
    WHEN MATCHED AND CM.Current_Flag = 'y'
        AND (CM.Last_Name <> CS.Last_Name) THEN
        UPDATE SET CM.Current_Flag = 'n', CM.End_date = convert(char(10), getdate()-2, 101)
    OUTPUT $Action Action_Out, CS.Source_Cust_ID, CS.First_Name, CS.Last_Name,
        convert(char(10), getdate()-1, 101) Eff_Date, '12/31/2199' End_Date, 'y' Current_Flag
) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';

Explanation:

The outer INSERT ... SELECT ... FROM sets up a typical INSERT statement. What we will end up inserting are the new values of the type 2 rows that have changed.

The MERGE statement sits inside the FROM clause and has an OUTPUT clause that streams its results out to the calling statement; this syntax essentially defines a derived table, here called MERGE_OUT.

The MERGE ... USING ... ON lines instruct the MERGE to load customer source data into the customer master dimension table, matching on the business key.

The WHEN NOT MATCHED branch says that when there is no match on the business key we must have a new customer, so its INSERT adds the new row. You could parameterize the effective date instead of assuming yesterday's date.

The WHEN MATCHED AND ... branch identifies a subset of the rows with matching business keys, specifically, where it's the current row in the customer master and any one of the type 2 columns is different. Its UPDATE then expires the old current row in the customer master by setting the end date and the current row flag to 'n'.

The OUTPUT clause identifies which attributes will be output from the MERGE, if any. This is what feeds the outer INSERT statement. $Action is a MERGE function that tells us which part of the merge each row came from. Note that the output can draw from both the source and the master; in this case we output source attributes because they contain the new type 2 values.

Finally, the outer WHERE clause limits the output row set to only the rows that were updated in the customer master. These correspond to the rows expired by the UPDATE branch, but we output the current values from the customer source so the new version of each row gets inserted.
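
For completeness, here is roughly what minimal Customer_Master and Customer_Source tables for this example could look like (my assumption, since the book excerpt does not include the DDL; the date columns are char(10) only because the example stores dates with convert(char(10), ..., 101)):

create table Customer_Master (
    Source_Cust_ID int not null,
    First_Name     varchar(50),
    Last_Name      varchar(50),
    Eff_Date       char(10),   -- mm/dd/yyyy, matching convert(char(10), ..., 101) in the example
    End_Date       char(10),
    Current_Flag   char(1)
);

create table Customer_Source (
    Source_Cust_ID int not null,
    First_Name     varchar(50),
    Last_Name      varchar(50)
);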