Use Power BI Desktop as an ETL Tool

Did you ever faced a scenario were you needed to load a collection of CSV/Text files into SQL Server tables?

What solution did you choose?

  • TSQL BULK INSERT?
  • SSIS Package (generated from SSMS Tasks->Import Data or manual)
  • PowerShell “Import-CSV”

And what if the SQL Server destination tables must be typed (numeric, date, text columns,…) and the CSV file has formatting issues (ex: text columns without quotes, datetimes not in ISO format) and you need to transform the columns into the desired types?

A much quicker solution to transform CSV files into the desired shape is using a PowerBI Desktop query (or PowerQuery), for example in seconds I can:

  • Load the CSV
  • Replace a value from all the columns (in this case “NULL” from a real null)
  • Auto detect the datatypes

PBIDesktopQuery

Now to load these queries into a SQL Server database, it’s very easy thanks to DevScope powershell module “PowerBIETL” (also available at PowerShellGallery):


Install-Module PowerBIETL
Import-Module PowerBIETL

Export-PBIDesktopToSQL -pbiDesktopWindowName "*sample*" -sqlConnStr "Data Source=.\SQL2014; Initial Catalog=DestinationDB; Integrated Security=SSPI" -sqlSchema "stg" -verbose

The cmdlet “Export-PBIDesktopToSQL” will take care of:

  1. Connects to the PBI Desktop and read the tables
  2. Automatically create the tables on the SQL Database (if they do not exist)
    • Thanks to DevScope “SQLHelper” powershell module and “Invoke-SQLBulkCopy” cmdlet
  3. Bulk copy the data from PBI Desktop into the SQL Table

The cmdlet has 4 parameters:

  • -PBIDesktopWindowName (mandatory)
    • A wildcard to find the PowerBI Desktop window
  • -Tables (optional, defaults to all the tables)
    • Array of tables to import
  • -SQLConnStr (mandatory)
    • Connection to a SQL Server database
  • -SQLSchema (optional, defaults to “dbo”)
    • The schema under the tables will be created

As a result all the tables from the PBI Desktop file will get copied into the SQL Server database:

image

Off course this will only work to those “one-time-only” or manual scenarios, but I assure you that is much quicker than using a SQL Integration Services package Winking smile

Power BI Desktop Trace Logs Analyser

In this post I will show you how to analyse Power BI Desktop diagnostic trace files in a more visual way than notepad Smile

First you need to collect some diagnostics by enabling tracing on Power BI Desktop, go to: File –> Options –> Diagnostics –> Enable Tracingimage

If you click on “Open Traces folder”:

image

It will open the trace folder with all the trace logs:

image

PS – Trace log are only generated after you test your power bi report, do some refresh and interactions first to create the trace logs

Now to analyse these logs you could off course open them in notepad:

image

But is not very easy to read, so what better way to process and visualize this huge amount of text data??? Power BI off course!!!

So I created a Power BI Desktop to process and visualize the trace logs, that will allow you to quickly visualize things like:

  • Errors
  • Duration of queries
  • Performance issues
  • etc

image

image

Instructions of usage:

  • Download and open the Power BI Desktop file
  • “Edit Queries” and change the variable “VAR_LogFolder” to point to the trace logs folder:

image

image

  • Refresh the Report

image

Converting a PowerPivot workbook w/ PowerQuery connections into SSAS Tabular

No, SSAS Tabular mode don’t support PowerQuery connections… (yet… But I’am confident that will be in the future)

But I was happy to verify that it’s very easy to change the connection and point into another table in a supported provider (ex: a SQL Server DW).

It’s true that you will loose all the transformations made in powerquery, but in the scenario that we are converting a self-service BI model made in Excel into a more enterprise level BI application where the transformations need to be made by a more “professional” ETL tool like SSIS tool, it’s good to know that it’s possible to convert the connection very easily.

PS – This is not the same history with Linked Tables in PowerPivot, when converting into a tabular database those tables will end up as Pasted Tables which are very hard to update, so stay away from Linked Tables… As an alternative I find preferable to create a powerquery into the Excel table and then load into the model.

Start converting the powerpivot workbook into a SSAS Tabular project:

image

Then go to Model –> Existing Connections, and you’ll find all your PowerQuery connections:

image

Try to process one of them, and you’ll get this error:

image

Now you need for each PowerQuery connection to change the connection provider into a supported one and point into a object that has the same name and schema (if it doesn’t have the same schema/name you will have to change the table partition query).

In this case I have a table named “Employee” with these columns:

image

Create a table in your datawarehouse with the same name and schema:

image

Edit the “PowerQuery – Employees” connection and change the provider:

image

Connect into my SQLServer DW:

image

image

The simply process the table and it will process sucessfuly, and the new data will be loaded:

image

image

My favorite default Query Load option in PowerQuery

By default PowerQuery loads the built/search query in a blank worksheet, but this can be a little bit time consuming when you have many queries or the query has a lot of rows.

So my recommendation is to by default disable the load of the query in the Options:

image

This way when you create/reference a query in PowerQuery it’s saved in the workbook but it’s not loaded into the worksheet nor the data model:

image

When you are finished with the queries you can choose where to load them, right-clicking the query and choosing the  “Load To…” option:

image

Another PowerQuery DateDimension query

This is nothing new and as already been blogued by Chris Webb and Matt Masson:

https://cwebbbi.wordpress.com/2013/11/19/generating-a-date-dimension-table-in-power-query/

http://www.mattmasson.com/2014/02/creating-a-date-dimension-with-a-power-query-script/

But based on their work a made a new one with some new quickwins:

  • Suport for the day of the week start (in Portugal tipicaly starts on monday not sunday)
  • Only StartDate is mandatory, in this case a Date Dimension is generated til the end of the year
  • Some changes in the columns values

image

image


let CreateDateTable = (StartDate as date, optional EndDate as nullable date, optional Culture as nullable text, optional FirstDayOfWeek as nullable number) as table =>  
 let    
    Culture = if (Culture = null) then "pt-PT" else Culture,
    FirstDayOfWeek = if (FirstDayOfWeek = null) then 1 else FirstDayOfWeek, 
    EndDate  = if (EndDate = null) then Date.EndOfYear(StartDate) else EndDate,
    DayCount = Duration.Days(Duration.From(EndDate - StartDate)) + 1,
    Source = List.Dates(StartDate,DayCount,#duration(1,0,0,0)),
    TableFromList = Table.FromList(Source, Splitter.SplitByNothing()),    
    ChangedType = Table.TransformColumnTypes(TableFromList,{{"Column1", type date}}),
    RenamedColumns = Table.RenameColumns(ChangedType,{{"Column1", "Date"}}),
    InsertYear = Table.AddColumn(RenamedColumns, "Year", each Date.Year([Date])),
    InsertQuarter = Table.AddColumn(InsertYear, "Quarter", each Date.QuarterOfYear([Date])),
    InsertMonth = Table.AddColumn(InsertQuarter, "Month", each Date.Month([Date])),
    InsertDay = Table.AddColumn(InsertMonth, "Day", each Date.Day([Date])),    
    InsertMonthName = Table.AddColumn(InsertDay, "MonthName", each Date.ToText([Date], "MMMM", Culture), type text),
    InsertCalendarMonth = Table.AddColumn(InsertMonthName, "MonthOfYear", each (try(Text.Range([MonthName],0,3)) otherwise [MonthName]) & " " & Number.ToText([Year])),
    InsertCalendarQtr = Table.AddColumn(InsertCalendarMonth, "QuarterOfYear", each "Q" & Number.ToText([Quarter]) & " " & Number.ToText([Year])),    
    InsertWeek = Table.AddColumn(InsertCalendarQtr, "Week", each Date.WeekOfYear([Date])),
    InsertCalendarWeek = Table.AddColumn(InsertWeek, "WeekOfYear", each "W" & Number.ToText([Week]) & " " & Number.ToText([Year])),
    InsertDayWeek = Table.AddColumn(InsertCalendarWeek , "WeekDay", each Date.DayOfWeek([Date], FirstDayOfWeek ) + 1),
    InsertDayName = Table.AddColumn(InsertDayWeek, "WeekDayName", each Date.ToText([Date], "dddd", Culture), type text),
    InsertDayInt = Table.AddColumn(InsertDayName, "DateId", each [Year] * 10000 + [Month] * 100 + [Day]),
    #"Changed Type" = Table.TransformColumnTypes(InsertDayInt ,{{"Year", Int64.Type}, {"Quarter", Int64.Type}, {"Month", Int64.Type}, {"Day", Int64.Type}, {"DateId", Int64.Type}, {"Week", Int64.Type}, {"WeekDay", Int64.Type}}),
    #"Capitalized Each Word" = Table.TransformColumns(#"Changed Type",{{"MonthName", Text.Proper}, {"MonthOfYear", Text.Proper}, {"WeekDayName", Text.Proper}})

  in
     #"Capitalized Each Word"
in
    CreateDateTable

Session Contents Excel & PowerBI @ IT Pro Airlift (2015-01-22)

Thanks to all who attended, hope you had a good time attending as I had presenting.

As promissed here are the link to download the session contents here: http://1drv.ms/1Cm0BQ9

And see the Demos Video here: http://1drv.ms/18azgoV

image

image

image

image