Use Power BI Desktop as an ETL Tool

Have you ever faced a scenario where you needed to load a collection of CSV/text files into SQL Server tables?

What solution did you choose?

  • TSQL BULK INSERT?
  • SSIS package (generated from SSMS Tasks -> Import Data, or built manually)
  • PowerShell “Import-Csv”

And what if the SQL Server destination tables must be typed (numeric, date, text columns, …), the CSV files have formatting issues (e.g. text columns without quotes, datetimes not in ISO format), and you need to transform the columns into the desired types?

A much quicker way to transform CSV files into the desired shape is a Power BI Desktop query (Power Query); for example, in seconds I can:

  • Load the CSV
  • Replace a value in all the columns (in this case the text “NULL” with a real null)
  • Auto-detect the data types

PBIDesktopQuery

Loading these queries into a SQL Server database is very easy thanks to the DevScope PowerShell module “PowerBIETL” (also available in the PowerShell Gallery):


Install-Module PowerBIETL
Import-Module PowerBIETL

Export-PBIDesktopToSQL -pbiDesktopWindowName "*sample*" -sqlConnStr "Data Source=.\SQL2014; Initial Catalog=DestinationDB; Integrated Security=SSPI" -sqlSchema "stg" -verbose

The cmdlet “Export-PBIDesktopToSQL” takes care of:

  1. Connecting to the PBI Desktop instance and reading the tables
  2. Automatically creating the tables in the SQL Server database (if they do not exist)
    • Thanks to the DevScope “SQLHelper” PowerShell module and its “Invoke-SQLBulkCopy” cmdlet
  3. Bulk copying the data from PBI Desktop into the SQL tables

The cmdlet has 4 parameters (see the usage example after the list):

  • -PBIDesktopWindowName (mandatory)
    • A wildcard to find the PowerBI Desktop window
  • -Tables (optional, defaults to all the tables)
    • Array of tables to import
  • -SQLConnStr (mandatory)
    • Connection to a SQL Server database
  • -SQLSchema (optional, defaults to “dbo”)
    • The schema under which the tables will be created
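
For example, a call that copies only a couple of the queries into the “stg” schema could look like this (the table names are just placeholders for whatever queries exist in your PBI Desktop file):

# Hypothetical example: "Sales" and "Products" stand in for your own query names
Export-PBIDesktopToSQL -pbiDesktopWindowName "*sample*" `
	-tables @("Sales", "Products") `
	-sqlConnStr "Data Source=.\SQL2014; Initial Catalog=DestinationDB; Integrated Security=SSPI" `
	-sqlSchema "stg" `
	-verbose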

As a result, all the tables from the PBI Desktop file get copied into the SQL Server database:

image

Of course this will only work for those “one-time-only” or manual scenarios, but I assure you it is much quicker than building a SQL Server Integration Services package 😉

Power BI Desktop Trace Logs Analyser

In this post I will show you how to analyse Power BI Desktop diagnostic trace files in a more visual way than Notepad 🙂

First you need to collect some diagnostics by enabling tracing in Power BI Desktop; go to: File –> Options –> Diagnostics –> Enable Tracing

image

If you click on “Open Traces folder”:

image

It will open the trace folder with all the trace logs:

image

PS – Trace logs are only generated after you exercise your Power BI report, so do some refreshes and interactions first to create them.
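
If you prefer PowerShell over clicking through the UI, a quick way to peek at the newest trace files is something like the snippet below (the folder path is an assumption based on the default install location; use the folder opened by “Open Traces folder” if yours differs):

# Assumed default traces folder - confirm it with the "Open Traces folder" button
$tracesFolder = "$env:LOCALAPPDATA\Microsoft\Power BI Desktop\Traces"

# List the most recent trace logs first
Get-ChildItem $tracesFolder -File | Sort-Object LastWriteTime -Descending | Select-Object -First 5 Name, LastWriteTime, Length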

Now, to analyse these logs you could of course open them in Notepad:

image

But that is not very easy to read, so what better way to process and visualize this huge amount of text data than Power BI, of course!

So I created a Power BI Desktop file to process and visualize the trace logs, which will allow you to quickly see things like:

  • Errors
  • Duration of queries
  • Performance issues
  • etc

image

image

Usage instructions:

  • Download and open the Power BI Desktop file
  • “Edit Queries” and change the variable “VAR_LogFolder” to point to the trace logs folder:

image

image

  • Refresh the Report

image

Twitter Real-Time Analytics in PowerBI with #PowerBIPS

At SQLSaturday Paris I demoed real-time Twitter analytics in PowerBI using the PowerShell module PowerBIPS.

In this post I will show how I did that (you can find the full script at the bottom). The script does the following:

  1. Imports the required modules: #PowerBIPS and #InvokeTwitterAPIs
  2. Saves the Twitter OAuth settings in a variable (learn more here)
  3. Starts an infinite loop that polls the Twitter data and at each iteration:
    1. Calls the Twitter search API to find new tweets containing the #PowerBI hashtag since the last query
      1. The script stores the “sinceId” in a local file and uses it in the next execution
    2. Parses the Twitter search results and for each tweet does some sentiment analysis using the REST API of http://www.sentiment140.com
    3. Sends that data to PowerBI using the Out-PowerBI cmdlet of #PowerBIPS

At the end you should be able to build a dashboard similar to this:

image


cls

$ErrorActionPreference = "Stop"

$currentPath = (Split-Path $MyInvocation.MyCommand.Definition -Parent)

# Import the PowerBIPS Powershell Module: https://github.com/DevScope/powerbi-powershell-modules
Import-Module "$currentPath\..\PowerBIPS" -Force

# Import the InvokeTwitterAPIs module: https://github.com/MeshkDevs/InvokeTwitterAPIs
Import-Module "$currentPath\Modules\InvokeTwitterAPIs.psm1" -Force

# System.Web is needed for [System.Web.HttpUtility]::UrlEncode used below
Add-Type -AssemblyName System.Web

#region Twitter Settings

# Learn how to generate these keys at: https://dev.twitter.com/oauth and https://apps.twitter.com
$accessToken = "your access token key"
$accessTokenSecret = "your access token secret"
$apiKey = "your api key"
$apiSecret = "your api secret"

$twitterOAuth = @{'ApiKey' = $apiKey; 'ApiSecret' = $apiSecret; 'AccessToken' = $accessToken; 'AccessTokenSecret' = $accessTokenSecret}

#endregion

$sinceId = $null
$sinceIdFilePath = "$currentPath\twitterDemoSinceId.txt"

while($true)
{	
	if (Test-Path $sinceIdFilePath)
	{
		$sinceId = Get-Content $sinceIdFilePath
	}	

	# Hashtags to search (separated by comma) and the number of tweets to return, more examples of search options: https://dev.twitter.com/rest/public/search

	$twitterAPIParams = @{'q'='#powerbi';'count' = '5'}

	if (-not [string]::IsNullOrEmpty($sinceId))
	{
		$twitterAPIParams.Add("since_id", $sinceId)
	}

	# Get Twitter data (if SinceId is not null it will get tweets since that one)

	$result = Invoke-TwitterRestMethod -ResourceURL 'https://api.twitter.com/1.1/search/tweets.json' -RestVerb 'GET' -Parameters $twitterAPIParams -OAuthSettings $twitterOAuth -Verbose

	# Parse the Twitter API data

	$twitterData = $result.statuses |? { [string]::IsNullOrEmpty($sinceId) -or $sinceId -ne $_.id_str } |% {
	
		$aux = @{
			Id = $_.id_str
			; UserId = $_.user.id
			; UserName = $_.user.name
			; UserScreenName = $_.user.screen_name
			; UserLocation = $_.user.location
			; Text = $_.text
			; CreatedAt =  [System.DateTime]::ParseExact($_.created_at, "ddd MMM dd HH:mm:ss zzz yyyy", [System.Globalization.CultureInfo]::InvariantCulture)		
		}

		# Get the Sentiment Score

		$textEncoded = [System.Web.HttpUtility]::UrlEncode($aux.Text, [System.Text.Encoding]::UTF8)

		$sentimentResult = Invoke-RestMethod -Uri "http://www.sentiment140.com/api/classify?text=$textEncoded" -Method Get -Verbose

		switch($sentimentResult.results[0].polarity)
		{
			"0" { $aux.Add("Sentiment", "Negative") }
			"4" { $aux.Add("Sentiment", "Positive") }
			default { $aux.Add("Sentiment", "Neutral") }
		}
		
		Write-Output $aux
	}

	if ($twitterData -and $twitterData.Count -ne 0)
	{
		# Persist the SinceId

		$sinceId = ($twitterData | Sort-Object "CreatedAt" -Descending | Select -First 1).Id
		
		Set-Content -Path $sinceIdFilePath -Value $sinceId

		# Send the data to PowerBI

		$twitterData | Out-PowerBI -dataSetName "TwitterPBIAnalysis" -tableName "Tweets" -types @{"Tweets.CreatedAt"="datetime"} -verbose
	}
	else
	{
		Write-Output "No tweets found."
	}

	Write-Output "Sleeping..."

	Sleep -Seconds 30
}

Using #PowerBIPS to Upload CSV Files to PowerBI

In June I made a guest post on the PowerBI Developer Blog about #PowerBIPS that showed how to upload a collection of local CSV files into PowerBI; check it out here:

http://blogs.msdn.com/b/powerbidev/archive/2015/06/08/using-a-power-bi-app-to-upload-csv-files-to-a-dataset.aspx

PS – Thanks Josh😉

image


cls

$ErrorActionPreference = "Stop"

$currentPath = (Split-Path $MyInvocation.MyCommand.Definition -Parent)

# Create the archive folder
New-Item -Name "Archive" -Force -ItemType directory -Path "$currentPath\CSVData" | Out-Null

Import-Module "$currentPath\Modules\PowerBIPS" -Force

while($true)
{
	# Iterate each CSV file and send it to PowerBI
	Get-ChildItem "$currentPath\CSVData" -Filter "*.csv" |% {

		$file = $_

		# Import the CSV and add a column with the file name
		$data = Import-Csv $file.FullName | select @{Label="File";Expression={$file.Name}}, *

		# Send the data to PowerBI
		$data | Out-PowerBI -dataSetName "CSVSales" -tableName "Sales" -types @{"Sales.OrderDate"="datetime"; "Sales.SalesAmount"="double"; "Sales.Freight"="double"} -batchSize 300 -verbose

		# Archive the file
		Move-Item $file.FullName "$currentPath\CSVData\Archive\" -Force
	}

	Write-Output "Sleeping..."

	Sleep -Seconds 5
}

Create a Real-Time IT Dashboard with PowerBIPS

Last week we published on GitHub a PowerShell module for the new PowerBI developer REST APIs, named “PowerBIPS”; download it here:

https://github.com/DevScope/powerbi-powershell-modules/

In this post I will demonstrate how easy this module is to use and show you, step by step, how to build a PowerShell script that uploads server stats to PowerBI, allowing users to build a real-time IT dashboard like this:

image

The script is divided into the following steps (full script):

  1. Import the PowerBIPS module
  2. Get the Authentication Token
  3. Create a PowerBI DataSet
  4. Upload data to PowerBI

Import the PowerBIPS module

The script starts by importing the PowerShell module:

# The module needs to be installed in %USERPROFILE%\Documents\WindowsPowerShell\Modules\
Import-Module PowerBIPS -Force

Or by pointing directly to the .psm1 file:

Import-Module <path to module>\PowerBIPS.psm1 -Force

After this all the PowerBIPS cmdlets are available for use in the PowerShell console.
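
To quickly see which cmdlets the module exposes you can simply list them:

# List the cmdlets exported by the PowerBIPS module
Get-Command -Module PowerBIPS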

Get the Authentication Token

Next we need to get the authentication token required to communicate with the API; this token identifies your PowerBI tenant and grants access to it:

$authToken = Get-PBIAuthToken -clientId "<your client id>"

This cmdlet requires you to provide the Client Id of the Native Application that you need to create in the Windows Azure Active Directory of your tenant:

PS – Follow this guide to create an Azure Client App (thanks to mattmcnabb)

image

When you execute this cmdlet the following popup is shown asking for the credentials to connect to the PowerBI account:

image

It’s also possible to pass the username and password directly to the cmdlet, which is the ideal mode for automation jobs.
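
A minimal sketch of that unattended mode is shown below; note that the -userName/-password parameter names are my assumption, so check the module help (Get-Help Get-PBIAuthToken) for the exact signature:

# Hypothetical unattended call - parameter names assumed, verify with Get-Help Get-PBIAuthToken
$authToken = Get-PBIAuthToken -clientId "<your client id>" -userName "user@yourtenant.onmicrosoft.com" -password "<your password>"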

Create a PowerBI DataSet

Before creating the DataSet we test whether a DataSet named “IT Server Monitor” already exists in PowerBI; if not, we create a new one:


$dataSetMetadata = Get-PBIDataSet -authToken $authToken -dataSetName "IT Server Monitor"

if (-not $dataSetMetadata)
{
  # If the DataSet cannot be found, create a new one with this schema
	
  $dataSetSchema = @{
	name = "IT Server Monitor"
	; tables = @(
		@{name = "Processes"
		; columns = @( 
			@{ name = "ComputerName"; dataType = "String"  }					
			, @{ name = "Date"; dataType = "DateTime"  }
			, @{ name = "Hour"; dataType = "String"  }
			, @{ name = "Id"; dataType = "String"  }
			, @{ name = "ProcessName"; dataType = "String"  }
			, @{ name = "CPU"; dataType = "Double"  }
			, @{ name = "Memory"; dataType = "Double"  }
			, @{ name = "Threads"; dataType = "Int64"  }					
			) 
		})
  }	

  $dataSetMetadata = New-PBIDataSet -authToken $authToken -dataSet $dataSetSchema -Verbose
}

The $dataSetSchema variable is a hierarchical hashtable that defines the schema of the dataset (a System.Data.DataSet is also supported).

On success, the “New-PBIDataSet” cmdlet returns the PowerBI dataset metadata, including the internal id of the dataset.

Upload data to PowerBI

In this script we start by deleting all the data in the “Processes” table (this step is optional):

Clear-PBITableRows -authToken $authToken -dataSetId $dataSetMetadata.Id -tableName "Processes"

And then, every 10 seconds, it cycles through the servers and for each one:

  1. Gets all the running processes and related stats: CPU, memory, …
  2. Uploads the collected data to PowerBI

$computers = @("DSRomano")

while($true)
{	
    $computers |% {
		
	$computerName = $_
		
	$timeStamp = [datetime]::Now
		
	# Dates sent as string because of an issue with ConvertTo-Json (alternative is to convert each object to a hashtable)
		
	Get-Process -ComputerName $computerName | Select  @{Name = "ComputerName"; Expression = {$computerName}} `
		, @{Name="Date"; Expression = {$timeStamp.Date.ToString("yyyy-MM-dd")}} `
		, @{Name="Hour"; Expression = {$timeStamp.ToString("HH:mm:ss")}} `
		, Id, ProcessName, CPU, @{Name='Memory';Expression={($_.WorkingSet/1MB)}}, @{Name='Threads';Expression={($_.Threads.Count)}} `
	| Add-PBITableRows -authToken $authToken -dataSetId $dataSetMetadata.Id -tableName "Processes" -batchSize -1 -Verbose
		
     }
	
     sleep -Seconds 10
}

After you run the script, the following DataSet should appear in your PowerBI account:

image  image

And you can start searching, pinning, slicing & dicing with this real-time data:

image

image

image

The full PowerShell script can be downloaded here.

A quick ETL PowerShell script to copy data between databases

I’ll start this post by announcing that we just moved our SQLHelper PowerShell module from CodePlex to GitHub; you can grab it here:

https://github.com/DevScope/sql-powershell-modules

We also made a new sample to demonstrate how simple this module is to use. The objective is to build a very simple ETL script that performs these steps:

  1. Query a collection of tables from a source database (can be any type: SQL/Oracle/…)
  2. Insert the data into a SQL Server database using bulk loading
    • If the table does not exist in the destination, it must be created automatically

The PowerShell code:

Import-Module ".\SQLHelper.psm1" -Force

$sourceConnStr = "sourceconnstr"

$destinationConnStr = "destinationconnstr"

$tables = @("[dbo].[DimProduct]", "[dbo].[FactInternetSales]")

$steps = $tables.Count
$i = 1;

$tables |% {

    $sourceTableName = $_
    $destinationTableName = $sourceTableName

    Write-Progress -activity "Tables Copy" -CurrentOperation "Executing source query over '$sourceTableName'" -PercentComplete (($i / $steps)  * 100) -Verbose

    $sourceTable = (Invoke-DBCommand -connectionString $sourceConnStr -commandText "select * from $sourceTableName").Tables[0]

    Write-Progress -activity "Tables Copy" -CurrentOperation "Creating destination table '$destinationTableName'" -PercentComplete (($i / $steps)  * 100) -Verbose

    Invoke-SQLCreateTable -connectionString $destinationConnStr -table $sourceTable -tableName $destinationTableName -force

    Write-Progress -activity "Tables Copy" -CurrentOperation "Loading destination table '$destinationTableName'" -PercentComplete (($i / $steps)  * 100) -Verbose

    Invoke-SQLBulkCopy -connectionString $destinationConnStr -data $sourceTable -tableName $destinationTableName                

    $i++;
}

Write-Progress -activity "Tables Copy" -Completed

ETLPowerShellScript

Converting a PowerPivot workbook w/ PowerQuery connections into SSAS Tabular

No, SSAS Tabular mode doesn’t support PowerQuery connections… (yet… but I’m confident it will in the future).

But I was happy to verify that it’s very easy to change the connection and point it to another table in a supported provider (e.g. a SQL Server DW).

It’s true that you will lose all the transformations made in PowerQuery, but in the scenario where you are converting a self-service BI model made in Excel into a more enterprise-level BI application, where the transformations need to be made by a more “professional” ETL tool like SSIS, it’s good to know that the connection can be converted very easily.

PS – It’s not the same story with Linked Tables in PowerPivot: when converting to a tabular database those tables end up as Pasted Tables, which are very hard to update, so stay away from Linked Tables… As an alternative, I find it preferable to create a PowerQuery over the Excel table and then load it into the model.

Start by converting the PowerPivot workbook into an SSAS Tabular project:

image

Then go to Model –> Existing Connections, and you’ll find all your PowerQuery connections:

image

Try to process one of them, and you’ll get this error:

image

Now, for each PowerQuery connection, you need to change the connection provider to a supported one and point it to an object that has the same name and schema (if it doesn’t have the same schema/name you will have to change the table partition query).

In this case I have a table named “Employee” with these columns:

image

Create a table in your data warehouse with the same name and schema:

image

Edit the “PowerQuery – Employees” connection and change the provider:

image

Connect to my SQL Server DW:

image

image

Then simply process the table; it will process successfully and the new data will be loaded:

image

image