Learn Power BI and Build Your Own Coronavirus Report – Update including Doubling Times

Update to the Blog PostLearn Power BI and Build Your Own Coronavirus Report

Dataset Updated

The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL), has updated the dataset files and they have a new names and the US data has new files that break down by state. The US as a whole is still in the global datasets.

The good news is you can follow the original blog post but just use the new files for the global data and/or the US data.

JHU CSSE GitHub Repo/Folder with COVID-19 Data in Time Series Format – https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

In the GitHub folder you will find three CSV (Comma Separated Values) files (see image below); one for Confirmed Cases, one for Recovered Cases and one for Deaths for Global and new files for US.

URLs for the 3 Updated files for Global Dataset:

Confirmed Cases – https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
Recovered Cases –
Deaths –

Dates for Non-US Users

The dates in the file are stored in US format. For users outside of the US please follow Jeff Meddemmen comment from my original post:

“For countries who use a different date format that the source data (i.e in UK we use DD-MM-YYYY) during the Power Query steps when we are changing the Change Type of the columns if you change it just to ‘Date’ as directed then it results in errors. You need to first select Change Type > Locale and then select ‘Date’ and ‘English (United States)’. Then do this again to convert to another format. i.e. again select Change Type > Locale, select ‘Date’ and ‘English (United Kingdom)’ when it prompts if you want to replace the existing Change or add a new Step, choose to Add a new step. You need to repeat that for all the tables.”

New Power BI File with more DAX including Doubling Times!

I have a more detailed Power BI file (it produced the image above) with DAX for normalizing the dates to first confirmed case by Country and normalizing dates by date first 100 cases.S

You can find the detailed Power BI file in my GitHub here:



I hoped you enjoyed learning how to user Power BI and how to build your own report for the Novel Coronavirus (COVID-19). If you want to learn more about Power BI check out the official Power BI documentation – https://docs.microsoft.com/en-us/power-bi/. Also, follow Adam and Patrick on the Guy in a Cube YouTube Channel – https://www.youtube.com/channel/UCFp1vaKzpfvoGai0vE5VJ0w. It is the ultimate source of Power BI knowledge.

I have posted a Power BI file that should look exactly like the one you just built if you followed my instructions! You can follow me on Twitter @STATCowboy, connect up with me on LinkedIn at https://www.linkedin.com/in/jameyj/.

Finally, please be safe during this global pandemic. Wash your hands. Keep safe social distance and limit your interactions with groups until the curve is flat! (You can see in the report you just built that it isn’t flat for the rest of the world!

Guidance from CDC – https://www.cdc.gov/coronavirus/2019-ncov/index.html


GitHub Repo for Power BI Coronavirus Report Tutorial https://github.com/STATCowboy/Power-BI-Coronavirus-Report-Tutorial

Learn Power BI and Build Your Own Coronavirus Report



The Novel Coronavirus (COVID-19) emerged in the city of Wuhan, China. It has been declared a global pandemic by the World Health Organization (WHO). There has been many charts, infographics, dashboards and reports on the web showing the numbers of confirmed and recovered cases as well as deaths around the world.

I thought, “How could I build my own report/dashboard in Power BI”?

In this blog post I will show you how to use Power BI and build a COVID-19 report in Power BI!



Well, that would first require me to find the dataset. Luckily the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL), has a GitHub Repository with curated daily datasets by Country/Region and Province/State for confirmed cases, recovered cases and deaths. The dataset is updated daily with the latest data.

JHU CSSE GitHub Repo/Folder with COVID-19 Data in Time Series Format – https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

In the GitHub folder you will find three CSV (Comma Separated Values) files (see image below); one for Confirmed Cases, one for Recovered Cases and one for Deaths.

If you click on one of the files you will see the data. Also, you will see a button called “Raw”. If you click the “Raw” button you will get the URL you will need to import that dataset into Power BI.

URLs for the 3 files:

Confirmed Cases – https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv
Recovered Cases –
Deaths –

Each file has the data reported with the Province/State, Country/Region, Lat and Long (Province/State) and pivoted column with cumulative/running totals (notice the values increase by day) of cases/deaths for each day since reporting has started. We will import these three files into Power BI and wrangle the data into a format useful for reporting in Power BI.


Power BI Desktop

Now that we know where to get the data, we need to get the right tool to report on the data, Power BI! You can go to https://powerbi.microsoft.com/en-us/ and click the “Start free >” button to get Power BI Desktop (it’s free!) and start building amazing reports.

Then click “Download free >” on the next page to launch Microsoft Store to install the Power BI Desktop application

When the prompt appears notifying about opening the Microsoft Store click the “Open” button.

In the Microsoft Store click “Install” to install on the Power BI Desktop screen to install Power BI Desktop.


Power BI – Get and Transform Data with Power Query


After Power BI installs in the Start Menu find “Power BI Desktop” launch Power BI. When prompted sign-up for a free account.

After you sign-up on the splash screen click “Get Data”. We will start with import the Confirmed Cases file into Power BI.

NOTE: If the splash screen is not longer there and you are in the Power BI Desktop App

OR …

In the Get Data dialog box type “web” in the search box at the top left, select “Web” and then click “Connect”.

In the “From Web” dialog paste in the URL (https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv) for the Confirmed Cases into the URL box and click “OK”.

In the next screen click “Transform Data” to open Power Query so we can wrangle some data!

The next screen to pop-up is the Power Query Editor. Yes, like the one in Excel (a little better but mostly the same)! Here we can use the language M to transform and load our data into Power BI Desktop. We will do some steps to get the data in the format we want to report in Power BI.

If you look to the far right, you will see a section called “Applied Steps”. You can see Power BI has done two steps, “Source” and “Changed Type”. We are going to remove the “Changed Type” step it automatically added on the import of the CSV file. Click the “Red X” next to “Changed Type” to remove this step.

Now, we will promote the first row of the CSV file to be our column names. If you look at the first row of the data in Power Query you will see the column names. To promote the first row to be Headers of the table click the “Transform” menu, then click the “Use First Row as Headers” button in the ribbon. This will add two steps in the “Applied Steps” section: Promoted Headers and Changed Type.

When we promoted the headers Power Query automatically added a step to detect the data types. Click the “Red X” next to Changed Type to remove this step. We will do this later after we unpivot the data.

Should look like below now.

Now we are going to edit the raw M code to remove an option that Power Query put in the Source step (first one) so that as data is added (new columns) each day to the dataset we will pick up the new columns. To do this, in the “Home” menu click the “Advanced Editor” button.

In the “Advanced Editor” window scroll to the right and in the “Source = Csv.Document …” line remove the “Columns=XX,” part in the line (XX will be a number representing the number of columns in the file, in my example below it was 60) and click “Done”. We want to remove the hard-coded column count because as new data points are added as columns to the file each day, we will want to pick up those columns. If we leave the hard-coded “Columns=XX” we will not pick up these new days/columns.

Should look like this below between “Delimiter” and “Encoding” after you remove the section (be sure to get the comma). When you click “Done” you will not notice any changes.

Let’s rename the table to Confirmed. To do this, in the Name box to the right change time_series_19-covid-Confirmed to Confirmed.


Next, we need to duplicate the table before we perform other steps so that we can create a dimension table to use later for better data modeling and report building in Power BI. To do this, right-click on the Confirmed query on the left and choose Duplicate.

Right click on the new Query it created called Confirmed (2) and choose Rename and name the query Header.

Should look like this.

Let’ cleanup our Header query to be our dimension.

Hold down the shift key and select the four columns in Header called: Province/State, Country/Region, Lat and Long. You should see the column headers for the four columns become yellow. Right-click on one of the four columns and choose Remove Other Columns. With the same four columns selected, right-click again and choose Remove Dupliates.

We are good with the Header table/query now.


Let’s go back to editing our Confirmed query/table. We can now remove the Lat and Long columns of our Confirmed table as we will use them from the new Header table we created. To do this, hold down the shift key and select the Lat and Long columns in the Confirmed table and right-click and choose Remove Columns.

Our Applied Steps should now look like this.

Now comes the fun part, the part that Power Query in Power BI is amazing at doing – Un-pivoting data! We are going to un-pivot on the Province/State and Country/Region columns to get the daily data columns to go from being a column per day to a row per day with a column for the date and a column for the value. To do this we hold down the shift key and select the Province/State and Country/Region columns and then right-click on one of the two columns, Province/State or Country/Region, you selected and choose Unpivot Other Columns.

Now look at our Confirmed table and steps! We now have the data in a better format for reporting … but we have a few more things to do.

So, let’s rename our new Attribute column Date. Right-click on Attribute column and click Rename… and rename the column Date.

Looks like this now.

Now finally let’s change the datatypes of the Date and Value columns to Date and Whole Number, respectively. Right-click on the Date column and click Change Type ->
Date. Right-click on the Value column and click Change Type -> Whole Number.

Our table should look like now!


Repeat these steps in Power Query (MINUS the steps to create the Header table!!) for the other two files – Recovered Cases and Deaths. Call the tables/queries Recovered and Deaths and be sure to use the correct URLs for each table.

To bring in the two new files you can click New Source -> Web at the top and follow the steps above (MINUS the steps to create the Header table!!).


Once done it should look like this!

Now let’s Close & Apply our work and get into the Power BI Desktop. Click the Close & Apply button at the top left to close Power Query and return to the Power BI Desktop.

You will see it bringing the tables into Power BI Desktop.


Click the Save button at the top left and save your work so far! You can call the file Power BI Coronavirus Report.

NOTE: Name your Power BI Report files carefully, i.e., give them names that our friendly as they become the Report name in the Power BI Service when you publish.


Power BI Desktop – DAX Data Wrangling

We now have our data in Power BI Desktop to build report but before we build a report we need to create a few more tables using DAX. DAX is the Data Analysis Expressions language used in Power BI to create calculated columns, measures and tables.

We need to add a few columns as keys in our table, hide a few columns from the reports and add a date table. Click the Data button at the top left side to start using DAX to transform data using DAX in Power BI Desktop. Notice our four tables we created in Power Query on the right.

… … … … … … … … …

We are going to add a column called CountryState that will be combination of the Province/State and Country/Region columns. This will be a key for our tables to the Header table. To add the column/key right-click on the table and click New column.

In the dialog at top enter the formulas below for each table, respectively, to create the key.

Should look like this for the Confirmed table. Click the Checkmark to apply and create the column.

Repeat for each table using the following DAX formulas:


Confirmed: CountryState = COMBINEVALUES("|", 'Confirmed'[Country/Region], 'Confirmed'[Province/State])



Deaths: CountryState = COMBINEVALUES("|", 'Deaths'[Country/Region], 'Deaths'[Province/State])



Header: CountryState = COMBINEVALUES("|", 'Header'[Country/Region], 'Header'[Province/State])



Recovered: CountryState = COMBINEVALUES("|", 'Recovered'[Country/Region], 'Recovered'[Province/State])


We can now hide these four keys from each table as we do not need them in the reports themselves but just to create the relationships between the 3 data tables and the header table.

In each table right-click on the CountryState column we just created and choose Hide in report view. Repeat for each table.

You will now see that the CountryState column is greyed-out now.


Repeat (Hide) for the Date columns in the 3 data table, Confirmed, Deaths and Recovered.


Now we will use DAX to create a date/calendar table to use as our Date dimension table. To do this click the Table tools menu and then click the New table button.

In the formula bar enter the DAX code below from Adam Saxton to create a Calendar table to use as a Date Table dimension.


DAX for Date Table (from Adam Saxton, Twitter: @awsaxton, Web: https://guyinacube.com/)

Calendar =

VAR _calendar =

CALENDAR (MIN(‘Confirmed'[Date]),





“Year”, YEAR ( [Date] ),

“MonthNumber”, MONTH ( [Date] ),

“Month”, FORMAT ( [Date], “mmmm” ),

“Quarter”, “QTR ” & FORMAT ( [Date], “Q” ),

“QuaterNumber”, FORMAT ( [Date], “Q” ),

“MonthYearNumber”, FORMAT([Date], “yy mm”),

“Month Year”, FORMAT([Date], “mmm yyyy”)



Should look like this for the Calendar table. Click the Checkmark to apply and create the table.

We now have a Calendar table!

Finally, we need to mark the Calendar table as a “date table”. Select the Calendar table in the right. Then in the Table tools menu select the Mark as date table.


Then in the pop-up window select the Date column for the Date column drop-down and click Ok.


Power BI Desktop – Modeling

Now we need to setup the relationships for our tables. To do this we use the Model section of Power BI Desktop. Click the Model button at the top left side to start modeling our table relationships.

Our tables will show up in a dark grey area where we can drag the table around to re-arrange them and create relationships. Drag the Header and Calendar tables into the center and place the Confirmed, Deaths and Recovered tables around them like below. NOTE: You will have to scroll around a little to move and re-arrange them as so.

Now that we have them arranged we can build the relationships. To do this we will use are keys, Date and CountryState, and drag from Calendar the Date field to the Date field in the 3 data tables Confirmed, Deaths and Recovered. We will repeat for the CountryState column in the Header table to the 3 data tables Confirmed, Deaths and Recovered. Below is an example of how to drag the columns.

When done are model should look like below.


Let’s click the Save button to save our work again!


Power BI Desktop – Reports

Now we are ready to build a report. Click the Report button at the top left side to start building our report!

In the Report window let’s first setup our Theme. Click the View menu at top and then you will see all the available pre-built themes you can choose from (like PowerPoint!). Click the light grey arrow to the right and then click the Storm theme (see below).

Now let’s add a title to our report. Click the Insert menu and then click Text box.

Position it as below and give it a title, COVID-19 Global Report, and set the font to 24.

Next, add a map to the report so we can show the occurrences of coronavirus by country. Click the Map visual icon in the Visualizations pane to add it to the report (make sure nothing is selected on the report before clicking the Map visual).

Position it at the bottom left.

Drag the Country/Region column in the Header table to the Location box and the Legend box in the Fields area.

Add the Value column from the Confirmed table to the Size box.

You map should look like this. There is one problem though! Since our data is values with running totals (see data section above) we get incorrect counts for the number of confirmed cases. Notice when we hover over China it shows China has over 3 million cases which is not correct. The reason is because the default aggregation method in Power BI is SUM.

To fix this we need to change the aggregation method for the Value column in the Size field to be Maximum. To do this we click the little down arrow to the right of Value in the Size field.

Then we choose Maximum.

Now when we hover over China, we get the correct value.

OK, now we will add a line chart to show the spread of the coronavirus by country by day. Make sure none of the items in the report are selected and choose the Line Chart visualization.

Position it as show below. Notice the read guide/snap lines that show up to help align it.

Drag the Date field from the Calendar table to Axis. Drag the Country/Region field form the Header table to Legend and drag the Value field from the Confirmed table to Values.

Notice the Date field automatically has a date hierarchy created and used in the visual.

In this instance we do NOT want a data hierarchy. So, to remove it, click the grey down arrow next to Date in the Axis field and choose Date.

Bazinga! We have a nice line graph showing the spread of the virus by country!

Now add two more Line Charts below this one and position as below.

In the left one add the same fields as the top Line Chart but replace the Values field for the Value field in Recovered.
In the right one add the same fields as the top Line Chart but replace the Values field for the Value field in Deaths.

Your report should look like this below now.


Now add some Slicers to the report to slice and dice the data by country or data. Make sure nothing is selected and click the Slicer visualization twice to add to Slicers to the report.

Position the slicers as such.

Select the left Slicer and drag the Country/Region field into the Field box.

Your Slicer will now be populated with the countries and regions. Let’s change this to be a dropdown. In the top right of the Slicer hover your mouse and click the very little light grey down arrow.

In the menu that displays select Dropdown.

Now in the Format section (the Paint Roller icon!) of the Slicer let’s change a few options.

In the Selection controls section let’s enable Show “Select all” option so we can make it easier to select countries in our slicer.

Now on select the right slicer and let’s setup a Date slicer. Drag the Date field from Calendar into the Field box.

You now have a date slicer to play with the dates in your report. Slide the date slider around and see what happens!

Finally, let’s add another Text box to show credit for the data source. Click the Insert menu and then click Text box and position below the title and add the following text and format as desired.

Data Source:
CSSE at Johns Hopkins University


You can use the Paint Roller icon on the visuals to change colors, titles, etc.!

For example, lets change the Title text of the Line chart for Confirmed cases. Select the top Line chart. Then select the Format (Paint Roller)
icon and scroll to Title. Change the Title text to Confirmed Cases by Country.

Repeat for the other two Line charts using the titles, Recovered Cases by Country and Deaths by Country.

Last thing, there is a Page name at the bottom left. Change this to COVID-19 by Country.

Click the Save icon at the top left to save your report!

Your final report should look like this now!!! Congratulations!!! You did it!



Power BI Service

If you have a Power BI Pro license you can use the Publish button at the top right to publish to the Power BI Service to share your report to your co-workers, your friends and even the world! You also can setup an automatic refresh of the data to in the service and a whole lot more. More info about the Power BI service is here – https://docs.microsoft.com/en-us/power-bi/service-get-started.



I hoped you enjoyed learning how to user Power BI and how to build your own report for the Novel Coronavirus (COVID-19). If you want to learn more about Power BI check out the official Power BI documentation – https://docs.microsoft.com/en-us/power-bi/. Also, follow Adam and Patrick on the Guy in a Cube YouTube Channel – https://www.youtube.com/channel/UCFp1vaKzpfvoGai0vE5VJ0w. It is the ultimate source of Power BI knowledge.

I have posted a Power BI file that should look exactly like the one you just built if you followed my instructions! You can follow me on Twitter @STATCowboy, connect up with me on LinkedIn at https://www.linkedin.com/in/jameyj/.

Finally, please be safe during this global pandemic. Wash your hands. Keep safe social distance and limit your interactions with groups until the curve is flat! (You can see in the report you just built that it isn’t flat for the rest of the world!

Guidance from CDC – https://www.cdc.gov/coronavirus/2019-ncov/index.html



GitHub Repo for Power BI Coronavirus Report Tutorial https://github.com/STATCowboy/Power-BI-Coronavirus-Report-Tutorial

Power BI dataflows – Enabling BI Professionals and Data Scientists

by Jamey Johnston, Sr. Data Scientist/Engineer, @STATCowboy

Why Power BI dataflows

Recently, I was reading an excellent article titled “Agile Digital Transformation for Data-Driven CEOs with Power BI” by Jen Stirrup (t|b) on why Power BI is a valuable tool for data driven CEOs to change the culture of their organizations. One of my favorite features to come to Power BI is Power BI dataflows and I believe it will improve the data management and usage in organizations. This article will discuss how dataflows can be used by BI professionals, data engineers and data scientist to improve their data management skills and build valuable insights for their companies.

I work with data scientist on a regular basis who spend a lot of time cobbling together data from different data sources to build models in Python using packages like Scikit-learn. These data scientist will often dump the data into CSV files and read them in as Panda dataframes or NumPy arrays to start the data science algorithm magic. This is messy! The data scientist often ask for new dumps of the data or losing the data dumps they have. I believe Power BI dataflows can really assist to make data management better for BI professionals, data scientist and data engineers.

What is Power BI dataflows

Power BI dataflows enable organizations to wrangle data spread across multiple data sources in the cloud or on-premises into Azure Data Lake Storage gen 2 in a Common Data Model. Dataflows are created and managed in workspaces in the Power BI service via an online version of the well-known Power Query interface. Data in Power BI dataflows can be leveraged by data scientist and data engineers in Azure services like Machine Learning (ML), Databricks and Azure Synapse Analytics because the data is stored in Common Data Model in Azure Data Lake Storage gen 2. Citizen Data scientist can take advantage of Microsoft Cognitive services or Azure Automated ML in dataflows. Microsoft also has Python packages like the Azure Datalake service client library for Python to connect to Azure Data Lake Storage gen 2 which can be used to connect to Power BI dataflows data to build data models in packages like Scikit-learn. Power BI reports can also connect to Power BI dataflows to build reports and dashboards.

Power BI dataflows Demo

Let’s look at Power BI dataflows. I have uploaded a CSV file of the Ames Housing Data
from Kaggle on my GitHub in a repository, https://github.com/STATCowboy/pbidataflowstalk, for Power BI dataflows.

To use Power BI dataflows you will need a Pro license of Power BI and to use some of the advanced features, like Cognitive Services or Automated ML, you will need Premium capacity. Your Power BI tenant admin and Azure admin will have to enable dataflows and connect to a Azure Data Lake Storage gen 2 to use the Power BI dataflows and integration with Azure Data Lake.

Once Power BI dataflows is setup, connect to your Power BI Service and create a workspace called “Ames Housing” and enable dataflows (under Advanced at the bottom) for the workspace.
Note, the Power BI service will manage the ACL (permissions) in the Azure Data Lake storage utilizing the users defined in the workspace automatically.

Now let’s import the CSV file of the Ames Housing data using dataflows. In your Ames Housing workspace click the “+Create” at the top and choose “Dataflow”.

Click the “Add new entities” button on the left panel on the page to open Power Query online to add the data.

In the Power Query Online screen click “Text/CSV” button to add the Ames Housing data into dataflows.

In the File path or URL textbox enter https://raw.githubusercontent.com/STATCowboy/pbidataflowstalk/master/AmesHousing.csvand then click the yellow “Next” button at the bottom right. Notice, you can connect to your On-premises data gateway if you have on-premises data.

In the Power Query Online editor remove all the columns except, “Gr Living Area” and “SalesPrice”, name the table “AmesHousingData” and click “Save and close”.

The next screen that pop-up is where we name our dataflow. Name your dataflow “Housing Data” and click “Save”. When it finishes saving click the “Refresh Now” button to populate your dataflow with data. Note, you can setup scheduled refreshes in dataflows including incremental refreshes.

You should have data in your dataflow stored in your Azure data lake now!

Power BI Desktop

Now that we have data in our dataflow let’s connect to it using Power BI desktop.

Open Power BI Desktop and click the “Get Data” button at the top or on the startup screen.

Select “Power Platform” on the left, then “Power BI dataflows” on the right and click “Connect”.

If this is the first time connecting to dataflows you will need to click the “Sign In” button and login to your Power BI Service. Once you login click “Connect”.

Once you are connected navigate to the “AmesHousingData” table and select it and click “Transform Data” or “Load” if you want and you will now have your Power BI dataflows data in Power BI Desktop to work with!

CDM Folders and Azure Data Lake Storage gen 2 (ADLS gen 2)

As mentioned earlier you can connect your Power BI dataflows to your own Azure Data Lake Storage and use it for data science or data engineering. The great thing that Microsoft did with Power BI dataflows is deploy the data into the Common Data Model in folders in Azure Data Lake Storage gen 2.

CDM Folders conform to “specific, well-defined, and standardized metadata structures and self-describing data. These folders facilitate metadata discovery and interoperability between data producers and data consumers”. (https://docs.microsoft.com/en-us/common-data-model/data-lake)

This CDM Folders have a metadata file, model.json, which follows the Common Data Model metadata format. The file contains information about the entities (table) and attributes, and links to underlying data files in the CDM Folder. The data files, currently CSV but support for other formats is in the works, are stored in folders for the entities.

If you look at the diagram below it shows the layout of the “Ames Housing” workspace and “Housing Data” CDM folder after it was written to the ADLS gen 2 storage. On the left is the view from Storage Explorer and on the right is the diagram of a CDM folder from the Microsoft Power BI dataflows documentation.

Storage Explorer

Python Demo to read CDM Folder for Data Science

Now that you have your data in ADLS gen 2 you begin to do the data magic! The azure-storage-file-datalake package was recently released to access hierarchical namespace enabled (HNS) ADLS gen 2 accounts. The below referenced Python demo will use this new package to access the CDM folders written by Power BI dataflows. It will also use the CDM helper file located here.

To run the demo follow the instructions here to setup a Conda environment using the below referenced YAML file to run the Python Demo below. Edit the Credentials.py file and update the accountName and credential variables to your values. Then run the AMES CDM Access Demo – Azure ML Python.py file to pull the data from entity, AmesHousingData, from the Ames Housing/Housing Data dataflow and run a simple OLS Linear Regression to predict SalesPrice from Gr Liv Area. You can run this in Visual Studio Code or Azure ML or any tool that can run Python like Jupyter Notebooks.

YAML file for Python Env

Python Demo

CDM Helper

Databricks Demo to read CDM Folder for Data Science

You can also access the data in powerful tools like Azure Databricks. Follow the instructions on this GitHub, by Ted Malone (t|b) to configure you Databricks to access your Power BI dataflows in CDM Folders.

Databricks Setup

Sample code to connect to the Ames Housing Data you loaded in your Power BI dataflows in the demo above is in my GitHub.

Sample Code


I hope this article helped to introduce and showcase some of the great features of Power BI dataflows. To learn more about Power BI dataflows check out the official Microsoft documentation here. Also, I recommend following Matthew Roche (t|b) who works for Microsoft and evangelizes Power BI dataflows. Finally, follow me on Twitter @STATCowboy, connect up with me on LinkedIn at https://www.linkedin.com/in/jameyj/ or check out my blog, https://www.statcowboy.com to keep in touch.

GitHub Repo of Code

Azure Virtual Machine (VM), Azure Bastion, iPad Pro, Surface Arc Mouse … ultimate portability!

Azure VM on iPad

I really wanted to JUST carry my iPad around at conferences. The problem was that I often needed a Windows O/S “machine” to complete task like code Python in VS Code or remote into my office computer.

So, what to do. The solution was simple I thought … use an Azure Virtual Machine and the Remote Desktop (RDP) app from Microsoft on my iPad. However, that was not an ideal solution as you really need a mouse to work well on Windows and the iPad does not support a mouse … or at least it USED to not support a mouse. Enter iPadOS 13! With iPadOS 13 you can now use a Bluetooth mouse (like the Surface Arc Mouse, the one I use!) with your iPad. Now I have a solution I can use.

Read this article from 9to5 Mac, How to use mouse with iPad, to learn how to setup your mouse with you iPad. Next, read the rest of this blog post to setup an Azure Virtual Machine (VM) and configure it for use with your iPad … or any other device that support RDP or just has a browser (we can use Azure Bastion to connect via a browser … but at a small cost[$] that we will discuss later).

This blog will show you how to setup and configure the following in Azure and on your iPad (or a PC, Mac or Android table) to compute anywhere using Azure:

Setup Azure Virtual Machine

  1. Navigate to the Azure Portal (https://portal.azure.com) and click the “Create a resource” button at the top of the page.
  2. In the Azure Marketplace click “Compute” then click “Virtual Machine” to begin the wizard to create a Virtual Machine.
  3. In the Virtual Machine Wizard select the desired subscription you desire to create the VM and select the Resource Group to create the VM inside. If you don’t have a resource group already, click “Create new” under Resource Group to create a new Resource Group. Enter desired name in pop-up for Resource Group and click “OK”.
  4. Under Instance details enter your desired VM name, pick your region, your O/S (in my case just want the latest Windows 10 Pro build – 1809) and your VM size (click “Change size” to select size) for your VM. For demo purposes I will pick a small VM size. You should pick the appropriate VM size for your needs. Note you can shutdown and stop the VM when you are not using it to save cost. So, the monthly cost can be 1/3 less for example if you are only using it 8 hours a day and stopping the VM when not in use! The good news is you can change the size after you create the VM if you need a bigger size later on – Resize Virtual Machines.
  5. Under Administrator account enter a username from the admin user and create a password. Under Inbound port rules select “Allow selected ports” and choose RDP (3389) to expose the RDP port to allow the iPad RDP client to connect.
  6. Click “Next : Disks >” and leave the defaults on the Disk page and click “Next : Networking >”
  7. Configure the Virtual Network by leaving the default name and clicking “Create new”.On the Create virtual network screen leave the default name and fill out the information like below (you can choose you own ranges but make sure to have one with mask 24 for RDP/HTTP and one with mask 27 for Azure Bastion) for the Address spaces and Subnets. Note we create two address spaces and subnets. One is the default with mask 24 for use with RDP and the other is with mask 27 used for Azure Bastion. The name of the Azure Bastion subnet must be AzureBastionSubnet. Click “OK” when done filling out the addresses.

  8. You can leave the Subnet setting to the default. Click “Create new” under Public IP and select “Static” under Assignment (This will allow the IP address to stay the same so you can easily connect to it each time for RDP). Leave the RDP (3389) setting and click “Next : Management >”.
  9. At the bottom of the Management section you may want to configure Auto Shutdown to assist in saving money by having the VM shutdown each night at a specific time. You can configure it to email you before the VM shutdowns as well. Make the desired setting here and click “Next : Advanced >”.
  10. You can take the defaults on the Advanced and Tags pages and go Review + Create page. Review all your settings and make sure they are correct. Make sure you have the green
    bar at top with “Validation passed” and click “Create” to create your VM.

  11. The next screen will show the status of you VM deployment. Watch this screen until you get “Your deployment is complete”. Then click “Go to resource”.
  12. On the VM resource page click “Connect”.
  13. On the RDP page you will find the Public IP of your VM and you can also click the “Download RDP File” to download an RDP file to connect from your PC. You will want to jot down the Public IP address (I have masked mine slightly for security in the blog post) for use on your iPad or other devices which support RDP.
  14. To use Azure Bastion which enables connectivity from a browser, click the Bastion at the top and then click “Use Bastion”. On the next screen you can leave the default settings as we already configured the AzureBastionSubnet in the VM setup and click “Create”. You will get a “Creating a new bastion <name>.” message. Wait until you get the “Provisioning State: Succeeded” message with the login and then login and test Bastion.NOTE: Azure Bastion at the time of this blog post is about $140/month plus network charges (first 5GB is free). I just use RDP personally, but I wanted to demonstrate how to setup Azure Bastion as it is a great service for organizations that want a more secure connection that utilizes just a browser and https. Azure Bastion pricing.

  15. When Azure Bastion connects you will see a pop-up to allow to send text and images to the clipboard when you connect. Click “Allow” to enable the clipboard for your VM (Cut/Copy/Paste).
  16. You are now connected to your new Azure VM! On the privacy screens you can take the defaults (or change if you like) and click “Accept”.
  17. After that it will prompt to see if you want your machine discoverable on the network. I picked “No” as I just have the one VM.
  18. And you are in you VM. Install the software you want to use – VS Code, Miniconda, Docker, … Whatever you like! You have a Windows 10 Pro PC in the cloud!

Configure RDP on iPad

  1. Install the Microsoft Remote Desktop (RDP) client on your iPad to connect to Azure VM from iPad. Launch the RDP Client, “RD Client”, to start the Microsoft Remote Desktop client.
  2. When launched click the “+” at the top right to start a new RDP connection to your Azure VM.
  3. Select “Desktop” on the “Add New” screen that pops up.
  4. In the “Desktop” screen select the “PC Name” section to enter the IP for your Azure VM. In the “Add Remote Desktop” screen the pops up enter the Public IP address for your Azure VM and click “Done”. (See Step 13 above in the Setup Azure Virtual Machine section).
  5. Click the “Additional Options” entry box and click “Friendly Name” in the “Additional Options” screen to enter the name of your VM. Enter a friendly name from your VM and click “Done”.

  6. Click “Desktop” on the next screen and then click “Save” to save the RDP connection in the RD Client.
  7. In the main window of the Remote Desktop client click the new connection you made to connect to your Azure VM.
  8. Enter the username and password for you Azure VM and optionally select the “Store username and password” option and then click “Done”.
  9. You should successfully be connected to your Azure VM if you entered everything correctly!!! If you want to use your mouse make sure you configured your iPad for the mouse using instructions from 9to5 Mac, How to use mouse with iPad.When you are ready to end your RDP session you click the button in the top middle with the three bars and then click “End Session”. To save money be sure to “Stop” your VM if not using it. If you configured the auto shutdown when you configured your VM it should automatically shutdown at the time you specified each day if it is up and running at that time.

Stop Azure VM when Not in Use to SAVE MONEY!

In the Azure Portal click the “Stop” button at the top to “Stop” the Azure VM when not in use.

When you are ready to use the Azure VM again, login to the Azure Portal and select the VM and click “Start”. Then you can reconnect.

I hope you enjoyed this blog and I hope you enjoy your Azure VM and iPad with a mouse as much as I enjoy using mine!

From PASS Summit Attendee to Microsoft Data Platform MVP

For the past 25 years I have worked with data. Data in Oracle. Data in MS SQL Server. Data in GIS. Data in the cloud. Data everywhere! To keep my skills up I attended an Oracle conference every year up until 2012. Either the HOTSOS conference in Dallas or OracleWorld in San Francisco. Why Oracle conferences? Well, it is where I spent most of my time managing and using data, with Oracle RDBMS. I worked with MS SQL Server as well, but I spent most of it in Oracle. (I have been an Oracle Certified Professional since 2001.)

In 2012 the company I work for made a shift to adopt more of the Microsoft data technologies going forward. So, six years ago, I attended my first PASS Summit in Seattle in November 2012. In less than 2 months I will attend my seventh PASS Summit. Why, because it is the best technology community I have seen out there. Don’t get me wrong I made some great friends in other tech communities and learned a great deal. However, the #SQLFamily, as it is called in the PASS community on Twitter, is a great one. My first memorable encounter with the #SQLFamily was my first day at PASS Summit in 2012. It was a Pre-Con day and I was with my co-worker and friend at lunch. We were looking for a table to sit at and when we found a table I met my first three friends in the PASS Community, Cher (t), Melissa and Jeremy (t). I still hang out with them at various PASS events like SQL Saturdays and PASS Summit and we still joke about how Melissa said that I had to go get her lunch if I wanted to sit at the table (which I did, salmon, I think!). In 2012, I learned that the sessions at #PASSSummit were presented by individuals who were extremely talented, personable and they were able to take deep technical topics and present them in an easy to understand manner. I was hooked on making PASS Summit a part of my yearly conference schedule.

In 2013, PASS Summit was in Charlotte, NC, and I learned for the first time about the Microsoft MVP program and made my first MVP friend, Jason Horner (t). Jason was instrumental in coaching me over the next 4 years on how to give back to the Microsoft community and become a Microsoft MVP. Jason really became the guy that inspired me to become an MVP, not for the nostalgia of being an MVP but for the reward that comes from giving back to the community. I didn’t know in 2013 that I wanted to work hard to become a MVP, but I met so many awesome people that were MVPs that over the next few years I wanted to be one of them and give back. I met some other great friends in 2013 that I see often at SQL Saturdays and PASS Summit, Mickey (t), Martin (t), Cathrine (t), Maria (t) and Bob (t).

Over that next few years I added to my list of PASS friends, Wendy (t), David (t), Chris (t), Amy (t), Wolf (t), Naomi (t), John (t) – nominated me for MVP!, Reeves (t) – <3 you!, Ginger (t), Alan (t), Nancy (t), Joe (t), Rebecca and Devon (t) – the last 5 I have co-organized SQL Saturday Houston the last few years. I have met many others too (Angela, Lance, William, Rie, John, Karla, Mindy, Brandon, Tamera, Tim F., Stacia and so many more … I really could type 100 or more names …)!

Fast forward a couple of years to PASS Summit 2015 and my first time to speak at PASS Summit. I was invited by Microsoft to speak on the new Security Features in SQL Server 2016. It was such a rush and the real start of my involvement as a PASS Community Speaker which was crucial in me getting awarded a MS Data Platform MVP. I have spoken over the last three years at 3 PASS Summits, 30 or so SQL Saturdays, several Universities and other conferences including my first international conference in London earlier this year.

It has been a great journey to become a Microsoft Data Platform MVP that started in 2012 at PASS Summit! Truly, though, the best thing to come out of my attendance of PASS Summit is the lifelong friendships I have gained that have helped me to become a better person and data professional. I encourage you to attend PASS Summit and meet new people and learn new data skills. I, also, challenge you to give back and maybe one day speak at PASS Summit and who knows become a Microsoft MVP!

– Jamey

Register before Sept 22 for #PASSSummit & be entered into a daily drawing for PASS merchandise & the grand prize drawing, PASS Summit 2019! Save $200 with Code: PASSITON


Row Level Security in Power BI using Parent/Child Hierarchies


Row Level Security in Power BI using Parent/Child Hierarchies

Jamey Johnston (@STATCowboy)



I wrote a blog post on using Parent/Child Hierarchies to enforce Row Level Security (RLS) in SQL Server 2016+. The demo in that article used a parent/child hierarchy for an organization combined with a flattened asset hierarchy to enforce strict row level security on a fictional Oil & Gas dataset for a fictional Oil & Gas organization. I recommend reading the “Overview of the Demo” and the “Asset and Organization Hierarchies and RLS” sections in that blog post before proceeding with this blog post. It will give you a good understanding of what hierarchies entail and how they are used in this demo/blog post.

We will only focus on the Organizational Hierarchy and how they can be used to enforce RLS in Power BI in this blog post. Think about creating a HR report and have it automatically filter the person logged into the report and all direct reports under that person. Later we will bring in the Asset Hierarchy in a follow up blog post soon and combine the organizational and asset hierarchies to enforce a more complex RLS scheme as used in the MSSQL demo linked above.


of the demo code and Power BI file.

Scripts and Explanation

There are 2 scripts to run to install the entire demo (and you just run them in the order they are number, 1 – 2). They are located in the oilgaspbidemo-SQL-Code folder in the GitHub. If you have used the ones for the RLS demo in SQL Server they are similar but create a different database (oilgaspbidemo) and use just a subset of the tables:

1 – Oil&Gas PBI Demo – BuildTables.sql

The script will create the database, oilgasrlsdemo2016, and the tables needed for the demo

The Tables are as such:

  • ASSET_HIERARCHY – Table contains the asset hierarchy and is used to build the demo data for the asset hierarchy in the Well_Master table.
  • DATES – Generic table to hold a date dimension for use later in an Excel Power Pivot model.
  • SEC_ASSET_MAP – Table contains the entries mapping the organization units to the asset hierarchy levels for access to the data. The table would be managed by the security team for the application. Users and their subordinates are denied access to data via an entry of ou, ‘NONE’, ‘NONE’ or granted all access via an entry of ou, ‘ALL’, ‘ALL’.
  • SEC_ORG_USER_BASE – Table contains the employees including the employee to manager parent/child relationship to build an organization hierarchy and the organizational unit id for mapping to asset hierarchy levels for security. This table would likely be fed from an HR system. Also, will demonstrate Always Encrypted in this table.
  • SEC_USER_EXCEPTIONS – Table contains entries to allow for overrides of the organization hierarchy based model. Any employee entered here will use permission defined in this table instead of what is inherited from the organizational hierarchy.
  • WELL_DAILY_PROD – Contains COMPLETELY made-up and randomly generated daily well production data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs. This is a Clustered ColumnStore Indexed table.
  • WELL_MASTER – Contains COMPLETELY made-up and randomly generated master well data including the made up asset hierarchy. This is the main business table used for RLS for ALL well tables.
  • WELL_REASON_CODE – Contains COMPLETELY made-up and randomly generated daily well downtime data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs.


2 – Oil&Gas PBI Demo – LoadTables.sql


This script is used to load or generate the demo data including user and security tables and hierarchy and well data. There are two parameters close to the top that can be used to specify the amount of data to load – @wellcount and @drillyearstart. @wellcount specifies the number of wells to generate and @drillyearstart specifies the first possible year to use for a well. The start date for a well will be randomly selected between @drillyearstart and the current year the script is run.

There is also a Power BI file in the GitHub, Oil and Gas Power BI RLS Demo.pbix, to see the setup of the roles, DAX, tables and demo the hierarchies.


RLS in Power BI

So how do we accomplish Row Level Security in Power BI using a parent/child organizational hierarchy? First, we need to know the basics of RLS in Power BI. Let’s start with capturing the username of the currently connected user.


The two DAX functions, USERNAME() and USERPRINCIPALNAME(), are the key to RLS based security and organizational hierarchies. It is the way we determine who is logged into the PowerBI.com Service and thus use that information (login name) to apply row level security. USERNAME() function returns the domain name and username for the currently connected user as DOMAIN\username where as the USERPRINCIPALNAME() function returns as username@domain.com.

Good article from Adam Saxton, Guy in a Cube GH|B|T|YT, on USERNAME() and RLS – https://powerbi.microsoft.com/en-us/blog/using-username-in-dax-with-row-level-security/. Might be worth a read before continuing with this article


RLS Roles

Next, we need to understand Roles in Power BI with RLS. Take a look at this article from Adam Saxton – https://docs.microsoft.com/en-us/power-bi/service-admin-rls – and then head back to this article …

Now that we understand RLS basics in Power BI let see how to implement RLS in Power BI using a Parent/Child relationship in an organization hierarchy.


Organizational Hierarchy

So, look at the following Organizational Hierarchy in the demo code described above. Let’s look at Joachim Hapitash in the organizational hierarchy. Under him are Roger Wagner and under Roger are Tracey and Sean. Now let’s say when Roger Wagner logs into the PowerBI HR Report we want to use the hierarchy to only show Roger his record and his employees under him in the hierarchy. How do you accomplish this? Keep reading and I will show you!



Parent/Child Hierarchy in SQL Server

Now let’s examine the data for the SEC_ORG_USER_BASE table created in the demo. The table uses a unary relationship to relate the employee to his/her manager in the table. Look below and you will see that the CEO, Dave Campos, has NULL for MGRID. That is because he has no manager. But if you look at Joachim Hapitash you will see that he has 1001 for MGRID as Dave Campos, EMPLID = 1001, is his manager.



Now let’s use a recursive CTE to see how to relate the employee and managers together. If you run the following query against the SEC_ORG_USER_BASE table, you will get Roger and all the employees under Roger in the hierarchy.


WITH DirectReports




SELECT e.MGRID, e.EMPLID, e.NAME, cast(null as varchar(50)) as MGR,


0 AS Level,

     cast(e.ORG_UNIT_ID as varchar(max)) as ORG_LINEAGE,

     cast(e.ORG_UNIT_NAME as varchar(max)) as ORG_NAME_LINEAGE,

     cast(e.NAME as varchar(max)) as NAME_LINEAGE





Level + 1,

        case when e.org_unit_id <> d.org_unit_id then cast(d.ORG_LINEAGE + ‘|’ +

cast(e.ORG_UNIT_ID as varchar(max)) as varchar(max)) else cast(d.ORG_LINEAGE as varchar(max)) end,

        case when e.ORG_UNIT_NAME <> d.ORG_UNIT_NAME then cast(d.ORG_NAME_LINEAGE + ‘|’ +

e.ORG_UNIT_NAME as varchar(max)) else cast(d.ORG_NAME_LINEAGE as varchar(max)) end,

        case when e.NAME <> d.NAME then cast(d.NAME + ‘|’ + e.NAME as varchar(max)) else cast(d.NAME as varchar(max)) end


INNER JOIN DirectReports AS d




FROM DirectReports

WHERE NAME_LINEAGE like ‘Roger Wagner%’;


The results are as such:


Power BI, RLS and Organizational Hierarchies

Next, we add the SEC_ORG_USER_BASE table into Power BI and add some calculated columns (Path, Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, and HierarchyLevel) and one measure (MaxNumLevels). In this demo we will only look at the SEC_ORG_USER_BASE table. In the next blog article, we will look at all the tables in the demo!



Below is the DAX for the calculated columns and measure:

  • Level 1 = PATHITEM(SEC_ORG_USER_BASE[Path],1)
  • Level 2 = PATHITEM(SEC_ORG_USER_BASE[Path],2)
  • Level 3 = PATHITEM(SEC_ORG_USER_BASE[Path],3)
  • Level 4 = PATHITEM(SEC_ORG_USER_BASE[Path],4)
  • Level 5 = PATHITEM(SEC_ORG_USER_BASE[Path],5)
  • Level 6 = PATHITEM(SEC_ORG_USER_BASE[Path],6)
  • MaxNumLevels = MAX(SEC_ORG_USER_BASE[HierarchyLevel])

The Path column give a pipe, “|” separated list of the hierarchy at that node.
The Level X columns give the EMPLID of the LEVEL X employee at that level in the hierarchy. You will use these columns to build a hierarchy for the report.
The Level X Name columns give the NAME of the LEVEL X employee at that level in the hierarchy. You will use these columns to build a hierarchy for the report.
The HierarchyLevel column gives the depth of the organizational level of the employee.
The MaxNumLevels measure calculates the max levels in the hierarchy. Used to determine the number of Level X columns to create. Don’t filter any employees and it will show “6” in this demo if you add a table with just the measure.

These DAX statements use the Parent and Child DAX functions described here – https://msdn.microsoft.com/en-us/library/mt150102.aspx.


Now let’s setup a hierarchy in the Power BI model so we navigate the organizational hierarchy in visuals. This is accomplished by right-clicking on “Level 1″in the Fields section on the far right and choosing “New hierarchy”.


This will create a new hierarchy called “Level 1 Hierarchy”. Let’s rename it to “Org Hierarchy” by right-clicking (or clicking the ellipses – three dots – to the right of the name) and choosing “Rename”.


To add the remaining levels to the Hierarchy, starting with Level 2 and ending with Level 6 right-click and select “Add to Hierarchy -> Org Hierarchy” for each level until they are all in the hierarchy.



We can hide “Level X” columns now that we have a hierarchy defined. Right-click on each “Level X” field and choose “Hide”.


We will repeat the above for the “Level X Name” columns and call the hierarchy “Org Hierarchy Names”.


Now let’s setup the Row Level Security. To do this we will setup a role called “Org Security”. Under the “Modeling” menu choose “Manage Roles”.


When the Manage roles dialog pops up create a new role called “Org Security” for the table “SEC_ORG_USER_BASE” and use the following DAX expression as the Table filter and “Save”.


The DAX expression is doing the following:

  • Uses the PATHCONTAINS DAX function to search the Parent/Child hierarchy (by using the PATH DAX function) to find the employee record and all employee records under that employee.
  • Uses the LOOKUPVALUE DAX function to convert the logged in USERID to the EMPLID.
  • Uses the PATHITEM(SUBSTITUTE(USERPRINCIPALNAME(), “@”, “|”), 1)) DAX expression to convert the USERPRINCIPALNAME() (e.g. mcox@poweroilandgas.com) to the USERID (e.g. mcox) and feed the LOOKUPVALUE DAX function to get the EMPLID to feed the PATHCONTAINS DAX function.


Now let’s test the security! There is a “Matrix” visual added to the PBIX file in the downloads. It has the “Org Hierarchy Names” added to it as well as a column with counts by “EMPLID”. This will show us the employee counts at each level of the hierarchy in the organization.


Now let’s use the “View as Roles” to test! Click the “View as Roles” button under the Modeling menu.



Then enter mcox@poweroil-n-gas.com for “Other user” and check the “Other user” box. Also, check the “Org Security” box to enable the “Org Security” role.



Now check the Matrix visual! It should be filtered to only “mcox” and employees under “mcox”. Notice how we only see 119 employees in the dataset now and NOT 157!

Now we need to add a HR Role to allow HR personnel the ability to see all employees and not limited to just their organizational units. This is easy to do. Just add a role called “HR Security” and leave the filter blank.

Now when you publish to the PowerBI.com service you would assign the HR personnel to the “HR Security Role” and all other personnel to the “Org Security” role.

Take a look again at the article from Adam Saxton – https://docs.microsoft.com/en-us/power-bi/service-admin-rls#manage-security-on-your-model – to see how to assign the users/groups to the Roles in the Power BI service and to test it in the Power BI service.

Download Code!

Loop Through Dates in T-SQL … One Day at a Time!


T-SQL Code to Loop One Day at a Time

Jamey Johnston (@STATCowboy)


Hidden in my SQL Server 2016 Security Demo blog post is a neat T-SQL trick to loop through a date range day by day (check out the “2 – Oil&Gas RLS Demo – LoadTables.sql” script when you download the code)! But to give you a simpler example the gist of the code in the script is this:


DECLARE @StartDate DATE = ‘2016-10-01’;
DECLARE @EndDate DATE = ‘2016-10-31’;

WHILE (@StartDate <= @EndDate)


print @StartDate;

— Do Something like call a proc with the variable @StartDate

set @StartDate = DATEADD(day, 1, @StartDate);


Download code!

As you can see this is simple code and works like a charm! Hope you enjoy and hit me up on Twitter (@STATCowboy) if you have questions or improvements to the T-SQL Code!

My Journey to Data Science – Part 3 of 3

My Career Journey Over the Last 20+ Years

October 28, 2016


Jamey Johnston (@STATCowboy)


This is Part 3 (the final post!) of a 3-part Blog series on my Career Journey the Last 20+ years into my current role as a Data Scientist. Part 1 can be found here and Part 2 can be found here!



BI is fun! I loved it and soon was not missing the late-night pages of down databases or late nights doing database upgrades. (Although, one night I upgraded 11 Oracle production databases to Oracle 11g at the same time which I must admit was fun!) I loved the insights I gained from building BI solutions using MS products like SSAS, SSRS, MSSQL and SharePoint. I also loved the new-found freedom of exploring data and making it all work together in tools like Spotfire and Power BI! I was building large analytic datasets with billions of rows and special analytic calculations. I was having fun but I was missing something to truly fill the gap as a data scientist, statistics!


TAMU MS of Analytics


I was driving late one night into the office on I-45 in Houston which was not my normal route and I see this billboard for Texas A&M Masters of Science in Analytics! I had been looking for a program to learn statistics but I wanted it local (my ADHD doesn’t do well with online classes!). This was March of 2014. I was accepted into the program in June 2014 and started classes the end of August 2014! To say the least, the next two years my life was crazy! I had started to learn R the last year or so and in the program, I learned about SAS and SAS JMP and a ton of statistics from regression, time series, spatial statistics, multivariate analysis to machine learning! I was having a blast learning statistics and all my friends and family thought I was plum mad to start grad school in my 40s and in all things, Statistics!

I can’t emphasize the importance of a good education in statistics and how valuable the MS in Analytics from TAMU was in that education. I graduated in May 2016 with my Masters and could not be happier with my experience at TAMU. I am proud to be a member of the Fightin’ Texas Aggie Class of 2016! Yes, I wore Mickey Ears with my Twitter handle (@STATCowboy) on the back at graduation!



Data Scientist


Armed with my learning of R, SAS and more importantly, Statistics, I am fully ready for whatever data is thrown at me. I get to build some really cool statistical and machine learning models these days and I can’t believe at 42 years of age I have a new career I basically started at 40! I guess if you are going to have a midlife crisis a career change to The Sexiest Job of the 21st Century according to the Harvard Business Review is a good crisis! Who knows what career I will have at 50?!

I’ll finish with my definition of a Data Scientist – someone who loves finding insight from data, big or small! In some ways, we are all data scientist and I would encourage anyone who would like to take their data knowledge to the next level to learn Statistics. Statistics is the first leg of a three-legged stool that makes up data science. The second leg is a data guru! Someone who can store, move and manage data and present it in an easy to use fashion that can be consumed quickly. The final and third leg of the stool is a Subject Matter Expert (SME) who knows the business and the data around the business. One person is never an expert in all of these but a Data Scientist is typically an expert in one and very proficient in the others. However, I find the best Data Scientist is really a team of folks that love data and bring expertise in one of the three areas – statistics, data or subject matter expert – to the table to build models to make business better!

To get you started in your career as a Data Scientist check out my presentation from SQL Summit 2016 here. Be sure to check out the resources section of the PPT to see some good locations to learn things like R! Also, feel free to reach out to me on Twitter if you have questions along your journey. I am happy to help!

I hope you enjoyed reading about my journey. I know have enjoyed living it!

Code Like a Pirate … Intro to R and Data Science Tools in MS

So, you want to code like a Pirate! – R … R … R!

SQL Summit 2016 Version!
October 28, 2016


Jamey Johnston (@STATCowboy)


Thanks for all who attended my session at SQL Pass Summit 2016. The link to the scripts, documents and presentation materials from the session is below. If you have any questions or comments, feel free to hit me up on Twitter (@STATCowboy). I really appreciate all of you who attended and sorry if I got to statistically on you!


Download from OneDrive!


My Journey to Data Science – Part 2 of 3

My Career Journey Over the Last 20+ Years

October 26, 2016


Jamey Johnston (@STATCowboy)



This is Part 2 of a 3-part Blog series on my Career Journey the Last 20+ years into my current role as a Data Scientist. Part 1 can be found here!



I had been working for LDEQ for three years now and was one year removed from college when a good friend called. He was working as a consultant for Oracle Corporation in Houston. Oracle had signed a big contract with a client in Houston and needed DBAs on the project. He told me to send my resume so I could join him which I did! Two weeks later I was a Senior Consultant with Oracle Corporation working and living in Houston. It was a very exciting time of my life those three years. I had one of the best DBA mentors you could have, Clement C., teaching me about Oracle and technology and was getting to travel to some cool places in the US. Most importantly, though, I found the love of my life soon after moving to Houston, my wife, Melanie!

Oracle was a great 3 years for me and I learned so much that truly set the stage for the next 10 years or so of my life as a DBA. I enjoyed Oracle but after 3 years of travelling it was time to stop traveling and get a local job in Houston so I could spend more time with my wife and dogs! My wife will tell you I only stopped traveling for the dogs (not true though)!

This little-known company at the time called Enron (Ha Ha!) was hiring and I got hired on there! It was June of 2001 and Enron couldn’t be flying any higher (at least in their make-believe world!). We were one of the 10 largest companies (at least on invisible paper) in the world and the talent coming to work there was second to none. Over the next year and half I would learn a great deal about business as well as management as I was soon promoted to manage one of the core DBA teams at Enron. Enron went bankrupt and I moved to work for UBS when they bought the North American trading division of Enron.

After a brief stent at UBS they decided to get out of Houston and I went to work for a small consulting firm, Methods Technology Solutions, out of Baton Rouge. I did some travelling again and really had a lot of freedom to learn some new things and work on a variety of database projects from very small projects to bigger ones. I worked for Methods for 4 years, the last year and half consulting at a large independent Oil and Gas company. The Oil and Gas company offered me a full-time employment job and I have been there now for over 12 years.

The first 7 years at the O&G company I was a DBA team member working on admin, upgrades and a lot of tuning. It was fun! I had some time to write a little code, too! I created an end-user web dashboard for the app owners to see what was happening in the DB with their systems in C# ASP.NET. I created a multi-threaded application in C# to rapidly put together archived emails in to a bundle and create audit trails in XML, CSV, TXT and DB tables. However, I mostly did the typical enterprise DBA tasks.


Tuning SQL to Enhancing Business Processes and Relationships


I really enjoyed tuning database systems. I was a master of 10046 traces in Oracle and SQL Profiler and Extended Events in SQL Server and was able to tune a great deal of our applications. The funny thing about tuning databases when you are a DBA is you tend to look for that magic parameter, trace flag, bad SQL, etc. and often forget about the application and business processes behind what you are trying to optimize. More importantly, you often forget about the business people using those applications and how that data is used to add business value.

Anyone who knows me knows I am not shy! I love people and I love engaging with people. This led me to have a different approach to tuning. When I would get a call that something was slow I would find the business person who was having an issue, schedule an hour of their time in their office, have them reproduce the issue in the system while I watch both what they were doing in the application and also what was happening in the database. It was funny to see their face when the “DBA” would show up in their office. It was like an endangered species had left their native habitat and ended up in their office. Many would even tell me I have never seen a DBA before! This approach created a special relationship that allowed me to start understanding the business processes behind my company and fostering relationships to do more than just “tune database systems”.

I was put on a special project to implement a new business system and was actually placed for a year within the business team to assist with implementing the application. The application is a critical one for O&G companies and so I was dedicated to just working on that project and application. For the first time in almost 15 years I wasn’t a DBA! I wasn’t on-call (I didn’t know what to do with my new-found freedom). I didn’t have to do database upgrades. It was so weird to actually be a “customer” of the DBA team! It certainly gave me insight into how it felt to be on the other side of the table. I absolutely loved the project. I spent so much time understanding all the various data pieces and how they would flow into the new system (we had several different DBs that would need to integrate into this new system) and more importantly how that data truly worked for this particular part of the business. I really loved data! I realized then that I didn’t really want to be a DBA full-time (still like the part-time idea just to keep current but not very practical) but I wanted to work with data. I wanted to build datasets and analyze them. I wanted to find ways to better connect them together. I wasn’t quite on the statistics track yet but definitely the data and business side of Data Science.




As I began to understand the applications and business processes behind using the databases I supported, I also became aware of how so many of the systems were connected to one another either by ETL, database links, linked servers or virtually through various business processes. The connections of all the systems and more importantly, for me, the data led me on my next journey into BI.

I had become exposed to a tool called Spotfire while I was on the special project and learned how to use the tool and how people at work were using the tool. So, I began to work on connecting Spotfire directly and securely to the data sources and soon I was moved onto the BI team working on BI projects in Microsoft and Spotfire. I was a BI guy now!


Stayed tuned for Part 3 tomorrow – BI to Data Scientist!