From PASS Summit Attendee to Microsoft Data Platform MVP

 

For the past 25 years I have worked with data. Data in Oracle. Data in MS SQL Server. Data in GIS. Data in the cloud. Data everywhere! To keep my skills up I attended an Oracle conference every year up until 2012, either the HOTSOS conference in Dallas or Oracle OpenWorld in San Francisco. Why Oracle conferences? Well, Oracle RDBMS is where I spent most of my time managing and using data. I worked with MS SQL Server as well, but I spent most of my time in Oracle. (I have been an Oracle Certified Professional since 2001.)

In 2012 the company I work for made a shift to adopt more of the Microsoft data technologies going forward. So, six years ago, I attended my first PASS Summit in Seattle in November 2012. In less than two months I will attend my seventh PASS Summit. Why? Because it is the best technology community I have seen out there. Don't get me wrong, I have made some great friends in other tech communities and learned a great deal. However, the #SQLFamily, as it is called in the PASS community on Twitter, is a great one. My first memorable encounter with the #SQLFamily was my first day at PASS Summit in 2012. It was a pre-con day and I was at lunch with my co-worker and friend. We were looking for a table to sit at, and when we found one I met my first three friends in the PASS community: Cher (t), Melissa and Jeremy (t). I still hang out with them at various PASS events like SQL Saturdays and PASS Summit, and we still joke about how Melissa said I had to go get her lunch if I wanted to sit at the table (which I did; salmon, I think!). In 2012, I learned that the sessions at #PASSSummit were presented by individuals who were extremely talented and personable and who could take deep technical topics and present them in an easy-to-understand manner. I was hooked on making PASS Summit a part of my yearly conference schedule.

In 2013, PASS Summit was in Charlotte, NC, and I learned for the first time about the Microsoft MVP program and made my first MVP friend, Jason Horner (t). Jason was instrumental in coaching me over the next 4 years on how to give back to the Microsoft community and become a Microsoft MVP. Jason really became the guy who inspired me to become an MVP, not for the prestige of being an MVP but for the reward that comes from giving back to the community. I didn't know in 2013 that I wanted to work hard to become an MVP, but I met so many awesome people who were MVPs that over the next few years I wanted to be one of them and give back. I met some other great friends in 2013 that I see often at SQL Saturdays and PASS Summit: Mickey (t), Martin (t), Cathrine (t), Maria (t) and Bob (t).

Over the next few years I added to my list of PASS friends: Wendy (t), David (t), Chris (t), Amy (t), Wolf (t), Naomi (t), John (t) – who nominated me for MVP!, Reeves (t) – <3 you!, Ginger (t), Alan (t), Nancy (t), Joe (t), Rebecca and Devon (t) – I have co-organized SQL Saturday Houston with the last five of these over the past few years. I have met many others too (Angela, Lance, William, Rie, John, Karla, Mindy, Brandon, Tamera, Tim F., Stacia and so many more … I really could type 100 or more names …)!

Fast forward a couple of years to PASS Summit 2015 and my first time speaking at PASS Summit. I was invited by Microsoft to speak on the new security features in SQL Server 2016. It was such a rush, and it was the real start of my involvement as a PASS community speaker, which was crucial to my being awarded the MS Data Platform MVP. Over the last three years I have spoken at 3 PASS Summits, 30 or so SQL Saturdays, several universities and other conferences, including my first international conference in London earlier this year.

It has been a great journey to becoming a Microsoft Data Platform MVP, and it started in 2012 at PASS Summit! Truly, though, the best thing to come out of my attendance at PASS Summit is the lifelong friendships I have gained, which have helped me become a better person and data professional. I encourage you to attend PASS Summit, meet new people and learn new data skills. I also challenge you to give back, maybe one day speak at PASS Summit and, who knows, become a Microsoft MVP!

– Jamey

Register before Sept 22 for #PASSSummit & be entered into a daily drawing for PASS merchandise & the grand prize drawing, PASS Summit 2019! Save $200 with Code: PASSITON

https://www.pass.org/summit/2018/RegisterNow.aspx

Row Level Security in Power BI using Parent/Child Hierarchies

 


Jamey Johnston (@STATCowboy)

 

Overview

I wrote a blog post on using parent/child hierarchies to enforce Row Level Security (RLS) in SQL Server 2016+. The demo in that article used a parent/child hierarchy for an organization, combined with a flattened asset hierarchy, to enforce strict row level security on a fictional Oil & Gas dataset for a fictional Oil & Gas organization. I recommend reading the "Overview of the Demo" and "Asset and Organization Hierarchies and RLS" sections in that blog post before proceeding with this one. They will give you a good understanding of what the hierarchies entail and how they are used in this demo/blog post.

In this blog post we will focus only on the organizational hierarchy and how it can be used to enforce RLS in Power BI. Think about creating an HR report that automatically filters to the person logged into the report and all direct reports under that person. In a follow-up blog post we will bring in the asset hierarchy and combine the organizational and asset hierarchies to enforce a more complex RLS scheme like the one used in the MSSQL demo linked above.

GitHub

The demo code and Power BI file are available in the GitHub repository.

Scripts and Explanation

There are 2 scripts to run to install the entire demo (just run them in the order they are numbered, 1 – 2). They are located in the oilgaspbidemo-SQL-Code folder in the GitHub repository. If you have used the scripts for the RLS demo in SQL Server, these are similar but create a different database (oilgaspbidemo) and use just a subset of the tables:

1 – Oil&Gas PBI Demo – BuildTables.sql

The script will create the database, oilgaspbidemo, and the tables needed for the demo.

The tables are as follows:

  • ASSET_HIERARCHY – Table contains the asset hierarchy and is used to build the demo data for the asset hierarchy in the Well_Master table.
  • DATES – Generic table to hold a date dimension for use later in an Excel Power Pivot model.
  • SEC_ASSET_MAP – Table contains the entries mapping the organization units to the asset hierarchy levels for access to the data. The table would be managed by the security team for the application. Users and their subordinates are denied access to data via an entry of ou, ‘NONE’, ‘NONE’ or granted all access via an entry of ou, ‘ALL’, ‘ALL’.
  • SEC_ORG_USER_BASE – Table contains the employees, including the employee-to-manager parent/child relationship used to build an organization hierarchy, and the organizational unit id for mapping to asset hierarchy levels for security. This table would likely be fed from an HR system. (In the SQL Server demo this table is also used to demonstrate Always Encrypted.)
  • SEC_USER_EXCEPTIONS – Table contains entries that allow for overrides of the organization-hierarchy-based model. Any employee entered here will use the permissions defined in this table instead of those inherited from the organizational hierarchy.
  • WELL_DAILY_PROD – Contains COMPLETELY made-up and randomly generated daily well production data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs. This is a Clustered ColumnStore Indexed table.
  • WELL_MASTER – Contains COMPLETELY made-up and randomly generated master well data including the made up asset hierarchy. This is the main business table used for RLS for ALL well tables.
  • WELL_REASON_CODE – Contains COMPLETELY made-up and randomly generated daily well downtime data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs.

 

2 – Oil&Gas PBI Demo – LoadTables.sql

 

This script loads (generates) the demo data, including the user and security tables and the hierarchy and well data. There are two parameters near the top that control the amount of data to load: @wellcount and @drillyearstart. @wellcount specifies the number of wells to generate, and @drillyearstart specifies the first possible year to use for a well. The start date for a well is randomly selected between @drillyearstart and the year the script is run.
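For intuition, here is a minimal sketch (not the actual LoadTables script; every variable name other than @drillyearstart is made up) of how a random well start date between @drillyearstart and today can be generated in T-SQL:

DECLARE @drillyearstart INT = 2010;
DECLARE @rangestart DATE = DATEFROMPARTS(@drillyearstart, 1, 1);
DECLARE @daysinrange INT = DATEDIFF(DAY, @rangestart, GETDATE());

-- ABS(CHECKSUM(NEWID())) yields a pseudo-random non-negative integer on every evaluation
DECLARE @randomoffset INT = ABS(CHECKSUM(NEWID())) % (@daysinrange + 1);
DECLARE @wellstartdate DATE = DATEADD(DAY, @randomoffset, @rangestart);

SELECT @wellstartdate AS WELL_START_DATE;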

There is also a Power BI file in the GitHub repository, Oil and Gas Power BI RLS Demo.pbix, showing the setup of the roles, DAX and tables and demonstrating the hierarchies.

 

RLS in Power BI

So how do we accomplish Row Level Security in Power BI using a parent/child organizational hierarchy? First, we need to know the basics of RLS in Power BI. Let’s start with capturing the username of the currently connected user.

USERNAME() and USERPRINCIPALNAME() in DAX

The two DAX functions USERNAME() and USERPRINCIPALNAME() are the key to RLS based on organizational hierarchies. They are how we determine who is logged into the PowerBI.com service so that we can use that information (the login name) to apply row level security. The USERNAME() function returns the domain name and username of the currently connected user as DOMAIN\username, whereas the USERPRINCIPALNAME() function returns it as username@domain.com.
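If you want to see exactly what these functions return for the signed-in user, a quick way (these measure names are made up and are not part of the demo) is to drop two throwaway measures onto Card visuals:

// Diagnostic measures (hypothetical names) to display on Card visuals
Current User = USERNAME()            // DOMAIN\username in Desktop; typically the UPN in the service
Current UPN  = USERPRINCIPALNAME()   // e.g. username@domain.com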

There is a good article from Adam Saxton, Guy in a Cube (GH | B | T | YT), on USERNAME() and RLS – https://powerbi.microsoft.com/en-us/blog/using-username-in-dax-with-row-level-security/. It might be worth a read before continuing with this article.

 

RLS Roles

Next, we need to understand roles in Power BI with RLS. Take a look at this article from Adam Saxton – https://docs.microsoft.com/en-us/power-bi/service-admin-rls – and then head back here …
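As a minimal example of what a role's table filter looks like (this is a generic illustration, not from the demo, and the EMAIL column is hypothetical), a role that restricts a table to the signed-in user's own rows could use a filter as simple as:

// Simplest possible RLS table filter: match the signed-in user's UPN
// against a (hypothetical) EMAIL column on the table being secured
[EMAIL] = USERPRINCIPALNAME()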

Now that we understand the RLS basics in Power BI, let's see how to implement RLS in Power BI using a parent/child relationship in an organizational hierarchy.

 

Organizational Hierarchy

So, look at the following organizational hierarchy from the demo code described above, and in particular at Joachim Hapitash. Under him is Roger Wagner, and under Roger are Tracey and Sean. Now let's say that when Roger Wagner logs into the Power BI HR report we want to use the hierarchy to show Roger only his own record and the employees under him in the hierarchy. How do you accomplish this? Keep reading and I will show you!

 

 

Parent/Child Hierarchy in SQL Server

Now let's examine the data in the SEC_ORG_USER_BASE table created by the demo. The table uses a unary (self-referencing) relationship to relate each employee to his/her manager. Look below and you will see that the CEO, Dave Campos, has NULL for MGRID because he has no manager. But if you look at Joachim Hapitash you will see that he has 1001 for MGRID, as Dave Campos (EMPLID = 1001) is his manager.
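If you want a quick mental model of that unary relationship without opening the scripts, here is a stripped-down sketch (the real table has more columns, such as ORG_UNIT_ID and USERID, and the data types and Joachim's EMPLID below are assumed):

CREATE TABLE dbo.SEC_ORG_USER_BASE_SKETCH (
    EMPLID INT         NOT NULL PRIMARY KEY,
    NAME   VARCHAR(50) NOT NULL,
    MGRID  INT         NULL     -- NULL for the CEO, otherwise the manager's EMPLID
        REFERENCES dbo.SEC_ORG_USER_BASE_SKETCH (EMPLID)
);

INSERT INTO dbo.SEC_ORG_USER_BASE_SKETCH (EMPLID, NAME, MGRID)
VALUES (1001, 'Dave Campos',      NULL),   -- CEO, no manager
       (1002, 'Joachim Hapitash', 1001);   -- reports to EMPLID 1001 (made-up EMPLID)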

 

 

Now let's use a recursive CTE to see how to relate the employees and managers together. If you run the following query against the SEC_ORG_USER_BASE table, you will get Roger and all the employees under Roger in the hierarchy.

oilgaspbidemo-SQL-Code\userhierarchyquery.sql

WITH DirectReports
(MGRID, EMPLID, NAME, MGR, ORG_UNIT_ID, ORG_UNIT_NAME, Level, ORG_LINEAGE, ORG_NAME_LINEAGE, NAME_LINEAGE)
AS
(
    -- Anchor member: the CEO (no manager)
    SELECT e.MGRID, e.EMPLID, e.NAME, CAST(NULL AS varchar(50)) AS MGR,
           e.ORG_UNIT_ID, e.ORG_UNIT_NAME,
           0 AS Level,
           CAST(e.ORG_UNIT_ID AS varchar(max)) AS ORG_LINEAGE,
           CAST(e.ORG_UNIT_NAME AS varchar(max)) AS ORG_NAME_LINEAGE,
           CAST(e.NAME AS varchar(max)) AS NAME_LINEAGE
    FROM dbo.SEC_ORG_USER_BASE AS e
    WHERE MGRID IS NULL

    UNION ALL

    -- Recursive member: each employee joined to their manager's row
    SELECT e.MGRID, e.EMPLID, e.NAME, d.NAME, e.ORG_UNIT_ID, e.ORG_UNIT_NAME,
           d.Level + 1,
           CASE WHEN e.ORG_UNIT_ID <> d.ORG_UNIT_ID
                THEN CAST(d.ORG_LINEAGE + '|' + CAST(e.ORG_UNIT_ID AS varchar(max)) AS varchar(max))
                ELSE CAST(d.ORG_LINEAGE AS varchar(max)) END,
           CASE WHEN e.ORG_UNIT_NAME <> d.ORG_UNIT_NAME
                THEN CAST(d.ORG_NAME_LINEAGE + '|' + e.ORG_UNIT_NAME AS varchar(max))
                ELSE CAST(d.ORG_NAME_LINEAGE AS varchar(max)) END,
           CASE WHEN e.NAME <> d.NAME
                THEN CAST(d.NAME_LINEAGE + '|' + e.NAME AS varchar(max))
                ELSE CAST(d.NAME_LINEAGE AS varchar(max)) END
    FROM dbo.SEC_ORG_USER_BASE AS e
    INNER JOIN DirectReports AS d
        ON e.MGRID = d.EMPLID
)
SELECT MGRID, EMPLID, NAME, MGR, ORG_UNIT_ID, ORG_UNIT_NAME, Level, ORG_LINEAGE, ORG_NAME_LINEAGE, NAME_LINEAGE
FROM DirectReports
WHERE NAME_LINEAGE LIKE '%Roger Wagner%';
GO

The results are as such:

 

Power BI, RLS and Organizational Hierarchies

Next, we add the SEC_ORG_USER_BASE table to Power BI and add some calculated columns (Path, Level 1 through Level 6, Level 1 Name through Level 6 Name, and HierarchyLevel) and one measure (MaxNumLevels). In this post we will only look at the SEC_ORG_USER_BASE table. In the next blog article, we will look at all the tables in the demo!

 

 

Below is the DAX for the calculated columns and measure:

  • Path = PATH(SEC_ORG_USER_BASE[EMPLID],SEC_ORG_USER_BASE[MGRID])
  • Level 1 = PATHITEM(SEC_ORG_USER_BASE[Path],1)
  • Level 2 = PATHITEM(SEC_ORG_USER_BASE[Path],2)
  • Level 3 = PATHITEM(SEC_ORG_USER_BASE[Path],3)
  • Level 4 = PATHITEM(SEC_ORG_USER_BASE[Path],4)
  • Level 5 = PATHITEM(SEC_ORG_USER_BASE[Path],5)
  • Level 6 = PATHITEM(SEC_ORG_USER_BASE[Path],6)
  • Level 1 Name = LOOKUPVALUE(SEC_ORG_USER_BASE[NAME], SEC_ORG_USER_BASE[EMPLID], VALUE(PATHITEM(SEC_ORG_USER_BASE[Path],1)))
  • Level 2 Name = LOOKUPVALUE(SEC_ORG_USER_BASE[NAME], SEC_ORG_USER_BASE[EMPLID], VALUE(PATHITEM(SEC_ORG_USER_BASE[Path],2)))
  • Level 3 Name = LOOKUPVALUE(SEC_ORG_USER_BASE[NAME], SEC_ORG_USER_BASE[EMPLID], VALUE(PATHITEM(SEC_ORG_USER_BASE[Path],3)))
  • Level 4 Name = LOOKUPVALUE(SEC_ORG_USER_BASE[NAME], SEC_ORG_USER_BASE[EMPLID], VALUE(PATHITEM(SEC_ORG_USER_BASE[Path],4)))
  • Level 5 Name = LOOKUPVALUE(SEC_ORG_USER_BASE[NAME], SEC_ORG_USER_BASE[EMPLID], VALUE(PATHITEM(SEC_ORG_USER_BASE[Path],5)))
  • Level 6 Name = LOOKUPVALUE(SEC_ORG_USER_BASE[NAME], SEC_ORG_USER_BASE[EMPLID], VALUE(PATHITEM(SEC_ORG_USER_BASE[Path],6)))
  • HierarchyLevel = PATHLENGTH(PATH(SEC_ORG_USER_BASE[EMPLID], SEC_ORG_USER_BASE[MGRID]))
  • MaxNumLevels = MAX(SEC_ORG_USER_BASE[HierarchyLevel])

The Path column gives a pipe ("|") separated list of the hierarchy (the chain of EMPLIDs) at that node.
The Level X columns give the EMPLID of the employee at level X of the hierarchy. You will use these columns to build a hierarchy for the report.
The Level X Name columns give the NAME of the employee at level X of the hierarchy. You will also use these columns to build a hierarchy for the report.
The HierarchyLevel column gives the depth of the employee's organizational level.
The MaxNumLevels measure calculates the maximum number of levels in the hierarchy and is used to determine how many Level X columns to create. With no employee filters applied, it will show "6" in this demo if you add a table visual with just this measure.
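To make the Path and Level columns concrete, here is what the values might look like for an employee four levels deep (the EMPLIDs below are made up for illustration):

// Hypothetical values for one row, where the chain is CEO -> VP -> Manager -> Employee
// Path           = "1001|1020|1150|1154"
// Level 1        = "1001"        Level 2 = "1020"
// Level 3        = "1150"        Level 4 = "1154"   (Level 5 and Level 6 are blank)
// HierarchyLevel = 4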

These DAX statements use the Parent and Child DAX functions described here – https://msdn.microsoft.com/en-us/library/mt150102.aspx.

 

Now let's set up a hierarchy in the Power BI model so we can navigate the organizational hierarchy in visuals. This is accomplished by right-clicking on "Level 1" in the Fields section on the far right and choosing "New hierarchy".

 

This will create a new hierarchy called "Level 1 Hierarchy". Let's rename it to "Org Hierarchy" by right-clicking (or clicking the ellipsis – three dots – to the right of the name) and choosing "Rename".


 

To add the remaining levels to the hierarchy, starting with Level 2 and ending with Level 6, right-click each level and select "Add to Hierarchy -> Org Hierarchy" until they are all in the hierarchy.

 


 

We can hide “Level X” columns now that we have a hierarchy defined. Right-click on each “Level X” field and choose “Hide”.

 

We will repeat the above for the “Level X Name” columns and call the hierarchy “Org Hierarchy Names”.

 

Now let's set up the Row Level Security. To do this we will create a role called "Org Security". Under the "Modeling" menu choose "Manage Roles".

 

When the Manage roles dialog pops up, create a new role called "Org Security" for the table "SEC_ORG_USER_BASE", use the following DAX expression as the table filter, and click "Save".

PATHCONTAINS(PATH(SEC_ORG_USER_BASE[EMPLID], SEC_ORG_USER_BASE[MGRID]),
    LOOKUPVALUE(SEC_ORG_USER_BASE[EMPLID], [USERID],
        PATHITEM(SUBSTITUTE(USERPRINCIPALNAME(), "@", "|"), 1)))

The DAX expression is doing the following:

  • Uses the PATHCONTAINS DAX function to search the Parent/Child hierarchy (by using the PATH DAX function) to find the employee record and all employee records under that employee.
  • Uses the LOOKUPVALUE DAX function to convert the logged in USERID to the EMPLID.
  • Uses the PATHITEM(SUBSTITUTE(USERPRINCIPALNAME(), "@", "|"), 1) DAX expression to convert the USERPRINCIPALNAME() (e.g. mcox@poweroilandgas.com) to the USERID (e.g. mcox), which feeds the LOOKUPVALUE DAX function to get the EMPLID, which in turn feeds the PATHCONTAINS DAX function. (A variant for DOMAIN\username-style logins is sketched below.)
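If your logins resolve to the DOMAIN\username form returned by USERNAME() rather than a UPN, a hedged variant of the same filter (not part of the demo PBIX) splits on the backslash and takes the second path item instead:

// Variant of the role filter for DOMAIN\username-style logins (illustrative only)
PATHCONTAINS (
    PATH ( SEC_ORG_USER_BASE[EMPLID], SEC_ORG_USER_BASE[MGRID] ),
    LOOKUPVALUE (
        SEC_ORG_USER_BASE[EMPLID],
        SEC_ORG_USER_BASE[USERID],
        PATHITEM ( SUBSTITUTE ( USERNAME(), "\", "|" ), 2 )
    )
)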

 

Now let’s test the security! There is a “Matrix” visual added to the PBIX file in the downloads. It has the “Org Hierarchy Names” added to it as well as a column with counts by “EMPLID”. This will show us the employee counts at each level of the hierarchy in the organization.

 

Now let’s use the “View as Roles” to test! Click the “View as Roles” button under the Modeling menu.

 

 

Then enter mcox@poweroil-n-gas.com for “Other user” and check the “Other user” box. Also, check the “Org Security” box to enable the “Org Security” role.

 

 

Now check the Matrix visual! It should be filtered to only “mcox” and employees under “mcox”. Notice how we only see 119 employees in the dataset now and NOT 157!

Now we need to add an HR role to allow HR personnel to see all employees rather than being limited to just their organizational units. This is easy to do: just add a role called "HR Security" and leave the filter blank.

Now when you publish to the PowerBI.com service you would assign the HR personnel to the "HR Security" role and all other personnel to the "Org Security" role.

Take a look again at the article from Adam Saxton – https://docs.microsoft.com/en-us/power-bi/service-admin-rls#manage-security-on-your-model – to see how to assign the users/groups to the Roles in the Power BI service and to test it in the Power BI service.

Download Code!

Loop Through Dates in T-SQL … One Day at a Time!

 

T-SQL Code to Loop One Day at a Time

Jamey Johnston (@STATCowboy)

 

Hidden in my SQL Server 2016 Security Demo blog post is a neat T-SQL trick to loop through a date range day by day (check out the "2 – Oil&Gas RLS Demo – LoadTables.sql" script when you download the code)! To give you a simpler example, the gist of the code in the script is this:

 

DECLARE @StartDate DATE = '2016-10-01';
DECLARE @EndDate DATE = '2016-10-31';

WHILE (@StartDate <= @EndDate)
BEGIN

    PRINT @StartDate;

    -- Do something, like call a proc with the variable @StartDate

    SET @StartDate = DATEADD(day, 1, @StartDate);

END;

Download code!

As you can see this is simple code, and it works like a charm! I hope you enjoy it, and hit me up on Twitter (@STATCowboy) if you have questions or improvements to the T-SQL code!

My Journey to Data Science – Part 3 of 3

My Career Journey Over the Last 20+ Years

October 28, 2016

 

Jamey Johnston (@STATCowboy)

 

This is Part 3 (the final post!) of a 3-part Blog series on my Career Journey the Last 20+ years into my current role as a Data Scientist. Part 1 can be found here and Part 2 can be found here!

BI

 

BI is fun! I loved it, and soon I was not missing the late-night pages about down databases or the late nights doing database upgrades. (Although one night I upgraded 11 Oracle production databases to Oracle 11g at the same time, which I must admit was fun!) I loved the insights I gained from building BI solutions using MS products like SSAS, SSRS, MSSQL and SharePoint. I also loved the new-found freedom of exploring data and making it all work together in tools like Spotfire and Power BI! I was building large analytic datasets with billions of rows and special analytic calculations. I was having fun, but I was still missing something I needed to truly become a data scientist: statistics!

 

TAMU MS of Analytics

 

I was driving into the office late one night on I-45 in Houston, which was not my normal route, and I saw a billboard for the Texas A&M Master of Science in Analytics! I had been looking for a program to learn statistics, but I wanted it local (my ADHD doesn't do well with online classes!). This was March of 2014. I was accepted into the program in June 2014 and started classes at the end of August 2014! To say the least, my life for the next two years was crazy! I had started to learn R over the previous year or so, and in the program I learned about SAS and SAS JMP and a ton of statistics, from regression, time series, spatial statistics and multivariate analysis to machine learning! I was having a blast learning statistics, and all my friends and family thought I was plumb mad to start grad school in my 40s, and in, of all things, statistics!

I can't emphasize enough the importance of a good education in statistics and how valuable the MS in Analytics from TAMU was to that education. I graduated in May 2016 with my Master's and could not be happier with my experience at TAMU. I am proud to be a member of the Fightin' Texas Aggie Class of 2016! Yes, I wore Mickey Ears with my Twitter handle (@STATCowboy) on the back at graduation!

 


 

Data Scientist

 

Armed with my learning of R and SAS and, more importantly, statistics, I am ready for whatever data is thrown at me. I get to build some really cool statistical and machine learning models these days, and I can't believe that at 42 years of age I have a new career that I basically started at 40! I guess if you are going to have a midlife crisis, a career change to The Sexiest Job of the 21st Century, according to the Harvard Business Review, is a good crisis! Who knows what career I will have at 50?!

I'll finish with my definition of a Data Scientist: someone who loves finding insight from data, big or small! In some ways we are all data scientists, and I would encourage anyone who would like to take their data knowledge to the next level to learn statistics. Statistics is the first leg of a three-legged stool that makes up data science. The second leg is being a data guru: someone who can store, move and manage data and present it in an easy-to-use fashion that can be consumed quickly. The final and third leg of the stool is being a Subject Matter Expert (SME) who knows the business and the data around the business. One person is never an expert in all of these, but a Data Scientist is typically an expert in one and very proficient in the others. However, I find the best Data Scientist is really a team of folks who love data and each bring expertise in one of the three areas – statistics, data or subject matter expertise – to the table to build models that make the business better!

To get you started on your career as a Data Scientist, check out my presentation from SQL Summit 2016 here. Be sure to check out the resources section of the PPT for some good places to learn things like R! Also, feel free to reach out to me on Twitter if you have questions along your journey. I am happy to help!

I hope you enjoyed reading about my journey. I know I have enjoyed living it!

Code Like a Pirate … Intro to R and Data Science Tools in MS

So, you want to code like a Pirate! – R … R … R!

SQL Summit 2016 Version!
October 28, 2016

 

Jamey Johnston (@STATCowboy)

 

Thanks to all who attended my session at SQL PASS Summit 2016. The link to the scripts, documents and presentation materials from the session is below. If you have any questions or comments, feel free to hit me up on Twitter (@STATCowboy). I really appreciate all of you who attended, and sorry if I got too statistical on you!

 

Download from OneDrive!

 

My Journey to Data Science – Part 2 of 3

My Career Journey Over the Last 20+ Years

October 26, 2016

 

Jamey Johnston (@STATCowboy)

 

 

This is Part 2 of a 3-part Blog series on my Career Journey the Last 20+ years into my current role as a Data Scientist. Part 1 can be found here!

DBA

 

I had been working for LDEQ for three years and was one year removed from college when a good friend called. He was working as a consultant for Oracle Corporation in Houston. Oracle had signed a big contract with a client in Houston and needed DBAs on the project. He told me to send my resume so I could join him, which I did! Two weeks later I was a Senior Consultant with Oracle Corporation, working and living in Houston. Those three years were a very exciting time of my life. I had one of the best DBA mentors you could have, Clement C., teaching me about Oracle and technology, and I was getting to travel to some cool places in the US. Most importantly, though, I found the love of my life soon after moving to Houston: my wife, Melanie!

Oracle was a great three years for me, and I learned so much that truly set the stage for the next 10 years or so of my life as a DBA. I enjoyed Oracle, but after three years of traveling it was time to get a local job in Houston so I could spend more time with my wife and dogs! My wife will tell you I only stopped traveling for the dogs (not true, though)!

This little-known company at the time called Enron (ha ha!) was hiring, and I got hired on there! It was June of 2001, and Enron couldn't have been flying any higher (at least in their make-believe world!). We were one of the 10 largest companies in the world (at least on invisible paper), and the talent coming to work there was second to none. Over the next year and a half I learned a great deal about business as well as management, as I was soon promoted to manage one of the core DBA teams at Enron. Enron went bankrupt, and I moved to work for UBS when they bought the North American trading division of Enron.

After a brief stint at UBS, they decided to get out of Houston and I went to work for a small consulting firm, Methods Technology Solutions, out of Baton Rouge. I did some traveling again and really had a lot of freedom to learn new things and work on a variety of database projects, from very small ones to bigger ones. I worked for Methods for 4 years, the last year and a half consulting at a large independent Oil and Gas company. The Oil and Gas company offered me full-time employment, and I have been there now for over 12 years.

For the first 7 years at the O&G company I was a DBA team member working on administration, upgrades and a lot of tuning. It was fun! I had some time to write a little code, too! I created an end-user web dashboard in C# ASP.NET for the app owners to see what was happening in the DB with their systems. I created a multi-threaded application in C# to rapidly pull archived emails together into a bundle and create audit trails in XML, CSV, TXT and DB tables. Mostly, however, I did the typical enterprise DBA tasks.

 

Tuning SQL to Enhancing Business Processes and Relationships

 

I really enjoyed tuning database systems. I was a master of 10046 traces in Oracle and of SQL Profiler and Extended Events in SQL Server, and I was able to tune a great deal of our applications. The funny thing about tuning databases when you are a DBA is that you tend to look for that magic parameter, trace flag, bad SQL, etc. and often forget about the application and business processes behind what you are trying to optimize. More importantly, you often forget about the business people using those applications and how that data is used to add business value.

Anyone who knows me knows I am not shy! I love people and I love engaging with people. This led me to a different approach to tuning. When I would get a call that something was slow, I would find the business person who was having the issue, schedule an hour of their time in their office, and have them reproduce the issue in the system while I watched both what they were doing in the application and what was happening in the database. It was funny to see their faces when the "DBA" showed up in their office. It was as if an endangered species had left its native habitat and ended up in their office. Many would even tell me they had never seen a DBA before! This approach created special relationships that allowed me to start understanding the business processes behind my company and to do more than just "tune database systems".

I was put on a special project to implement a new business system and was actually placed within the business team for a year to assist with implementing the application. The application is a critical one for O&G companies, so I was dedicated to working on just that project and application. For the first time in almost 15 years I wasn't a DBA! I wasn't on-call (I didn't know what to do with my new-found freedom). I didn't have to do database upgrades. It was so weird to actually be a "customer" of the DBA team! It certainly gave me insight into how it felt to be on the other side of the table. I absolutely loved the project. I spent a lot of time understanding all the various data pieces and how they would flow into the new system (we had several different DBs that would need to integrate into it) and, more importantly, how that data truly worked for that particular part of the business. I really loved data! I realized then that I didn't want to be a DBA full-time (I still like the part-time idea just to keep current, but it's not very practical), but I did want to work with data. I wanted to build datasets and analyze them. I wanted to find ways to better connect them together. I wasn't quite on the statistics track yet, but I was definitely on the data and business side of Data Science.

 

DBA to BI

 

As I began to understand the applications and business processes behind the databases I supported, I also became aware of how many of the systems were connected to one another, either by ETL, database links, linked servers, or virtually through various business processes. The connections between all the systems and, more importantly for me, the data led me on my next journey: into BI.

I had become exposed to a tool called Spotfire while I was on the special project and learned how to use it and how people at work were using it. So, I began to work on connecting Spotfire directly and securely to the data sources, and soon I was moved onto the BI team working on BI projects in Microsoft and Spotfire. I was a BI guy now!

 

Stay tuned for Part 3 tomorrow – BI to Data Scientist!

 

My Journey to Data Science – Part 1 of 3

My Career Journey Over the Last 20+ Years

October 25, 2016

 

Jamey Johnston (@STATCowboy)

 

So I am at the EMP in Seattle attending the SQL Summit 2015 Appreciation Event, and I am introduced to Denise McInerny (t) by my good friend Wendy Pastrick (t | b). Of course, the common pleasantries are exchanged, "My name is …", "I work at XYZ company …" and "I do ABC job …", which in my case is "My name is Jamey Johnston, I work at an O&G company and I am a Data Scientist". Denise's response was not quite what most people's response is when I tell them I am a Data Scientist. Usually I get a general look of trepidation or the occasional "Oh, you are a unicorn!" (true story, several times), but in Denise's case she said, "You should write a blog about your journey to become a Data Scientist" (or something along those lines). I thought that might be a fun blog to write and said "Sure!". So here is the story of my journey to becoming a Data Scientist.

 

10 PRINT “My Name is Jamey”

 

So, I am an eight-year-old living in the suburbs of New Orleans and I want a computer, a Commodore Vic-20, and I get it! I plug it in and connect it to the color console TV in my living room and turn it on. Within 10 minutes I have written my first program:

 

10 PRINT "My Name is Jamey"
20 GOTO 10
30 RUN
 

The next thing I see is the greatest thing I have ever seen – "My Name is Jamey" scrolling across my parents' TV screen as fast as I can read it! I sit there and watch it go across the screen over and over and over again, finally asking myself, "How do I stop it!?" I scan through the manual that came with the Vic-20 looking for the correct key combination to stop it! No luck, and no Internet in 1982, so I do the only thing I can think of to stop it: "Unplug!" I loved that computer, and it fueled my desire to learn more about computers!

 

GIS and College

 

Fast forward to college: I am a 19-year-old who just finished his first year as a Psych major, and I get a job with the Civil Engineering department at school working with GIS and Remote Sensing technologies. My father had started in GIS and Remote Sensing back in the '70s doing wetlands research for the Department of the Interior, so I was familiar with GIS and Remote Sensing, and I was excited to work with big workstations and lots of data and to work in the same field as my dad! These big workstations were SGI and DG boxes running 25MHz to 100MHz processors! Your phone is probably 100 times faster!

Two years later I finished my junior year and was still working on GIS and Remote Sensing projects, one of which was to run clustering against tiles of Thematic Mapper satellite data covering the entire state of Louisiana and then classify those clusters into land use and land cover categories (i.e. is this cluster water, agriculture, coniferous forest, etc.). I was working with ESRI and Erdas Imagine software and learning UNIX, which was really fun and beneficial to my career.

I loved being a GIS and Remote Sensing technologist, but I was still a Psych major! I wanted to change majors, but the problem was that my university only had ONE geography class! So there was no way I could stay at the University of Southwestern Louisiana and get a GIS degree. I transferred to LSU, which had a great GIS program through the Department of Geography and Anthropology, starting in the Fall of 1995, to get a BS in Geography with an emphasis in GIS and Remote Sensing. I also went to work for the GIS Center at the Louisiana Department of Environmental Quality (LDEQ), continuing my career in GIS and Remote Sensing. It took me two more years to finish school, as I had to take 36 hours of Geography to graduate. My first semester was 12 hours of Geography, ranging from a Geography 1001 class up to a graduate-level Geography 4998 class, plus a music class, I believe. It was a fun semester to say the least.

I graduated in the Spring of 1997 from LSU and continued working for the GIS Center in the field of GIS and Remote Sensing. I also started working with another cool technology, the Relational Database Management System (RDBMS). ESRI had created the first version of ArcSDE, which was a way to store your spatial data in a database – in my case, Oracle. We had Oracle v6 and v7 databases at LDEQ, and I was starting to learn them to support ArcSDE along with some other technologies that required an RDBMS.

 

Stay tuned for Part 2 tomorrow – DBA to BI!

SQL Server 2016 Security Demo

RLS, Dynamic Data Masking and Always Encrypted

Jamey Johnston (@STATCowboy)

NOTE: Demo requires SQL Server 2016 CTP 2.4+ or Azure SQL Database v12.

 

Overview of the Demo

 

Security is a vital part of a DBA's, Developer's and Data Architect's job these days. The number of attacks on the databases of major retailers by criminals looking for information like email addresses, ID numbers, birthdays, etc. on individuals to be exploited and sold is ever increasing. The demo contained in the link at the bottom of this page is meant to showcase three of the new features Microsoft is bringing to Azure SQL Database v12 and SQL Server 2016 to help combat these cyber-attacks:

Row Level Security (RLS)

Dynamic Data Masking

Always Encrypted

 

This first part of the demo shows how an organizational hierarchy and an asset hierarchy can be leveraged together to provide Row Level Security on tables in a SQL Server database using the new predicate-based RLS feature in SQL Server 2016 and Azure SQL Database v12. The demo uses completely made-up oil well production data for a fictitious organization with 153 fictional employees and comes as-is with ABSOLUTELY NO WARRANTY or GUARANTEE!

Also, the demo will show how to use RLS with the HierarchyID datatype, as well as the new Dynamic Data Masking and Always Encrypted security features.

This post is about the demo, which will show you an advanced implementation template for RLS as well as some of the other security features. Please use the links above to the official Microsoft documentation to learn about each feature before trying out the demo, as it will help you understand the demo better.

If you attended the session at SQL Summit 2015, Overview of Security Investments in SQL Server 2016 and Azure SQL Database [DBA-327-M], which I co-presented with the Microsoft SQL Server Security team, this is the demo we used at the end.

 

Asset and Organization Hierarchies and RLS

 

The basis of the security model is that nodes in the organizational hierarchy are granted access to levels in the asset hierarchy, and those grants filter down to the members below them in the hierarchy. This allows for inheritance of permissions via the organization and asset hierarchies (i.e. child nodes can inherit from parent nodes).

Functionality is built into the model to override the security model for a lower member, including denying access altogether ('NONE' in the security map) or granting access to all data ('ALL' in the security map) via exceptions (SEC_USER_EXCEPTIONS). A base user table (SEC_ORG_USER_BASE) holds the employee-to-manager relationship as well as the organizational unit id for each employee. This table would likely be fed from an HR system in a production scenario. A master table for the wells (WELL_MASTER) contains the asset hierarchy used to filter security on the well data. Read the notes about the tables below for more details.
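To make the 'ALL'/'NONE' convention concrete, here are a few hypothetical security-map entries (column names are simplified for illustration; see the table notes below for the real structure):

-- Hypothetical SEC_ASSET_MAP-style entries (illustrative only)
-- ORG_UNIT_ID | ASSET_LEVEL | ASSET_VALUE
-- ------------+-------------+------------
--   100       | ALL         | ALL        -- org unit 100 (and its children) can see all data
--   200       | NONE        | NONE       -- org unit 200 (and its children) is denied access
--   300       | REGION      | Permian    -- hypothetical grant to one level of the asset hierarchy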

Below is what the asset and organizational hierarchies look like (this is also what is in the demo), followed by a walk down one branch of the organizational hierarchy to see how RLS is applied against the asset hierarchy.

 

Asset Hierarchy (snippet)

 

 

Organizational Hierarchy (snippet)

 

Asset and Organization Hierarchy and RLS (CEO to one USER)

 

 

 

Scripts and Explanations

 

There are 6 scripts to run to see the entire demo (just run them in the order they are numbered, 1 – 6):

1 – Oil&Gas RLS Demo – BuildTables.sql

The script will create the database, oilgasrlsdemo2016, and the tables needed for the demo.

The tables are as follows:

  • ASSET_HIERARCHY – Table contains the asset hierarchy and is used to build the demo data for the asset hierarchy in the Well_Master table.
  • DATES – Generic table to hold a date dimension for use later in an Excel Power Pivot model.
  • SEC_ASSET_MAP – Table contains the entries mapping the organization units to the asset hierarchy levels for access to the data. The table would be managed by the security team for the application. Users and their subordinates are denied access to data via an entry of ou, ‘NONE’, ‘NONE’ or granted all access via an entry of ou, ‘ALL’, ‘ALL’.
  • SEC_ORG_USER_BASE – Table contains the employees, including the employee-to-manager parent/child relationship used to build an organization hierarchy, and the organizational unit id for mapping to asset hierarchy levels for security. This table would likely be fed from an HR system. This table is also used to demonstrate Always Encrypted.
  • SEC_ORG_USER_BASE_HID – Same as SEC_ORG_USER_BASE but includes HierarchyID column to demonstrate RLS with HierarchyID data types and to demonstrate Data Masking.
  • SEC_ORG_USER_BASE_MAP – Table contains the employee data including an entry (SECURITY_CLEARANCE) to denote the security clearance the employee is granted by walking down the organization hierarchy and finding the lowest level above including themselves that has been granted access to data. The SEC_ASSET_MAP table is used along with the SEC_ORG_USER_BASE table to generate the data in this table. The procedure REFRESH_SECURITY_TABLES is called to refresh the data in this table.
  • SEC_USER_EXCEPTIONS – Table contains entries that allow for overrides of the organization-hierarchy-based model. Any employee entered here will use the permissions defined in this table instead of those inherited from the organizational hierarchy.
  • SEC_USER_MAP – This table is generated by the REFRESH_SECURITY_TABLES procedure and generates the asset level access for each user in the database based upon the values in the security tables SEC_ORG_USER_BASE_MAP, SEC_ASSET_MAP and SEC_USER_EXCEPTIONS. This is the ONLY table used by the functions for the Security Policy. The other SEC_ tables are used to generate this table for RLS.
  • WELL_DAILY_PROD – Contains COMPLETELY made-up and randomly generated daily well production data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs. This is a Clustered ColumnStore Indexed table.
  • WELL_MASTER – Contains COMPLETELY made-up and randomly generated master well data including the made up asset hierarchy. This is the main business table used for RLS for ALL well tables.
  • WELL_REASON_CODE – Contains COMPLETELY made-up and randomly generated daily well downtime data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs.

 

2 – Oil&Gas RLS Demo – LoadTables.sql

This script loads (generates) the demo data, including the user and security tables and the hierarchy and well data. There are two parameters near the top that control the amount of data to load: @wellcount and @drillyearstart. @wellcount specifies the number of wells to generate, and @drillyearstart specifies the first possible year to use for a well. The start date for a well is randomly selected between @drillyearstart and the year the script is run.

 

3 – Oil&Gas RLS Demo – Security Setup.sql

This script sets up the RLS functions and policies and the REFRESH_SECURITY_TABLES procedure. The REFRESH_SECURITY_TABLES procedure is used to generate the RLS security mappings in the SEC_ tables, as described in the sections above. In a production environment this procedure would need to be run every time the hierarchies were updated or new users were added to the database.

This script will also build users in the database, based on the userids generated in the SEC_ORG_USER_BASE table, for testing RLS.
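The demo's actual functions walk the full organization and asset hierarchies, but as a minimal sketch of the plumbing that script 3 creates (column names below, other than the SEC_USER_MAP and WELL_MASTER table names, are assumed), an RLS setup pairs a schema-bound inline table-valued function with a security policy:

-- Minimal sketch only (NOT the demo's actual predicate): filter WELL_MASTER rows
-- to the asset values mapped to the current database user in SEC_USER_MAP.
CREATE FUNCTION dbo.fn_well_rls_predicate (@ASSET_VALUE VARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    FROM dbo.SEC_USER_MAP AS m
    WHERE m.USERID = USER_NAME()                       -- assumed column name
      AND (m.ASSET_VALUE = @ASSET_VALUE OR m.ASSET_VALUE = 'ALL');
GO

CREATE SECURITY POLICY dbo.WellSecurityPolicy
    ADD FILTER PREDICATE dbo.fn_well_rls_predicate(ASSET_VALUE)   -- assumed column on WELL_MASTER
        ON dbo.WELL_MASTER
    WITH (STATE = ON);
GO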

 

4 – Oil&Gas RLS Demo – Test RLS Security.sql

This script contains sample queries to test RLS at different levels of the organizational and asset hierarchies. It also adds another entry to the security table, granting a user access to an additional level of the hierarchy alongside their current level, and runs the procedure to update the security to include this new entry.
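If you want to spot-check RLS yourself outside the provided script, a common pattern (the user name below is made up) is to impersonate one of the generated demo users, run a query and then revert:

-- Hypothetical test pattern: impersonate a demo user, query a secured table, revert
EXECUTE AS USER = 'mcox';                              -- one of the generated demo users (made-up name)
SELECT COUNT(*) AS VisibleWells FROM dbo.WELL_MASTER;
REVERT;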

 

5 – Oil&Gas RLS Demo – RLS Security with HierarchyID and Data Masking.sql

 

This script makes a copy of the SEC_ORG_USER_BASE table, called SEC_ORG_USER_BASE_HID, that contains a version of the organizational hierarchy using the HierarchyID datatype. It shows how to populate a HierarchyID datatype from a parent/child hierarchy and implements RLS security using the HierarchyID datatype, and it also adds some new columns, EMAIL_ADDRESS and DOB (Date of Birth), to the table to highlight Dynamic Data Masking in SQL Server 2016.
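A hedged sketch of the two ideas in that script follows: building a hierarchyid path string from the parent/child columns, and adding masked columns. The EMAIL_ADDRESS and DOB column names come from the post; the path-building approach, data types and mask functions are assumptions, not the demo's exact code.

-- Build a '/1001/1002/.../' style path from the parent/child columns and cast it to hierarchyid
WITH OrgPath (EMPLID, HID_PATH) AS
(
    SELECT EMPLID, CAST('/' + CAST(EMPLID AS varchar(20)) + '/' AS varchar(max))
    FROM dbo.SEC_ORG_USER_BASE
    WHERE MGRID IS NULL
    UNION ALL
    SELECT e.EMPLID, CAST(p.HID_PATH + CAST(e.EMPLID AS varchar(20)) + '/' AS varchar(max))
    FROM dbo.SEC_ORG_USER_BASE AS e
    INNER JOIN OrgPath AS p ON e.MGRID = p.EMPLID
)
SELECT EMPLID, CAST(HID_PATH AS hierarchyid) AS ORG_HID
FROM OrgPath;

-- Dynamic Data Masking on the new columns (mask functions assumed)
ALTER TABLE dbo.SEC_ORG_USER_BASE_HID
    ADD EMAIL_ADDRESS varchar(128) MASKED WITH (FUNCTION = 'email()') NULL,
        DOB date MASKED WITH (FUNCTION = 'default()') NULL;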

 

Info about HierarchyID – https://msdn.microsoft.com/en-us/library/bb677290.aspx

 

6 – Oil&Gas RLS Demo – Always Encrypted.sql

This script will set up the master and column keys for Always Encrypted and add a new encrypted column, SSN, to the SEC_ORG_USER_BASE table to test Always Encrypted. NOTE: If the database you are using for this demo is on a different server from where you will run the application included in the download (see below) to test all the features, you will need to run this script from SSMS on the machine where you run the application, and that machine will need .NET 4.6.
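For orientation, here is a hedged sketch of the kinds of statements involved (key names, the certificate path and the encrypted key value are placeholders, not the demo's actual values; SSMS normally generates these for you):

-- 1. Column master key pointing at a certificate in the Windows certificate store
CREATE COLUMN MASTER KEY DemoCMK
WITH (
    KEY_STORE_PROVIDER_NAME = 'MSSQL_CERTIFICATE_STORE',
    KEY_PATH = 'CurrentUser/My/<certificate-thumbprint>'   -- placeholder path
);

-- 2. Column encryption key protected by the master key
CREATE COLUMN ENCRYPTION KEY DemoCEK
WITH VALUES (
    COLUMN_MASTER_KEY = DemoCMK,
    ALGORITHM = 'RSA_OAEP',
    ENCRYPTED_VALUE = 0x01   -- placeholder; the real value is a generated encrypted blob
);

-- 3. Encrypted SSN column (deterministic encryption requires a BIN2 collation on character columns)
ALTER TABLE dbo.SEC_ORG_USER_BASE
    ADD SSN char(11)
        COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = DemoCEK,
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NULL;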

 

ERD of O&G RLS Demo DB

 

 

 

Demo Application

 

There is an application built in Visual Studio 2015 (.NET 4.6) that you can use to test out the demo once you run all 6 scripts (source code is included as well!). Just run the EXE in the download on a machine with .NET 4.6 installed; the instructions are at the bottom of the application (see screenshot below).

Use the WELL_MASTER, WELL_DAILY_PROD and/or WELL_DOWNTIME tables to test the parent/child RLS demo (scripts 1 – 4). Use the SEC_ORG_USER_BASE_HID table to test RLS with HierarchyID and Dynamic Data Masking (script 5). Finally, use the SEC_ORG_USER_BASE table to test Always Encrypted. You can EDIT the fields by clicking inside of them, so for the Always Encrypted demo you would click in the SSN cell for an employee, enter a valid SSN and click "Commit" (see screenshot below).

 

 

 

Demo Instructions, Download and Contents

 

Below are the contents of the demo download and descriptions of each file/folder. To run through the demo, do the following:

  1. Download and extract the files to a folder
  2. Run all the scripts in the MSSQL 2016 folder in order, 1-6, on SQL Server 2016 CTP 2.4+ or Azure SQL Database v12.
  3. Then use the OGSecurityDemo2016.exe file to run the Demo Application

MSSQL 2016 – Folder with the 6 demo scripts
OSSecurityDemo2016 – Folder with the source code (VS2015, .NET 4.6) for the Demo Application
SimpleRLSExample – Folder with a simple RLS demo (start slow)
O&G_Demo_ERD.png – Graphic of the ERD
OGSecurityDemo2016.exe – Demo Application (you can just run it if you have .NET 4.6)
Oil&Gas SQL Server Security Demo.docx – Overview of the Demo

 

Link to download – https://onedrive.live.com/redir?resid=4930901E9435C751!135238&authkey=!AF7fpcw5K5waySg&ithint=folder%2czip

 

Hope you enjoy and tweet me @STATCowboy if you have questions, comments or concerns or need help with the demo!

Setup Azure SQL Database using the New Azure Portal

April 7, 2015

Jamey Johnston

Azure SQL Database is a relational database-as-a-service that allows for scaling up to thousands of databases. It is a self-managed service that allows for near-zero maintenance; in other words, Microsoft handles all the backups, patching and redundancy. Azure SQL Database offers service tiers that allow you to dial the horsepower up or down as needed, which also means the pricing can scale up or down. Pricing of the service tiers ranges from roughly $5/month for the slowest and smallest in size to roughly $3,270/month for the fastest and largest.

More Info on Azure SQL Database

In this post we will walk through the steps to create an Azure SQL Database using the new Azure Portal. Please note the new portal is still in beta, so the steps and screenshots may change as Microsoft continues to roll out the new portal.

This post is based on the documentation on Azure, Get started with SQL Database by jeffgoll, which shows how to set up an Azure SQL Database using the older Azure Portal.

 

Step 1: Create an Account on Azure

 

This step can be skipped if you already have an account on Azure. If not, go to http://azure.microsoft.com and click on the “Free Trial” in the upper right corner to get started.

 

Step 2: Logon to the New Azure Portal and Provision a Virtual Server and SQL Database

 

  1. Access the new Azure portal at http://portal.azure.com and login with your Azure Account.
  2. Click “New” at the bottom left of the page, then “Data + Storage”, then “SQL Database” to start the SQL Database wizard.

  3. In the "SQL Database" panel enter your desired Azure SQL Database name in the "Name" field (e.g. myfirstazuresqldb), then click "Server / Configure required settings". In the Server panel that pops out to the side, click "Create a new server" (I am assuming that if you are reading this tutorial you don't have an existing Azure SQL Server set up!).

     

     

     

  4. In the "New Server" panel pop-out, enter an Azure SQL Server name in the "SERVER NAME" field (e.g. myfirstazuresqlserver), a server admin login account name in the "SERVER ADMIN LOGIN" field (e.g. SQLAdmin), a password in the "PASSWORD" field, and the password again in the "CONFIRM PASSWORD" field. Choose the location where you want the server to reside by clicking in the "LOCATION" area and choosing the location. Leave "Yes" chosen for "CREATE V12 SERVER (LATEST UPDATE)" so we can learn some new features in a later blog post. Leave "ALLOW AZURE SERVICES TO ACCESS SERVER" checked. Click "OK" at the bottom of the "New Server" panel to continue.

     

     

  5. Back on the "SQL Database" panel, leave "Blank Database" selected for the "SELECT SOURCE" option, then click on the "PRICING TIER" option, choose "B Basic" in the Pricing Tier panel pop-up and click "Select" at the bottom.

     

     

     

  6. On the "SQL Database" panel you can set the desired collation, but we will leave it at the default for now. Click on "Resource Group", then in the "Resource Group" panel choose "Create a new resource group", and then in the "Create resource group" panel type a resource group name in the "Name" field (e.g. myfirstresourcegroup).

     

     

  7. Choose the “Subscription” (e.g. Visual Studio Ultimate with MSDN) to pay for the Azure service. Check “Add to Startboard” to have the SQL Database show up on the Portal front page. Then click “Create” to begin provisioning your new Azure SQL Database! 🙂

     

     

  8. The page will go back to the Portal front page and a tile will appear with the title “Creating SQL Database”. Also, in the side bar to the left you will see a notification.

     

     

     

  9. Once the database is provisioned, the page will refresh and show the Azure SQL Database dashboard page.

     

     

  10. Your database is provisioned!

     

     

Step 3: Add Firewall Rule to Access Azure SQL Database

 

  1. Click on the Azure SQL Server under the “Summary” section.

     

     

  2. The "SQL Server" panel will pop out to the right; click "Settings" to open the SQL Server configuration panel to the right.

     

     

  3. In the "Settings" panel click on "Firewall". In the "Firewall Settings" panel that pops out, enter a "RULE NAME" (e.g. Home) and the START IP and END IP, which will probably be the same IP address if you are testing from home. You can use a website like whatismyip.com to get your external IP address. Click "Save" at the top once you have finished entering the values. (NOTE: The values below are not valid; you should enter the correct ones for your location.)

     Your IP may change, so check periodically to make sure you have the correct IP addresses in the firewall rules. Also, if you are trying to connect from work, your work firewall may block access to the default port of 1433, which is used to access Azure SQL Database, so if you are having issues connecting from work that may be the problem!

     

     

  4. A message should appear indicating the firewall rules were successfully updated. Click “Ok”.

     

     

  5. Firewall is configured.

 

Step 4: Setup SQL Server Management Studio to Manage and Access Azure SQL Database

 

  1. Finally, to test your new Azure SQL Database, download SQL Server 2014 Management Studio Express and install it using the defaults. Choose the 32-bit or 64-bit version depending on your O/S version (probably 64-bit for most).

     

     

     

     

  2. Launch SQL Server 2014 Management Studio from your PC, log in to the server you created in Step 2 using the server name, admin account and password, and click "Connect".

     

     

     

     

  3. Congratulations, you have created your first Azure SQL Database!
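If you prefer to verify the connection from a command prompt instead of SSMS, a hedged example using the names from this walkthrough (your actual server name, database name and password will differ) looks like this:

-- Run from a command prompt (one line):
--   sqlcmd -S tcp:myfirstazuresqlserver.database.windows.net,1433 -d myfirstazuresqldb -U SQLAdmin -P <your-password>
-- Then a quick query to confirm you are connected:
SELECT @@VERSION;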