My Journey to Data Science – Part 1 of 3

My Career Journey Over the Last 20+ Years

October 25, 2016

 

Jamey Johnston (@STATCowboy)

 

So I am at the EMP in Seattle attending the SQL Summit 2015 Appreciation Event and I am introduced to Denise McInerny (t) by my good friend Wendy Pastrick (t | b). Of course, the common pleasantries are exchanged, “My name is …”, “I work at XYZ company …” and “I do ABC job …”, which in my case is “My name is Jamey Johnston, I work at an O&G company and I am a Data Scientist”. Denise’s response was not quite what most people’s response is when I tell them I am a Data Scientist. Usually I get a general look of trepidation or the occasional, “Oh, you are an unicorn!” (true story, several times), but in Denise’s case she says “You should write a blog about your journey to become a Data Scientist” (or something along those lines). I thought that might be a fun blog to write and said “Sure!”. So here is the story of my journey to becoming a Data Scientist.

 

10 PRINT “My Name is Jamey”

 

So, I am an eight-year-old living in the suburbs of New Orleans and I want a computer, a Commodore Vic-20, and I get it! I plug it in and connect it to the color console TV in my living room and turn it on. Within 10 minutes I have written my first program:

 

10 PRINT “My Name is Jamey”

20 GOTO 10

30 RUN
 

The next thing I see is the greatest thing I have ever seen – “My Name is Jamey” scrolling across my parent’s TV screen as fast as I can read it! I sit there and watch it go across the screen over and over and over again finally asking myself – “How do I stop it!?”. I scan through the manual that came with the Vic-20 looking for the correct key combination to stop it! No Luck and no Internet in 1982 so I do the only thing I can think of to stop it – “Unplug!”. I loved that computer and it was what fueled my desire to learn more about computers!

 

GIS and College

 

Fast forward to college and I am a 19-year-old who just finished his first year of college as a Psych major and I get a job with the Civil Engineering department at school working with GIS and Remote Sensing technologies. My father had started in GIS and Remote Sensing back in the 70’s for Wetlands research working for the Department of Interior so I was familiar with GIS and Remote Sensing and I was excited to work with big workstations, lots of data and getting to work in the same field as my dad! So these big workstations were SGI and DG boxes running from 25Mhz to 100Mhz processors! Your phone is probably 100 times faster!

Two years later I finish my junior year and I am still working on GIS and Remote Sensing projects, one of which was to run clusters against tiles of Thematic Mapper satellite data of the entire state of Louisiana and then use those clusters to classify them into land use and land cover categories (i.e. is this cluster water or agriculture or coniferous forest, etc.). I was working with ESRI and Erdas Imagine software and learning UNIX which was really fun and beneficial to my career.

I loved being a GIS and Remote Sensing technologist but I was still a Psych major! I wanted to change majors but the problem was my University only had ONE geography class! So there was no way I could stay at the University of Southwestern Louisiana and get a GIS degree. I transferred to LSU to start in the Fall of 1995 which had a great GIS program through the Department of Geography and Anthropology to get a BS in Geography with an emphasis in GIS and Remote Sensing. I also went to work for the GIS Center at the Louisiana Department of Environmental Quality (LDEQ) continuing my career in GIS and Remote Sensing. It took me two more years to finish school as I had to take 36 hours of Geography to graduate. My first semester was 12 hours of Geography with a Geography 1001 class up to a Graduate level class of Geography 4998 and a music class I believe. It was a fun semester to say the least.

I graduated in Spring of 1997 from LSU and continued working for the GIS Center in the field of GIS and Remote Sensing. I also started working with another cool technology, Relational Database Management System (RDBMS). ESRI had created the first version of ArcSDE which was a way to store your spatial data in a database. In my case it was Oracle. We had Oracle v6 and v7 databases at LDEQ and I was starting to learn them to support ArcSDE along with some other technologies that required RDBMS.

 

Stayed tuned for Part 2 tomorrow – DBA to BI!

SQL Server 2016 Security Demo

RLS, Dynamic Data Masking and Always Encrypted

Jamey Johnston (@STATCowboy)

NOTE: Demo requires SQL Server 2016 CTP 2.4+ or Azure SQL Database v12.

 

Overview of the Demo

 

Security is a vital part of a DBA, Developer and Data Architects job these days. The number of attacks on databases of major retailors from criminals looking for information like email addresses, ID numbers, birthdays, etc. on individuals to be exploited and sold is ever increasing. The demo contained in the link at the bottom of this page is meant to showcase three of the new features Microsoft is bringing to the Azure SQL Database v12 and SQL Server 2016 to help combat this cyber-attacks:

Row Level Security (RLS)

Dynamic Data Masking

Always Encrypted

 

This first part of the demo is meant to show how an organizational hierarchy and asset hierarchy can be leveraged together to provide Row Level Security on tables in a SQL Server database using the new predicate based RLS feature in SQL Server 2016 and Azure v12. This demo is completely made up oil well production data for a fictitious organization with 153 fictional employees and come as in with ABSOLUTELY NO WARRANTY or GUARANTEE!

Also, the demo will show how to use RLS with the HieararchyID Datatype, the new Dynamic Data Masking and Always Encrypted Security Features.

This post is about the demo which will show you an advanced implementation template for RLS as well as some of the other security features. Please use the links above to the official Microsoft documentation to learn about each feature first before trying out the demo as it will help you understand the demo better.

If you attended the session at SQL Summit 2015, Overview of Security Investments in SQL Server 2016 and Azure SQL Database [DBA-327-M], which I co-presented with the Microsoft SQL Server Security team this is the demo we used at the end.

 

Asset and Organization Hierarchies and RLS

 

The basis of the security is nodes in the organizational hierarchy are granted access to levels in the asset hierarchy and those grants filter down to the members below in the hierarchy. This allows for inheritance of permissions via the Organization and Asset Hierarchy (i.e. Child nodes can inherit from Parent Nodes).

Functionality is built-in to the model to override the security model for a lower member including denying access altogether (‘NONE’ in the security map) or granting access to all data (‘ALL’ in the security map) via exceptions (SEC_USER_EXCEPTIONS). A base user table exists (SEC_ORG_USER_BASE) that has the relationship of employee to manager as well as the organizational unit id for the employee. This table would likely be fed from an HR system in a production scenario. A master table for the wells (WELL_MASTER) contains the asset hierarchy to filter security to the well data. Read the notes about the tables below for more details.

Below shows what an Asset and Organizational Hierarchy would look like (also, this is what is in the demo) and finally a walk down one branch of the organizational hierarchy to see how to apply RLS against the Asset Hierarchy.

 

Asset Hierarchy (snippet)

 

 

Organizational Hierarchy (snippet)

 

Asset and Organization Hierarchy and RLS (CEO to one USER)

 

 

 

Scripts and Explanations

 

There are 6 scripts to run to see the entire demo (and you just run them in the order they are number, 1 – 6):

1 – Oil&Gas RLS Demo – BuildTables.sql

The script will create the database, oilgasrlsdemo2016, and the tables needed for the demo

The Tables are as such:

  • ASSET_HIERARCHY – Table contains the asset hierarchy and is used to build the demo data for the asset hierarchy in the Well_Master table.
  • DATES – Generic table to hold a date dimension for use later in an Excel Power Pivot model.
  • SEC_ASSET_MAP – Table contains the entries mapping the organization units to the asset hierarchy levels for access to the data. The table would be managed by the security team for the application. Users and their subordinates are denied access to data via an entry of ou, ‘NONE’, ‘NONE’ or granted all access via an entry of ou, ‘ALL’, ‘ALL’.
  • SEC_ORG_USER_BASE – Table contains the employees including the employee to manager parent/child relationship to build an organization hierarchy and the organizational unit id for mapping to asset hierarchy levels for security. This table would likely be fed from an HR system. Also, will demonstrate Always Encrypted in this table.
  • SEC_ORG_USER_BASE_HID – Same as SEC_ORG_USER_BASE but includes HierarchyID column to demonstrate RLS with HierarchyID data types and to demonstrate Data Masking.
  • SEC_ORG_USER_BASE_MAP – Table contains the employee data including an entry (SECURITY_CLEARANCE) to denote the security clearance the employee is granted by walking down the organization hierarchy and finding the lowest level above including themselves that has been granted access to data. The SEC_ASSET_MAP table is used along with the SEC_ORG_USER_BASE table to generate the data in this table. The procedure REFRESH_SECURITY_TABLES is called to refresh the data in this table.
  • SEC_USER_EXCEPTIONS – Table contains entries to allow for overrides of the organization hierarchy based model. Any employee entered here will use permission defined in this table instead of what is inherited from the organizational hierarchy.
  • SEC_USER_MAP – This table is generated by the REFRESH_SECURITY_TABLES procedure and generates the asset level access for each user in the database based upon the values in the security tables SEC_ORG_USER_BASE_MAP, SEC_ASSET_MAP and SEC_USER_EXCEPTIONS. This is the ONLY table used by the functions for the Security Policy. The other SEC_ tables are used to generate this table for RLS.
  • WELL_DAILY_PROD – Contains COMPLETELY made-up and randomly generated daily well production data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs. This is a Clustered ColumnStore Indexed table.
  • WELL_MASTER – Contains COMPLETELY made-up and randomly generated master well data including the made up asset hierarchy. This is the main business table used for RLS for ALL well tables.
  • WELL_REASON_CODE – Contains COMPLETELY made-up and randomly generated daily well downtime data for Oil, Gas and NGL. Primary key is WELL_ID and RLS is achieved by using the asset hierarchy in the WELL_MASTER table to filter the WELL_IDs.

 

2 – Oil&Gas RLS Demo – LoadTables.sql

This script is used to load or generate the demo data including user and security tables and hierarchy and well data. There are two parameters close to the top that can be used to specify the amount of data to load – @wellcount and @drillyearstart. @wellcount specifies the number of wells to generate and @drillyearstart specifies the first possible year to use for a well. The start date for a well will be randomly selected between @drillyearstart and the current year the script is run.

 

3 – Oil&Gas RLS Demo – Security Setup.sql

This script sets up the RLS functions, policies and the procedure REFRESH_SECURITY_TABLES. The procedure REFRESH_SECURITY_TABLES is used to generate the RLS security mappings in the SEC_ tables as described in the sections above. In a production environment this procedure would need to be run every time the hierarchies were updated or new users were added to the database.

This script also will build users in the database based on the userids generated in SEC_ORG_USER_BASE table for testing RLS.

 

4 – Oil&Gas RLS Demo – Test RLS Security.sql

This script contains sample queries to test RLS at different levels in the organizational hierarchy and asset hierarchy. It also will add another entry in the security table granting a user access to another level in the hierarchy along with their current level and run the procedure to update the security to include this new entry.

 

5 – Oil&Gas RLS Demo – RLS Security with HierarchyID and Data Masking.sql

 

This script makes a copy of the SEC_ORG_USER_BASE table called SEC_ORG_USER_BASE_HID that contains a version of the Organizational Hierarchy using the HierarchyID Datatype. It shows how to populate a HierarchyID Datatype from a Parent/Child Hierarchy and will implement RLS security using the HierarchyID Datatype as well as add some new columns, EMAIL_ADDRESS and DOB (Date of Birth) to the table to highlight Dynamic Data Masking in SQL Server 2016.

 

Info about HierarchyID – https://msdn.microsoft.com/en-us/library/bb677290.aspx

 

6 – Oil&Gas RLS Demo – Always Encrypted.sql

This script will setup the Master and Column keys for Always Encrypted as well as add a new encrypted column, SSN, to the SEC_ORG_USER_BASE table to test Always On Encryption. NOTE: If your database you are using for this demo is on a separate server than where you will run the application (see below) included in the download to test all the features you will need to run this script from SSMS on the machine you run the application and it will need .NET 4.6.

 

ERD of O&G RLS Demo DB

 

 

 

Demo Application

 

There is an application built in Visual Studio 2015 (.NET 4.6) that you can use to test out the demo once you run all 6 scripts (Source code is included as well!). Just run the EXE in the download on a machine with .NET 4.6 installed and the instructions are at the bottom of the application (see screenshot below).

Use the WELL_MASTER, WELL_DAILY_PROD and/or WELL_DOWNTIME to test the Parent/Child RLS Demo (Scripts 1 – 4). Use the SEC_ORG_USER_BASE_HID table to test the RLS with HierarchyID and Dynamic Data Masking (Script 5). Finally, use the SEC_ORG_USER_BASE table to test the Always Encrypted. You can EDIT the fields by clicking inside of them so for the Always Encrypted Demo you would click in the cell for SSN for an employee and enter a valid SSN and click “Commit” (see screenshot below).

 

 

 

Demo Instructions, Download and Contents

 

Below is the contents of the demo download and descriptions about each file/folder. To run through the demo, do the following:

  1. Download and extract the files to a folder
  2. Run all the scripts in the MSSQL 2016 folder in order, 1-6, on SQL Server 2016 CTP 2.4+ or Azure SQL Database v12.
  3. Then use the OGSecurityDemo2016.exe file to run the Demo Application

MSSQL 2016 – Folder with the 6 demo scripts
OSSecurityDemo2016 –
Folder with the source code (VS2015 .NET4.6) for the Demo Application
SimpleRLSExample –
Folder with a simple RLS Demo (start slow)
O&G_Demo_ERD.png –
Graphic of ERD
OGSecurityDemo2016.exe –
Demo Application (you can just run it if you have .NET 4.6)
Oil&Gas SQL Server Security Demo.docx –
Overview of the Demo

 

Link to download – https://onedrive.live.com/redir?resid=4930901E9435C751!135238&authkey=!AF7fpcw5K5waySg&ithint=folder%2czip

 

Hope you enjoy and tweet me @STATCowboy if you have questions, comments or concerns or need help with the demo!

Setup Azure SQL Database using the New Azure Portal

April 7, 2015

Jamey Johnston

Azure SQL Database is a relational database-as-a-service that allows for scaling up to thousands databases. It is a self-managed service that allows for near-zero maintenance, in other words, Microsoft handles all the backups, patching and redundancy. Azure SQL Database offers service tiers that allows for dialing up or down the horsepower as needed which also means the pricing can scale up or down. Pricing of the Service Tiers range from ~ $5/month to the slowest and smallest in size to ~$3,270/month for the fastest and largest.

More Info on Azure SQL Database

In this post we will walk through the steps to create an Azure SQL Database using the new Azure Portal. Please note the new Portal is still in beta so the steps and screenshots may change as Microsoft still rolls out the new portal.

This post is based on the documentation on Azure, Get started with SQL Database by jeffgoll, which shows how to setup an Azure SQL Database using the older Azure Portal.

 

Step 1: Create an Account on Azure

 

This step can be skipped if you already have an account on Azure. If not, go to http://azure.microsoft.com and click on the “Free Trial” in the upper right corner to get started.

 

Step 2: Logon to the New Azure Portal and Provision a Virtual Server and SQL Database

 

  1. Access the new Azure portal at http://portal.azure.com and login with your Azure Account.
  2. Click “New” at the bottom left of the page, then “Data + Storage”, then “SQL Database” to start the SQL Database wizard.

  3. In the “SQL Database” panel enter your desired Azure SQL Database name in the “Name” field (e.g. myfirstazuresqldb) then click “Server / Configure required settings“. In the Server Panel pop-out to the side, click “Create a new server” (I am assuming if you are reading this tutorial you don’t have an existing Azure SQL Server setup!).

     

     

     

  4. In the “New Server” panel pop-out enter an Azure SQL Server name in the “SERVER NAME” field (e.g. myfirstazuresqlserver), a Server Admin Login account name in the “SERVER ADMIN LOGIN” field (e.g. SQLAdmin), a password in the “PASSWORD” field and enter the password again in the “CONFIRM PASSWORD” field. Choose the location where you want the server to reside by clicking in the “LOCATION” area and choosing the Location. Leave the “Yes” chosen in the “CREATE V12 SERVER (LATEST UPDATE) so we can learn some new features in a later blog post. Leave “ALLOW AZURE SERVICES TO ACCESS SERVER” checked. Click “OK” at the bottom of the “New Server” panel to continue.

     

     

  5. Back on the “SQL Database” panel leave “Blank Database” selected in “SELECT SOURCE” option and click on the “PRICING TIER” option and choose “B Basic” in the Pricing Tier panel pop-up and click “Select” at the bottom.

     

     

     

  6. On the “SQL Database” panel you can set the desired Collation but we will leave it the default for now. Click on “Resource Group”, then in the “Resource Group” panel choose “Create a new resource group” and then in the “Create resource group” panel type a resource group name in the “Name” field (e.g. myfirstresourcegroup”).

     

     

  7. Choose the “Subscription” (e.g. Visual Studio Ultimate with MSDN) to pay for the Azure service. Check “Add to Startboard” to have the SQL Database show up on the Portal front page. Then click “Create” to begin provisioning your new Azure SQL Database! 🙂

     

     

  8. The page will go back to the Portal front page and a tile will appear with the title “Creating SQL Database”. Also, in the side bar to the left you will see a notification.

     

     

     

  9. Once the database is provisioned the pages will refresh and show the Azure SQL Database Dashboard page.

     

     

  10. Your database is provisioned!

     

     

Step 3: Add Firewall Rule to Access Azure SQL Database

 

  1. Click on the Azure SQL Server under the “Summary” section.

     

     

  2. The “SQL Server” panel will pop-out to the right, click “Settings” to open up the SQL Server configuration panel to the right.

     

     

  3. In the “Settings” panel click on “Firewall”. In the “Firewall Settings” panel that pops-out enter a “RULE NAME” (e.g. Home), the START IP and END IP which will probably be the same IP address if testing from home. You can use a website like whatismyip.com to get your external IP address. Click “Save” at the top once finished entering the values. (NOTE: The values below are not valid and you should enter the correct ones for your location).

    Your IP may change so check periodically to make sure you have the correct IP addresses in the firewall rules. Also, if trying to connect from work your work firewall may block access to the default port of 1433 which is used for access to the Azure SQL Database so if you are having issues trying to connect from work they may be your issue! J

     

     

  4. A message should appear indicating the firewall rules were successfully updated. Click “Ok”.

     

     

  5. Firewall is configured.

 

Step 4: Setup SQL Server Management Studio to Manage and Access Azure SQL Database

 

  1. Finally to test your new Azure SQL Database download the SQL Server 2014 Management Studio Express and install using the defaults. Choose the 32-bit or 64-bit version depending on your O/S version (probably 64-bit for most).

     

     

     

     

  2. Launch SQL Server 2014 Management Studio from your PC and login to the server you created in Step 2 using the server name, admin account and password and click “Connect”.

     

     

     

     

  3. Congratulations you have created your first Azure SQL Database!