July 21, 2019

Exploratory Data Analysis - Employee Engagement Example

The first step in a business analytics project is exploring the data.  Let's look at an employee engagement example where we will try to find the correlations among the predictors, and between the predictors and the target variable.

Two variables X and Y are positively correlated if high values of X go with high values of Y, and low values of X go with low values of Y.  On the other hand, if high values of X go with low values of Y, and vice versa, the variables are negatively correlated.

Here's the data imported into an R project:


We begin by looking at the structure of the data.  The purpose is to check that the data types of the variables are correct.
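As a minimal sketch of this step in base R, `str()` lists each column with its type. The data frame name `emp` and all the values in it are made up for illustration; only the column names come from this post.

```r
# Hypothetical toy data: column names follow the post, values are invented.
emp <- data.frame(
  empEngagement = c(3.2, 3.8, 3.5, 4.1, 2.9, 3.6, 3.4, 3.7),
  NPS           = c(6.4, 8.1, 7.2, 9.0, 5.5, 7.8, 6.9, 8.3),
  empSales      = c(62, 48, 55, 41, 80, 50, 58, 46)
)

# One line per column: its type and the first few values.
str(emp)

# Just the types, as a named character vector.
col_types <- sapply(emp, class)
col_types
```

If a column that should be numeric shows up as `character` or `factor` here, it is worth fixing before going any further.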

Note: NPS stands for Net Promoter Score, a measure of customer experience that is used to predict business growth.  It is calculated from the answers to one key question, rated on a 0-10 scale: How likely is it that you would recommend [brand] to a friend or colleague?  Respondents who answer 0-6 are unhappy customers (detractors), 7-8 are unenthusiastic (passives), and 9-10 are customers who'd be loyal to your brand and will refer others (promoters).
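The note above does not spell out the arithmetic. In the standard definition, the score is the percentage of respondents answering 9-10 (promoters) minus the percentage answering 0-6 (detractors). Here is a small sketch of that calculation; the `responses` vector is hypothetical survey data, not taken from this example's dataset.

```r
# Hypothetical survey answers on the 0-10 scale (made-up values).
responses <- c(10, 9, 8, 7, 6, 3, 9, 10, 5, 8)

promoters  <- mean(responses >= 9)   # share answering 9 or 10
detractors <- mean(responses <= 6)   # share answering 0 to 6

# Standard NPS: percentage of promoters minus percentage of detractors.
nps <- 100 * (promoters - detractors)
nps
```

With these made-up answers, 4 of 10 respondents are promoters and 3 of 10 are detractors, so the score comes out at 10.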


Next we look at typical values and spread of the data:
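A sketch of how this could be done in base R: `summary()` gives the minimum, quartiles, median, mean and maximum of each column, and `sd()` adds the standard deviation. The data frame `emp` and its values are hypothetical stand-ins for the real data.

```r
# Hypothetical toy data: column names follow the post, values are invented.
emp <- data.frame(
  empEngagement = c(3.2, 3.8, 3.5, 4.1, 2.9, 3.6, 3.4, 3.7),
  NPS           = c(6.4, 8.1, 7.2, 9.0, 5.5, 7.8, 6.9, 8.3),
  empSales      = c(62, 48, 55, 41, 80, 50, 58, 46)
)

# Typical values: min, quartiles, median, mean, max per column.
summary(emp)

# Spread: standard deviation per column.
spread <- sapply(emp, sd)
spread
```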


Is there any relationship between empEngagement and empSales?  Let's draw a scatter plot to find out:
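A minimal base R sketch of such a scatter plot follows. The data frame `emp` and its values are made up, so the picture it draws will not match the real data.

```r
# Hypothetical toy data: column names follow the post, values are invented.
emp <- data.frame(
  empEngagement = c(3.2, 3.8, 3.5, 4.1, 2.9, 3.6, 3.4, 3.7),
  empSales      = c(62, 48, 55, 41, 80, 50, 58, 46)
)

# One dot per observation: engagement on the x axis, sales on the y axis.
plot(emp$empEngagement, emp$empSales,
     xlab = "empEngagement", ylab = "empSales",
     main = "empSales vs empEngagement")
```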


Each dot here is one observation (empEngagement, empSales).  There is no obvious straight line that fits the dots.  Notice that most of the empSales values are below the 70 mark, but there seems to be an outlier with empSales of 80.

Next, let's visualize a summary of the distribution of the data:
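One way to do this in base R is a histogram with the median marked, followed by a boxplot. Which chart type was actually used for this post is an assumption on my part, and the data frame `emp` with its values is hypothetical.

```r
# Hypothetical toy data: the column name follows the post, values are invented.
emp <- data.frame(
  empEngagement = c(3.2, 3.8, 3.5, 4.1, 2.9, 3.6, 3.4, 3.7)
)

centre <- median(emp$empEngagement)

# Histogram of the engagement scores, with a dashed line at the median.
hist(emp$empEngagement,
     xlab = "empEngagement", main = "Distribution of empEngagement")
abline(v = centre, lty = 2)

# Boxplot of the same column: median, quartiles and any outliers.
boxplot(emp$empEngagement, horizontal = TRUE, xlab = "empEngagement")
```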


It looks like empEngagement is centred at about 3.5.  This number matches the mean or median figure we got from the earlier summary of the data.

Next we're going to look at the correlation matrix, which is a table where the variables are shown on both rows and columns, and the cell values are the correlations between the variables.

The correlation coefficient is a metric that measures the extent to which numeric variables are associated with one another.  It ranges from –1 to +1.
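In base R, `cor()` applied to a numeric data frame returns exactly this kind of table (Pearson correlation by default). The sketch below uses a hypothetical data frame `emp` with made-up values. It prints the numbers only; a coloured chart of the matrix would need a plotting step on top, which is not shown here.

```r
# Hypothetical toy data: column names follow the post, values are invented.
emp <- data.frame(
  empEngagement = c(3.2, 3.8, 3.5, 4.1, 2.9, 3.6, 3.4, 3.7),
  NPS           = c(6.4, 8.1, 7.2, 9.0, 5.5, 7.8, 6.9, 8.3),
  empSales      = c(62, 48, 55, 41, 80, 50, 58, 46)
)

# Pearson correlation between every pair of columns.
cor_mat <- cor(emp)

# Rounded for readability; the diagonal is 1 (each variable with itself).
round(cor_mat, 2)
```

The matrix is symmetric, so the value for (NPS, empSales) is the same as for (empSales, NPS).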


There is a strong positive correlation (dark blue) between empEngagement and NPS, which suggests that more engaged employees give a better customer experience.

But surprisingly, there is a negative correlation between NPS and employee sales.  Shouldn't more satisfied customers mean more sales?  Haha, I played a trick here as the data is fictitious.  I promise this won't be the last time you see this ;)

Another way to look at the data is to use a boxplot to show the distribution of empSales against NPS.  Here I have created a calculated column, NPS2, by applying the floor function to NPS.
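A sketch of both steps in base R: `floor()` rounds each NPS value down to a whole number, and the formula form of `boxplot()` draws one box of empSales per NPS2 level. The data frame `emp` and its values are hypothetical.

```r
# Hypothetical toy data: column names follow the post, values are invented.
emp <- data.frame(
  NPS      = c(6.4, 8.1, 7.2, 9.0, 5.5, 7.8, 6.9, 8.3),
  empSales = c(62, 48, 55, 41, 80, 50, 58, 46)
)

# Calculated column: NPS rounded down to the nearest whole number.
emp$NPS2 <- floor(emp$NPS)

# One box of empSales for each distinct NPS2 value.
boxplot(empSales ~ NPS2, data = emp,
        xlab = "NPS2 (floor of NPS)", ylab = "empSales")
```

Grouping by the floored value turns a continuous score into a handful of buckets, which is what makes a box-per-group chart readable.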


Can you show that there's a negative correlation between NPS2 and empSales?
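One possible way to check this numerically is `cor()` for the coefficient and `cor.test()` for a significance test. This is a sketch on a hypothetical data frame `emp` with made-up values, so the numbers it prints say nothing about the real data.

```r
# Hypothetical toy data: column names follow the post, values are invented.
emp <- data.frame(
  NPS      = c(6.4, 8.1, 7.2, 9.0, 5.5, 7.8, 6.9, 8.3),
  empSales = c(62, 48, 55, 41, 80, 50, 58, 46)
)
emp$NPS2 <- floor(emp$NPS)

# Pearson correlation coefficient: a negative value means that
# higher NPS2 tends to go with lower empSales.
r <- cor(emp$NPS2, emp$empSales)
r

# Test of the null hypothesis that the true correlation is zero.
result <- cor.test(emp$NPS2, emp$empSales)
result$p.value
```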

