Image Source: rawpixel on Unsplash
We have recently made the case for why businesses need to solve the data literacy problem, so I won’t dwell on the benefits of this skill for too long. But I’d like to resurface the three critical gains obtained from enhanced data literacy skills in a business setting:
Higher ROI. Talk to any modern business about data-driven decisions and you’ll quickly realise that it is the aspiration of most organisations to drive more value from the data they already have as well as from the data they purchase. Businesses throw large amounts of money into market research and BI solutions, often without thoroughly evaluating their team’s ability to assess, interpret and understand the data that comes their way. With improved data literacy across all teams and disciplines, businesses could achieve a higher ROI in a shorter space of time.
Better decisions. The biggest impediment to business agility is the culture of inertia that exists in most organisations. Our reliance on gut decisions, intuition and poor habits can lead to slower processes, diminished ROI and sometimes even financial losses. Creating a working culture that favours facts over guesswork will help companies instil a new way of thinking, leading to better decisions and a healthier-looking bottom line.
Change from within. Business data shouldn’t be reserved for board meetings. The more people engage with business insights, the more value organisations will realise from every team in every discipline. If misdirected actions are taken as a result of misinterpreting the data at the bottom of the organisational pyramid, there’s little that top management can do to catch those mistakes in time or prevent them in the future. Ensuring all employees in the organisation are data literate should be a priority for every forward-looking business.
The following practical examples aim to provide some guidance on key factors to consider when analysing data. Whether you’re a business user or a Qlik Sense developer, you should find some actionable insights to unlock more value from your data visualisations.
Trends & Context
Data without context is meaningless at best, and at worst it leaches productivity from your organisation. If you think about your business data as a pyramid – raw data at the bottom, formatted data in the middle providing more context, and the key insights used to inform decisions at the top – then the importance of data analysis becomes even more apparent.
Throwing raw data or a series of numbers at your employees won’t help them make better decisions; it will bog them down and fuel confusion.
Poorly visualised Key Performance Indicators (KPIs) often fall into the category of meaningless data. Take a look at the examples below:
Imagine you see this KPI in your dashboard. Looks like a lot of money, but does it say anything about the company's performance? Is it good or bad? How does it compare with the previous quarter?
Adding a little context can go a long way – KPIs are there to give us a snapshot of the company's performance at a glance and they should be designed with that purpose in mind.
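To make the idea of context concrete, here is a minimal Python sketch that pairs a KPI value with its change against the previous quarter. The function name `kpi_with_context` and the figures are hypothetical, chosen for illustration only; they are not taken from any dashboard or from a Vizlib product.

```python
def kpi_with_context(current: float, previous: float) -> str:
    """Format a KPI value together with its change versus the previous period."""
    if previous == 0:
        # A percentage change is undefined without a baseline
        return f"{current:,.0f} (no prior-period baseline)"
    change = (current - previous) / previous * 100
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    return f"{current:,.0f} ({direction} {abs(change):.1f}% vs previous quarter)"


# Hypothetical figures, for illustration only
print(kpi_with_context(1_250_000, 1_400_000))
```

With these made-up figures the output reads "1,250,000 (down 10.7% vs previous quarter)", which answers the "is it good or bad?" question that the bare number leaves open.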
Vizlib KPI Designer
With the new layering concept available in our KPI Designer, you can combine charts, metrics and icons to create a bespoke KPI that meets your business needs.
Internal & External Data
Comparing apples to apples seems like a pretty straightforward approach, but when it comes to handling internal and external data, drawing fair comparisons becomes rather complicated.
The most important consideration when comparing internal data, which means making comparisons within a business, is to make like-for-like comparisons. For example, you wouldn’t get much value from comparing the sales of a coffee chain outlet in a busy train station with those of an outlet in a residential complex. That’s an unfair and inaccurate comparison. To normalise the results of such a comparison, you would need to look at other factors influencing the performance of each outlet, such as the number of employees in each branch, the number of competitors in the area, the socio-economic makeup of each locality, the average footfall at each location and the population in the catchment zone.
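As a rough sketch of what normalising might look like, the Python snippet below divides each outlet's sales by its footfall and headcount. The outlet names, field names and numbers are all invented for illustration, and a real analysis would weigh more of the factors listed above.

```python
def normalise(outlet: dict) -> dict:
    """Return per-visitor and per-employee sales for one outlet."""
    return {
        "name": outlet["name"],
        "sales_per_visitor": outlet["sales"] / outlet["footfall"],
        "sales_per_employee": outlet["sales"] / outlet["employees"],
    }


# Hypothetical monthly figures, for illustration only
outlets = [
    {"name": "Train station", "sales": 90_000, "footfall": 60_000, "employees": 12},
    {"name": "Residential", "sales": 30_000, "footfall": 15_000, "employees": 4},
]

for row in map(normalise, outlets):
    print(row)
```

In this invented example the train station outlet sells three times as much in absolute terms, yet the residential outlet earns more per visitor and the same per employee, so the raw totals alone would give a misleading picture.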
Image Source: rawpixel on Unsplash
When attempting to draw comparisons external to a company, we need to take into account different considerations. For instance, without any context, a company’s sales growing by 5% may look like a great achievement, but if the overall market has gained 10%, then it means the company is actually underperforming and losing market share. Equally, if a company’s sales have plummeted by 5%, but the overall market has fallen by 10%, it shows that the company is performing well in the changing market climate. To uncover real and valuable insights, organisations must first determine what type of data they need to construct fair comparisons and then purchase it from external agencies.
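The arithmetic behind these two scenarios can be sketched in a few lines of Python. The helper `share_change` is a hypothetical name; it assumes growth rates are given as fractions and that the market figure includes the company's own sales.

```python
def share_change(company_growth: float, market_growth: float) -> float:
    """Relative change in market share, with growth rates given as fractions.

    New share = old share * (1 + company_growth) / (1 + market_growth),
    so the relative change is that ratio minus 1.
    """
    return (1 + company_growth) / (1 + market_growth) - 1


# The two scenarios described in the text
print(f"{share_change(0.05, 0.10):+.1%}")    # sales up 5%, market up 10%
print(f"{share_change(-0.05, -0.10):+.1%}")  # sales down 5%, market down 10%
```

A negative result means the company has lost share despite growing, and a positive result means it has gained share despite shrinking.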
Cohorts & Cell-based Analysis
Rather than looking at all users as one unit, Cohort Analysis clusters them into related groups, helping to identify patterns, behaviours and insights related to each cohort. It’s often used as a way of normalising data for comparisons or when looking for trends. These related groups (or cohorts) generally share common characteristics or experiences within a defined period of time. For example, you could group customers by the day they subscribed or the day they made their first purchase. Studying the trends of cohorts from different periods in time can indicate whether the situation is improving, worsening or stagnating.
The main purpose of performing a Cohort Analysis is to unlock actionable insights on how to improve customer acquisition, user experience, revenue, turnover, and so on. The Cohort Analysis process can typically be broken down into four main steps:
- Define your problem or question
- Determine the metrics that will help you find the answer
- Select specific cohorts that can help you answer the question
- Perform the cohort analysis
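The four steps above can be sketched in Python using only the standard library. The question here is assumed to be "how many customers come back in later months?", the metric is the share of each cohort that purchases again, and the cohorts are defined by month of first purchase. The purchase log and every name in the code are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date)
purchases = [
    ("a", date(2023, 1, 5)), ("a", date(2023, 2, 9)), ("a", date(2023, 3, 2)),
    ("b", date(2023, 1, 20)), ("b", date(2023, 3, 15)),
    ("c", date(2023, 2, 3)), ("c", date(2023, 3, 8)),
    ("d", date(2023, 2, 11)),
]


def month_index(day: date) -> int:
    """Put months on a single scale so offsets are a simple subtraction."""
    return day.year * 12 + (day.month - 1)


def label(index: int) -> str:
    """Turn a month index back into a 'YYYY-MM' label."""
    return f"{index // 12}-{index % 12 + 1:02d}"


def cohort_retention(rows):
    """Share of each first-purchase cohort that buys N months later."""
    first = {}
    for customer, day in rows:
        month = month_index(day)
        first[customer] = min(first.get(customer, month), month)

    active = defaultdict(set)  # (cohort month, months since first purchase) -> customers
    for customer, day in rows:
        cohort = first[customer]
        active[(cohort, month_index(day) - cohort)].add(customer)

    return {
        (label(cohort), offset): len(customers) / len(active[(cohort, 0)])
        for (cohort, offset), customers in list(active.items())
    }


retention = cohort_retention(purchases)
for key in sorted(retention):
    print(key, retention[key])
```

Each key is a pair of cohort label and months since first purchase, so comparing the same offset across cohorts shows whether retention is improving or worsening over time.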
Cell analysis is essentially a subcategory of Cohort analysis and is typically used to compare the lowest relevant organisational unit across a business. For example, a global automotive company might want to look at the performance of a subcategory in different markets without going into an overly granular analysis of individual brands. In this case, one cohort could be a concatenation of “Country” and “Product subcategory”, creating a cell, which could be something like “Spain -- Vans”.
This cell would group a few different models and brands of this particular product subcategory, with an overarching goal of aggregating enough relevant data to highlight variation but not so much that it would become cluttered and difficult to navigate. It essentially gives you a way to zoom in on particular product subcategories to view sales performance and market share performance in different countries, making it super easy to spot both top selling types of products and subcategory outliers.
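A minimal Python sketch of building such cells is shown below. The rows, field names and unit counts are made up for illustration; only the “Country -- Product subcategory” key format follows the example in the text.

```python
from collections import defaultdict

# Hypothetical sales rows, for illustration only
sales = [
    {"country": "Spain", "subcategory": "Vans", "model": "Model A", "units": 120},
    {"country": "Spain", "subcategory": "Vans", "model": "Model B", "units": 80},
    {"country": "Spain", "subcategory": "Hatchbacks", "model": "Model C", "units": 300},
    {"country": "France", "subcategory": "Vans", "model": "Model A", "units": 95},
]


def aggregate_by_cell(rows):
    """Sum units per cell, where a cell is 'Country -- Product subcategory'."""
    totals = defaultdict(int)
    for row in rows:
        cell = f'{row["country"]} -- {row["subcategory"]}'
        totals[cell] += row["units"]
    return dict(totals)


cell_totals = aggregate_by_cell(sales)
for cell, units in sorted(cell_totals.items()):
    print(cell, units)
```

Individual models are rolled up into their cell, so the comparison happens at the subcategory-per-country level rather than per brand or model.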
Vizlib Scatter Chart
Averages, Aggregation & Distribution
The general public tends to understand an average as “the number in the middle” or the number that is “balanced”. The whole point of using averages as a benchmark in data visualisations is to give readers a quick, representative summary of a data set. It’s a popular choice in both public and corporate life, with some of the most important economic metrics, such as a country’s GDP (Gross Domestic Product) per capita, expressed as averages.
However, what is often left undiscussed is the question of the distribution of the values that make up the average or aggregation. For instance, the Gender Pay Gap, one of the hottest topics in the media at the moment, is “the average difference between the remuneration for men and women who are working.” While it neatly illustrates the overall situation, it is still a broad stroke representation that doesn’t convey the complex nature of the matter. To understand which industries, professions and skills “cause” the pay gap, you would need to go into more detail, analysing the distribution of the values.
In business, averages are often used to get a snapshot of the data but can hide a deeper story. For example, if you looked at the average salary within an organisation and saw this:
Vizlib KPI Designer
...you'd probably conclude that it's pretty good. However, it's only half of the story – it doesn't give you any indication about the distribution of salaries across different roles, which might be hiding high pay inequality.
Another statistical method of understanding distribution in a data set is Standard Deviation. Standard Deviation measures how far the values in a data set typically lie from the mean. If the standard deviation is large, the data points are spread out over a wider range of values; if it’s small, the data points are clustered closely around the mean.
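The short Python sketch below illustrates how an average can hide the distribution, using the standard library's `statistics` module. The salary figures are invented for illustration and do not come from the KPI shown above.

```python
from statistics import mean, median, stdev

# Hypothetical annual salaries: five similar roles and one executive
salaries = [28_000, 29_000, 30_000, 31_000, 32_000, 150_000]

average = mean(salaries)
middle = median(salaries)
spread = stdev(salaries)  # sample standard deviation

print(f"mean:   {average:,.0f}")
print(f"median: {middle:,.0f}")
print(f"stdev:  {spread:,.0f}")
```

In this made-up data set the mean is 50,000 while the median is 30,500, and the standard deviation is nearly as large as the mean itself. A gap like that between mean and median is a hint that a few extreme values are pulling the average away from what most people earn.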
One of the best ways to visualise and better understand distribution is to use a Distribution Plot. Here’s an example of how it helps you to get a quick insight into the distribution of the data set.
Image Source: Qlik
Bias & Non-Causal Correlations
There’s no denying that we are all biased in one way or another. Our past experiences, feelings, the knowledge we’ve accumulated and even our character shape our perspective on things. So, it’s unsurprising that various forms of bias also exist in the business world and can quickly bleed into the way we work with data. It may often be unintentional, sometimes even subconscious, and it’s almost never malicious, but we still need to be aware of its existence and take steps to limit its influence on the stories we derive from data – are you seeing what the data is telling you or twisting it to support your story?
Two broad categories of bias are statistical bias and sampling bias, and both predominantly relate to how data is collected and analysed. Voluntary response bias is a commonly used example: when asked to leave a product or service review, people tend to respond only if they had a particularly positive or negative experience, so the responses over-represent strong opinions.
Confirmation bias is another form of bias that can affect both sampling as well as the way people interpret data. Wikipedia defines confirmation bias, also called confirmatory bias or myside bias, as “the tendency to search for, interpret, favour, and recall information in a way that confirms one's preexisting beliefs or hypotheses.”
The biggest challenge here is that we will almost always find some data to support our agenda or beliefs, but that’s not what data visualisations are all about. Here are a few cool examples from Tyler Vigen’s blog of data sets that appear to correlate but have no causal connection whatsoever.
Image Source: Tyler Vigen
Image Source: Tyler Vigen
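To see how easily unrelated numbers can correlate, here is a small Python sketch that computes the Pearson correlation coefficient by hand. The two series are invented for illustration (they are not taken from Tyler Vigen’s blog), and the helper assumes both series have the same length and are not constant.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented yearly figures for two things with no plausible link
ice_cream_sales = [20, 25, 31, 38, 44, 52]
printer_faults = [3, 4, 4, 6, 7, 8]

r = pearson(ice_cream_sales, printer_faults)
print(f"correlation: {r:.2f}")
```

Both series simply rise over time, which is enough to produce a coefficient close to 1. A high value like this says nothing about one causing the other.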
So there you have it – five practical factors to consider when analysing data. What else would you put on the list?