Data Visualization: How Skewness and Kurtosis Lead to Visual Distortion

sangjun_park - Aug 23 - Dev Community

1. Introduction

Data visualization plays a central role in how modern society communicates information, so it matters that a chart shows the underlying distribution clearly and without distortion. In some cases, however, visualizations are deliberately distorted to mislead people. The following well-known cases illustrate the problem.

Image description

This image is an example of how a chart can be intentionally misleading. Tim Cook presented a graph of cumulative iPhone sales, which at first glance may not seem distorted, especially since the title clearly states "Cumulative iPhone Sales." However, a cumulative total can never decrease, so the graph trends upward no matter what happens to the actual sales rate. This can create a misleading impression of continuous growth.

Image description

This graph reveals declining iPhone sales, but it’s difficult to discern this from the graph that Tim Cook presented. As illustrated in this case, data is sometimes selectively presented to create a more favorable impression, leading others to interpret it in a way that benefits the presenter.
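To make the effect of cumulative plotting concrete, here is a minimal sketch. The quarterly figures are hypothetical numbers invented for illustration (they are not Apple's real sales data), and the variable names are my own.

```python
from itertools import accumulate

# Hypothetical quarterly unit sales (in millions); NOT Apple's real figures.
quarterly_sales = [50, 60, 55, 45, 40, 35]

# Running total: each entry adds that quarter's sales to everything before it.
cumulative_sales = list(accumulate(quarterly_sales))

# A cumulative series of non-negative values can never go down.
always_rising = all(
    later >= earlier
    for earlier, later in zip(cumulative_sales, cumulative_sales[1:])
)

# The per-quarter series tells a different story: sales peak, then fall.
peak = quarterly_sales.index(max(quarterly_sales))
after_peak = quarterly_sales[peak:]
declining_after_peak = all(
    later < earlier for earlier, later in zip(after_peak, after_peak[1:])
)

print("cumulative:", cumulative_sales)
print("cumulative always rising:", always_rising)
print("quarterly sales declining after peak:", declining_after_peak)
```

Both series come from the same data, yet only the per-quarter view makes the decline visible.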

Image description

Here is another example that can lead to misleading judgments. Jason E. Chaffetz, a former U.S. representative for Utah's 3rd congressional district, presented a graph suggesting that there are more abortions than cancer screening and prevention services. However, this graph is highly distorted because the positions of its lines do not reflect the absolute figures. According to the numbers on Chaffetz's own version, there were 327,000 abortions in 2013, while cancer screenings numbered 935,573, nearly three times the number of abortions. As a result, the graph presented in the "Honest Version" is much more accurate than Chaffetz's version.

2. Definition

Next, I'll define skewness and kurtosis and explain why each of them can distort how data is perceived.

2.1. Skewness

Skewness is a statistical measure that describes the asymmetry of a distribution around its mean. In other words, it indicates whether the data points in a dataset are distributed evenly on both sides of the mean or if they tend to cluster more on one side.

Image description

There are two types of skewness to consider. The first is 'Negative Skew (Left Skew),' where the tail on the left side (lower values) is longer or fatter than the tail on the right side. In this case, most data points cluster on the right side of the distribution, with a few smaller values extending the tail to the left. The second type is 'Positive Skew (Right Skew),' which is essentially the opposite. Here, most data points cluster on the left side of the distribution, with a few larger values extending the tail to the right.
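As a rough illustration of the two types, the sketch below computes the moment-based skewness coefficient (third central moment divided by the standard deviation cubed) in plain Python. The `skewness` helper and the small datasets are assumptions made up for this example; libraries such as SciPy offer their own implementations, which may apply bias corrections.

```python
def skewness(values):
    """Population skewness: third central moment over the standard deviation cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Illustrative data: most values are small, one large value stretches the right tail.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 10]
# Mirror image of the data above: the long tail now points left.
left_skewed = [11 - x for x in right_skewed]
# Evenly spread around the mean.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print("right-skewed:", round(skewness(right_skewed), 3))  # positive value
print("left-skewed: ", round(skewness(left_skewed), 3))   # negative value
print("symmetric:   ", round(skewness(symmetric), 3))     # zero
```

The sign of the result matches the direction of the longer tail, and mirroring the data flips the sign.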

2.2. Kurtosis

Kurtosis is a statistical measure that describes the "tailedness" of a distribution, that is, how heavy its tails are relative to its overall shape, particularly compared to a normal distribution. It provides insight into the extremity of deviations (outliers) in a dataset.

Image description

This graph illustrates the concept of kurtosis by comparing different types of distributions. It shows three curves representing different levels of kurtosis relative to the normal distribution.

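First, let's discuss positive kurtosis, also known as leptokurtic. Here "positive" refers to excess kurtosis, that is, kurtosis measured relative to the normal distribution, whose excess kurtosis is 0. A leptokurtic distribution typically has a taller and sharper peak than a normal distribution, together with heavier tails. It exhibits more extreme outliers, indicating that the data is more concentrated around the mean while extreme values also occur more often.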

On the other hand, negative kurtosis (kurtosis below that of the normal distribution), also referred to as platykurtic, is the opposite of leptokurtic. A platykurtic distribution is flatter and broader than a normal distribution, with a lower peak and thinner tails. Distributions with negative kurtosis have fewer outliers and a more even spread of data values.
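The comparison can be sketched numerically. The code below is a minimal illustration, not a reproduction of the curves in the image: I am assuming a Laplace distribution as the leptokurtic example and a uniform distribution as the platykurtic one, and the helper names are my own. To keep the result deterministic, each "sample" is built from evenly spaced quantiles rather than random draws.

```python
import math
from statistics import NormalDist

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal baseline)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

def laplace_inv_cdf(p):
    """Inverse CDF of the standard Laplace distribution (location 0, scale 1)."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

# Evenly spaced probabilities strictly between 0 and 1.
n = 10_000
probs = [(i + 0.5) / n for i in range(n)]

standard_normal = NormalDist()
normal_values = [standard_normal.inv_cdf(p) for p in probs]  # mesokurtic baseline
laplace_values = [laplace_inv_cdf(p) for p in probs]         # heavy tails
uniform_values = probs                                       # no tails at all

k_normal = excess_kurtosis(normal_values)
k_laplace = excess_kurtosis(laplace_values)
k_uniform = excess_kurtosis(uniform_values)

print("normal  (baseline):   ", round(k_normal, 2))
print("laplace (leptokurtic):", round(k_laplace, 2))
print("uniform (platykurtic):", round(k_uniform, 2))
```

The normal baseline lands near 0, the heavy-tailed Laplace values come out clearly positive, and the uniform values come out clearly negative.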

3. Problems & Solutions

3.1. The Perspective of Skewness

Image description

This image illustrates how skewness affects the relationship between the mean, median, and mode. In a symmetrical, single-peaked distribution, the three are nearly the same. In a negatively skewed distribution, the mean and median typically fall below the mode, with the mean pulled furthest toward the left tail, so a viewer who judges the "typical" value from the peak may assume the mean and median are higher than they really are. Conversely, in a positively skewed distribution, the mean and median are typically greater than the mode, with the mean pulled above the median by the right tail.
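The ordering of the three measures can be checked on a small dataset. The values below are hypothetical and chosen only to make the pattern easy to see; the ordering shown is typical for skewed data rather than guaranteed for every dataset.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: one large value stretches the right tail.
right_skewed = [2, 3, 3, 3, 4, 4, 5, 6, 9, 30]
# Mirror image of the data above, giving a long left tail instead.
left_skewed = [32 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "-> mode:", mode(data), "median:", median(data), "mean:", mean(data))

# Right skew: mode < median < mean. Left skew: mean < median < mode.
right_order = mode(right_skewed) < median(right_skewed) < mean(right_skewed)
left_order = mean(left_skewed) < median(left_skewed) < mode(left_skewed)
print("right-skew ordering holds:", right_order)
print("left-skew ordering holds:", left_order)
```

In both cases the mean is dragged furthest toward the tail, which is why reporting the mean alone can misrepresent where most of the data sits.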

Furthermore, extreme skewness can cause people to focus on the minority of the data rather than the majority. For instance, in a dataset with high positive skewness, most of the data is concentrated on the left side of the histogram, while a long tail extends to the right, representing a few high-value outliers. Conversely, in a dataset with high negative skewness, the histogram shows most of the data concentrated on the right, with a long tail extending to the left. These outliers can complicate decision-making.

Image description

To address this issue, log transformations or exponential transformations can be applied to reduce skewness and minimize distortion. Additionally, providing further context can help ensure a more accurate interpretation of the data.

Log transformation is a common technique for normalizing a positively skewed distribution, where the tail extends to the right. The logarithm compresses large values into a smaller range: with base 10, for example, log(1000) = 3 while log(10) = 1, so a hundredfold difference shrinks to a difference of 2. At the same time, smaller values are spread out more after the transformation. The combined effect shortens the right tail and results in a more symmetric distribution. Note that the logarithm is only defined for positive values, so data containing zeros or negative numbers must be shifted or handled separately first.
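A small sketch of this effect follows. The dataset is invented for illustration and `skewness` is a helper defined here, not a library function; it uses the moment-based definition (third central moment divided by the standard deviation cubed).

```python
import math

def skewness(values):
    """Population skewness: third central moment over the standard deviation cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed data: mostly small values plus a few very large ones.
raw = [1, 2, 3, 5, 8, 10, 20, 50, 100, 1000]

# Base-10 log transform; all values must be strictly positive.
logged = [math.log10(x) for x in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)

print("skewness before log transform:", round(skew_raw, 2))
print("skewness after log transform: ", round(skew_logged, 2))
```

The transformed values are still right-skewed, but far less so: the single value of 1000 no longer dominates the shape of the distribution.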

Exponential transformations take the opposite approach to log transformations. Because the exponential function stretches large values far more than small ones, it lengthens the right side of a distribution and compresses the left side. Applied to positively skewed data, this would increase the skewness, so exponential transformations are less common. For negatively skewed data, however, the same stretching can offset a long left tail, which is why they are generally more applicable there.
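Here is a hedged sketch of an exponential transformation applied to negatively skewed data. The scores are hypothetical, and the `scale` divisor of 25 is an arbitrary assumption chosen so the exponential does not explode; in practice the right scale (or a different power transform altogether) depends on the data.

```python
import math

def skewness(values):
    """Population skewness: third central moment over the standard deviation cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical negatively skewed data, e.g. exam scores with a few very low results.
scores = [20, 55, 70, 78, 82, 85, 88, 90, 92, 95]

# Assumed scale factor: dividing first keeps exp() in a manageable range.
scale = 25
transformed = [math.exp(x / scale) for x in scores]

skew_before = skewness(scores)
skew_after = skewness(transformed)

print("skewness before exp transform:", round(skew_before, 2))
print("skewness after exp transform: ", round(skew_after, 2))
```

With this particular scale the left tail is pulled in and the skewness moves closer to zero. A much smaller scale would overshoot and produce a right-skewed result, so the choice should be checked against the data.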

3.2. The Perspective of Kurtosis

Kurtosis, like skewness, can also lead to visual distortions in data. For example, when kurtosis values are very high, the data may appear more extreme or prone to outliers, as the heavy tails suggest a higher frequency of extreme values. This could result in an overestimation of risk or variability in the data.

On the other hand, low kurtosis can make the data appear more uniform or less extreme, potentially leading to an underestimation of the presence of outliers or variability.

To prevent misleading conclusions, it’s important to annotate or explain the presence of high or low kurtosis when visualizing data. This helps viewers interpret the data more accurately. Alternatively, outliers that contribute to very high kurtosis can be adjusted or removed, but this process must be approached with caution and a thorough understanding of the impact on the analysis.
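To see how strongly a single outlier can drive kurtosis, consider the sketch below. The measurements are hypothetical, and `excess_kurtosis` is a helper written for this example using the moment-based definition. Dropping the outlier here is only a demonstration of the effect, not a recommendation to delete data without justification.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal baseline)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical measurements with one extreme value at the end.
with_outlier = [10, 11, 12, 12, 13, 13, 14, 15, 16, 60]
# The same data with the extreme value removed.
without_outlier = with_outlier[:-1]

k_with = excess_kurtosis(with_outlier)
k_without = excess_kurtosis(without_outlier)

print("excess kurtosis with outlier:   ", round(k_with, 2))
print("excess kurtosis without outlier:", round(k_without, 2))
```

One point out of ten is enough to flip the result from platykurtic to strongly leptokurtic, which is why any adjustment or removal of outliers should be stated alongside the chart.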

4. Conclusion

It is easy to lead people in a desired direction by distorting data. Therefore, we must strive to minimize data distortion and provide users with visualizations that convey the most accurate information possible. However, there are times when we unintentionally present information in a way that leads others to misunderstand it. Skewness and kurtosis are two notable sources of such unintentional distortion.

For this reason, we have closely examined how skewness and kurtosis can distort data and explored the methods to prevent such distortions.
