Chapter 2 - Right, Wrong, & Better!

Posted by Beverly Delarosa on March 25, 2019

There is a right way, a wrong way, and an even better way when it comes to using pie charts.

Before diving into our pie charts’ “report card,” why do we use any type of chart in the first place?

  • Charts take information from sets of data and make it more understandable and, importantly for data scientists, more presentable.

  • In general, charts make it easier to compare different sets of data, and the more information you can convey in a chart without making it more complex, the better. When your boss can get answers quickly without having to decipher your work, that is always a good sign.

Why pie charts? A “pros” reasoning.

  • Pie charts are among the most common chart types, and many people know how to read at least the basics.

  • These circular charts (donut charts included) are typically used to tell a parts-to-whole story about a set of data. If you like seeing relative sizes as a picture, this one-dimensional display lets you see at a glance whether one subset is larger than another.

Why not pie charts? A “cons” list.

To reiterate: as a one-dimensional display of relative size, pie charts show you proportions and nothing more.

*Note: All of the following examples use made-up data sets.

  1. Having more than five mutually exclusive categories (i.e., the data does not overlap), especially ones with lengthy descriptions, makes the chart look crowded and less reader-friendly; remember, we want to convey more information without making it more complex. If the subsets are not mutually exclusive, one workaround is to add another category, but that only feeds a loop of ever more categories.
► ► ► Example 1a. ◄ ◄ ◄
What are the statuses of shipments in each territory?
Data Set of Shipment Statuses:

| Status | North | East | South | West |
|---|---|---|---|---|
| Active & Open | 31 | 44 | 13 | 12 |
| In Transit | 25 | 32 | 19 | 14 |
| Completed | 35 | 10 | 22 | 26 |

![Example 1a](https://www.imgur.com/2BJ0dvm.png)


  2. It helps when a pie chart labels the percentage for each slice, but often it does not, which leaves a lot of guesswork in estimating angles, especially when slices look similar in size. Furthermore, even once you have the actual or approximate percentage, you may need further calculations to recover the specific values in the set and/or subset. That extra work can easily be avoided.
► ► ► Example 2a. ◄ ◄ ◄
We own an exotic animal sanctuary that can house up to 300 animals
of eight different species. Given that we are at full capacity and
have to bulk buy food for 100 animals at a time based on their diet,
how many different combinations can we group together?
Data Set of Exotic Animals:

| Animal | Tally | Diet |
|---|---|---|
| Wolf | 44 | Carnivore |
| Falcon | 35 | Carnivore |
| Moose | 44 | Herbivore |
| Elephant | 39 | Herbivore |
| Red Fox | 48 | Carnivore |
| Tiger | 21 | Carnivore |
| Snapping Turtle | 27 | Herbivore |
| Panda | 42 | Herbivore |

![Example 2a](https://www.imgur.com/DlAg4Uw.png)
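
To make that “extra work” concrete, here is a small Python sketch (the helper names are my own, purely hypothetical) that turns the Example 2a tallies into the percentages and slice angles a pie chart actually encodes, and then goes back the other way. Identical tallies (Wolf and Moose, 44 each) get identical 52.8° slices, and eyeballing the Wolf slice as “about 15%” recovers 45 animals instead of 44.

```python
# Example 2a tallies; the sanctuary is at full capacity (300 animals).
TOTAL = 300
tallies = {
    "Wolf": 44, "Falcon": 35, "Moose": 44, "Elephant": 39,
    "Red Fox": 48, "Tiger": 21, "Snapping Turtle": 27, "Panda": 42,
}

def slice_angle(count, total):
    """Degrees of the pie that a slice of `count` out of `total` occupies."""
    return 360 * count / total

def count_from_percent(percent, total):
    """Recover an approximate count from a (possibly eyeballed) slice percentage."""
    return round(percent * total / 100)

# What the pie chart encodes is a share and an angle, not the count itself.
for animal, count in tallies.items():
    share = 100 * count / TOTAL
    print(f"{animal:<16} {count:>3} {share:6.2f}% {slice_angle(count, TOTAL):6.1f} deg")

# A reader who guesses "about 15%" for the Wolf slice is off by one animal.
print(count_from_percent(15, TOTAL))  # 45, but the true tally is 44
```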


  3. In many real-world situations, data is incomplete, and the larger the set, the more often this happens. If the parts do not sum to a meaningful whole, a pie chart cannot represent the data, period: the total of a partial sample is artificial, and comparing each value to that artificial whole is not meaningful in the least.
► ► ► Example 3a. ◄ ◄ ◄
We just finished November’s sales data. Which quarter performed
best? How many orders were fulfilled this past year?
Data Set of Quarter Performances:
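
| Quarter | Month | Product's Market Value | Orders | Cumulative Orders (quarter-to-date) |
|---|---|---|---|---|
| Q1 | January | 30 | 52 | 52 |
| | February | 33 | 49 | 101 |
| | March | 32 | 51 | 152 |
| Q2 | April | 28 | 68 | 68 |
| | May | 25 | 78 | 140 |
| | June | 26 | 71 | 211 |
| Q3 | July | 29 | 63 | 63 |
| | August | 31 | 51 | 114 |
| | September | 27 | 72 | 186 |
| Q4 | October | 34 | 47 | 47 |
| | November | 28 | 55 | 102 |
| | December | 21 | | 102 |
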
![Example 3a](https://www.imgur.com/ACG1h2S.png)
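
A quick way to catch this before drawing anything is to check whether every part of the whole is actually present. The Python sketch below (hypothetical helper name) totals the monthly Orders column of Example 3a by quarter and flags Q4, because December has no orders yet. Note that summing Q2's monthly orders gives 217, whereas the table's running total for June reads 211, so the two columns disagree for that quarter.

```python
# Monthly orders from Example 3a; None marks December, which has not been reported yet.
monthly_orders = {
    "Q1": [52, 49, 51],
    "Q2": [68, 78, 71],
    "Q3": [63, 51, 72],
    "Q4": [47, 55, None],
}

def quarter_totals(orders):
    """Sum the known months per quarter and list quarters with missing months."""
    totals, incomplete = {}, []
    for quarter, months in orders.items():
        known = [m for m in months if m is not None]
        totals[quarter] = sum(known)
        if len(known) < len(months):
            incomplete.append(quarter)
    return totals, incomplete

totals, incomplete = quarter_totals(monthly_orders)
orders_so_far = sum(totals.values())

for quarter, total in totals.items():
    note = " (incomplete: share is understated)" if quarter in incomplete else ""
    print(f"{quarter}: {total} orders, {100 * total / orders_so_far:.1f}% of orders so far{note}")

# A pie of these shares would treat January through November as if it were the whole year.
if incomplete:
    print("Parts do not sum to the full year; a pie chart would be misleading.")
```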


  4. Furthermore, some people want their charts to look smarter and fancier, so they make a 3-dimensional pie chart with differing shades, opacity, or fill designs. These effects follow no logical ordering or linear gradient scale; they only add distraction and clutter. We made this mistake in Example 2a above by not taking color blindness into account. Also consider that if you have to print your chart(s), you will not always have access to a color printer, and the shades may not be easy to distinguish in grayscale.
► ► ► Example 4a. ◄ ◄ ◄
What is the typical time spent walking your dog every day for 10 weeks?
Data Set of Dog Walks:
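
| Week | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|---|---|
| Week 1 | 22 | 24 | 28 | 24 | 25 | 26 | 26 |
| Week 2 | 24 | 31 | 26 | 15 | 0 | 28 | 22 |
| Week 3 | 14 | 22 | 12 | 16 | 17 | 19 | 12 |
| Week 4 | 25 | 0 | 14 | 23 | 14 | 13 | 15 |
| Week 5 | 12 | 30 | 15 | 15 | 17 | 19 | 30 |
| Week 6 | 14 | 33 | 22 | 0 | 13 | 18 | 26 |
| Week 7 | 29 | 25 | 26 | 15 | 18 | 24 | 18 |
| Week 8 | 25 | 30 | 13 | 21 | 15 | 0 | 16 |
| Week 9 | 15 | 11 | 22 | 14 | 22 | 19 | 25 |
| Week 10 | 11 | 22 | 10 | 31 | 29 | 16 | 16 |
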
![Example 4a](https://www.imgur.com/ozJc643.png)
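
One way to catch the grayscale problem before printing is to convert each fill color to an approximate gray level. This Python sketch applies the Rec. 709 luma weights (0.2126, 0.7152, 0.0722) directly to 0 to 255 RGB values, which is an approximation rather than a full color-science conversion; the palette and the 20-level threshold are hypothetical. Pure red and a dark green, a pair that is also hard to tell apart with red-green color blindness, land at almost the same gray.

```python
from itertools import combinations

def gray_level(rgb):
    """Approximate grayscale level (0-255) using Rec. 709 luma weights on raw RGB values."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def grayscale_clashes(palette, min_gap=20):
    """Return pairs of colors whose gray levels differ by less than `min_gap` (hypothetical threshold)."""
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if abs(gray_level(palette[a]) - gray_level(palette[b])) < min_gap
    ]

# Hypothetical slice colors, e.g. for a weekday pie like Example 4a.
palette = {
    "red": (255, 0, 0),
    "dark green": (0, 76, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "gray": (128, 128, 128),
}

for name, rgb in palette.items():
    print(f"{name:<10} -> gray {gray_level(rgb):6.1f}")
print(grayscale_clashes(palette))
```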


Ultimately, pie charts are rarely a good fit for the problem they are intended to solve and analyze because they are poor at communicating the desired observations in the data.

So then, what is an even better way than the pie chart?

Arguably, there is almost always an alternative visualization that displays the data more clearly and effectively than a pie chart, so I recommend always opting for that alternative. The more efficient charts require neither data labels nor multiple colors (unless there are different series); you should not need any extraneous numbers in your chart to make your point.

  1. But if you are adamant about showing your figures in relation to a whole, and your category descriptions are on the lengthier side, try squaring the pie instead, with a waffle chart or a tree map. A 10x10 grid of cells makes it possible to read values precisely to a (rounded) single percent. You can either keep the subsets separate or combine them into one large square. However, some programs may default your chart to a rectangular shape, which again makes it harder to compare all the categories, but the result is still more accurate than a pie chart.
► ► ► Example 1b. ◄ ◄ ◄
What are the statuses of shipments in each territory?
Data Set of Shipment Statuses:

| Status | North | East | South | West |
|---|---|---|---|---|
| Active & Open | 31 | 44 | 13 | 12 |
| In Transit | 25 | 32 | 19 | 14 |
| Completed | 35 | 10 | 22 | 26 |

![Example 1b](https://www.imgur.com/NbSLBNx.png)
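
Drawing a 10x10 waffle chart comes down to turning each share into a whole number of cells that still add up to 100. The Python sketch below uses the largest-remainder method for that rounding (my choice; charting tools may round differently) and prints a plain-text grid for the North territory of Example 1b, with hypothetical symbols A, T, and C standing in for the three statuses.

```python
# Example 1b counts per territory.
shipments = {
    "North": {"Active & Open": 31, "In Transit": 25, "Completed": 35},
    "East":  {"Active & Open": 44, "In Transit": 32, "Completed": 10},
    "South": {"Active & Open": 13, "In Transit": 19, "Completed": 22},
    "West":  {"Active & Open": 12, "In Transit": 14, "Completed": 26},
}

def waffle_cells(counts, cells=100):
    """Split `cells` squares proportionally, rounding with the largest-remainder method."""
    total = sum(counts.values())
    # Integer arithmetic keeps the remainders exact.
    alloc = {k: v * cells // total for k, v in counts.items()}
    remainder = {k: v * cells % total for k, v in counts.items()}
    leftover = cells - sum(alloc.values())
    # Hand leftover cells to the categories that lost the most to rounding down.
    for k in sorted(counts, key=remainder.get, reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

def render_grid(alloc, symbols, width=10):
    """Lay the cells out row by row as text, one symbol per cell."""
    flat = "".join(symbols[k] * n for k, n in alloc.items())
    return "\n".join(flat[i:i + width] for i in range(0, len(flat), width))

symbols = {"Active & Open": "A", "In Transit": "T", "Completed": "C"}
for territory, counts in shipments.items():
    print(territory, waffle_cells(counts))
print(render_grid(waffle_cells(shipments["North"]), symbols))
```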


  2. Bar charts are cleaner than waffle charts because you can efficiently compare each category to every other category using raw values rather than percentages. Showing exact numbers also generally makes a report more credible: anyone who reads it can see the underlying values and have confidence that the source is reliable. If one data set has a sample population of 100 and another has 100,000, which would you rather cite?
► ► ► Example 2b. ◄ ◄ ◄
We own an exotic animal sanctuary that can house up to 300 animals
of eight different species. Given that we are at full capacity and
have to bulk buy food for 100 animals at a time based on their diet,
how many different combinations can we group together?
Data Set of Exotic Animals:

| Animal | Tally | Diet |
|---|---|---|
| Wolf | 44 | Carnivore |
| Falcon | 35 | Carnivore |
| Moose | 44 | Herbivore |
| Elephant | 39 | Herbivore |
| Red Fox | 48 | Carnivore |
| Tiger | 21 | Carnivore |
| Snapping Turtle | 27 | Herbivore |
| Panda | 42 | Herbivore |

![Example 2b](https://www.imgur.com/faf5MMH.png)
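
Because a bar chart keeps the raw tallies, answering the Example 2b question becomes simple arithmetic instead of angle-reading. The Python sketch below prints a rough text bar chart (one # per two animals, standing in for a real plotting library). It then lists every qualifying group, under my assumption that a “combination” means a group of same-diet species whose tallies add up to exactly one 100-animal food order. With this data that finds only one group: Wolf, Falcon, and Tiger (44 + 35 + 21 = 100).

```python
from itertools import combinations

# Example 2b data: species -> (tally, diet)
animals = {
    "Wolf": (44, "Carnivore"), "Falcon": (35, "Carnivore"),
    "Moose": (44, "Herbivore"), "Elephant": (39, "Herbivore"),
    "Red Fox": (48, "Carnivore"), "Tiger": (21, "Carnivore"),
    "Snapping Turtle": (27, "Herbivore"), "Panda": (42, "Herbivore"),
}

def text_bar_chart(animals, per_mark=2):
    """Horizontal bars sorted by tally, each labeled with its raw count."""
    width = max(len(name) for name in animals)
    rows = sorted(animals.items(), key=lambda kv: kv[1][0], reverse=True)
    return "\n".join(
        f"{name:<{width}} | {'#' * (tally // per_mark)} {tally} ({diet})"
        for name, (tally, diet) in rows
    )

def full_orders(animals, order_size=100):
    """Same-diet species groups whose tallies sum exactly to one bulk order (assumed reading)."""
    found = []
    for diet in sorted({d for _, d in animals.values()}):
        species = [name for name, (_, d) in animals.items() if d == diet]
        for r in range(1, len(species) + 1):
            for group in combinations(species, r):
                if sum(animals[name][0] for name in group) == order_size:
                    found.append((diet, group))
    return found

print(text_bar_chart(animals))
print(full_orders(animals))
```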


  3. Line charts are great for visualizing change over time. Because the data points are connected by a continuous “slope” line, we can see how a value develops. The time intervals between points should be regular and consistent (e.g., monthly), not random. Bonus points if you can scale multiple y-axes against each other to further analyze why the values may be trending the way they are.
► ► ► Example 3b. ◄ ◄ ◄
We just finished November’s sales data. Which quarter performed
best? How many orders were fulfilled this past year?
Data Set of Quarter Performances:
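
| Quarter | Month | Product's Market Value | Orders | Cumulative Orders (quarter-to-date) |
|---|---|---|---|---|
| Q1 | January | 30 | 52 | 52 |
| | February | 33 | 49 | 101 |
| | March | 32 | 51 | 152 |
| Q2 | April | 28 | 68 | 68 |
| | May | 25 | 78 | 140 |
| | June | 26 | 71 | 211 |
| Q3 | July | 29 | 63 | 63 |
| | August | 31 | 51 | 114 |
| | September | 27 | 72 | 186 |
| Q4 | October | 34 | 47 | 47 |
| | November | 28 | 55 | 102 |
| | December | 21 | | 102 |
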
![Example 3b](https://www.imgur.com/oZp5fir.png)
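
A line chart assumes equally spaced points, which is what makes the month-over-month change meaningful. The Python sketch below computes those changes for the Example 3b series and, as one hedged alternative to juggling two y-axes, min-max scales each series to the range 0 to 1 so both can share a single axis. The helper names are hypothetical, and December's missing orders are kept as a gap rather than a zero.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
market_value = [30, 33, 32, 28, 25, 26, 29, 31, 27, 34, 28, 21]
orders = [52, 49, 51, 68, 78, 71, 63, 51, 72, 47, 55, None]  # December not reported yet

def month_over_month(series):
    """Change between consecutive, equally spaced months; None where a value is missing."""
    return [None if a is None or b is None else b - a for a, b in zip(series, series[1:])]

def min_max_scale(series):
    """Rescale known values to 0-1 so series with different units can share one axis."""
    known = [v for v in series if v is not None]
    lo, hi = min(known), max(known)
    return [None if v is None else (v - lo) / (hi - lo) for v in series]

for month, d_value, d_orders in zip(months[1:], month_over_month(market_value),
                                    month_over_month(orders)):
    orders_text = "n/a" if d_orders is None else format(d_orders, "+d")
    print(f"{month}: market value {d_value:+d}, orders {orders_text}")

# Both series on a common 0-1 scale, ready to plot against the same axis.
for month, mv, od in zip(months, min_max_scale(market_value), min_max_scale(orders)):
    print(month, round(mv, 2), None if od is None else round(od, 2))
```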


  4. Box (and whisker) plots go a little further by showing the descriptive statistics of the data, grouping the ordered numerical data by quartiles. This lets us see the degree of dispersion (i.e., how spread out the distribution is). In addition to the points themselves, the range is shown by the horizontal lines at the top (maximum) and bottom (minimum), joined to the box by vertical “whiskers,” which often exclude outliers that could otherwise skew our view of the data. The box spans the interquartile range, the middle 50% of the data, from the 1st to the 3rd quartile, with a line marking the 2nd quartile. The 2nd quartile (i.e., the median) is useful when you want a "representative" measure of central tendency, because the mean can be skewed by outliers.
► ► ► Example 4b. ◄ ◄ ◄
What is the typical time spent walking your dog every day for 10 weeks?
Data Set of Dog Walks:
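
| Week | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|---|---|
| Week 1 | 22 | 24 | 28 | 24 | 25 | 26 | 26 |
| Week 2 | 24 | 31 | 26 | 15 | 0 | 28 | 22 |
| Week 3 | 14 | 22 | 12 | 16 | 17 | 19 | 12 |
| Week 4 | 25 | 0 | 14 | 23 | 14 | 13 | 15 |
| Week 5 | 12 | 30 | 15 | 15 | 17 | 19 | 30 |
| Week 6 | 14 | 33 | 22 | 0 | 13 | 18 | 26 |
| Week 7 | 29 | 25 | 26 | 15 | 18 | 24 | 18 |
| Week 8 | 25 | 30 | 13 | 21 | 15 | 0 | 16 |
| Week 9 | 15 | 11 | 22 | 14 | 22 | 19 | 25 |
| Week 10 | 11 | 22 | 10 | 31 | 29 | 16 | 16 |
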
![Example 4b](https://www.imgur.com/b3vKyKu.png)
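
The numbers behind each box can be computed with Python's standard `statistics` module. This sketch assumes the common convention of drawing whiskers to the furthest points within 1.5 × IQR of the box and treating anything beyond as an outlier (matplotlib's `boxplot` defaults to `whis=1.5`). Quartiles come from `statistics.quantiles` with `method="inclusive"`, which interpolates linearly and can differ slightly from other quartile definitions. On Monday, for instance, the single 0-minute day falls outside the fences and drags the mean (22.8) below the median (24.5).

```python
import statistics

days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
# Example 4b: one row per week, one column per day.
weeks = [
    [22, 24, 28, 24, 25, 26, 26],
    [24, 31, 26, 15, 0, 28, 22],
    [14, 22, 12, 16, 17, 19, 12],
    [25, 0, 14, 23, 14, 13, 15],
    [12, 30, 15, 15, 17, 19, 30],
    [14, 33, 22, 0, 13, 18, 26],
    [29, 25, 26, 15, 18, 24, 18],
    [25, 30, 13, 21, 15, 0, 16],
    [15, 11, 22, 14, 22, 19, 25],
    [11, 22, 10, 31, 29, 16, 16],
]
# Transpose the weekly rows into one list of walk times per day.
by_day = {day: list(column) for day, column in zip(days, zip(*weeks))}

def box_summary(values, whis=1.5):
    """Quartiles, whisker ends, outliers (Tukey-style fences), and the mean for comparison."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside), "q1": q1, "median": median, "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
        "mean": statistics.mean(values),
    }

for day, values in by_day.items():
    print(day, box_summary(values))
```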