There is a right way, a wrong way, and an even better way when it comes to using pie charts.
Before diving into the pie chart’s “report card,” why do we use charts in the first place?
- Charts allow us to take information from sets of data and make it more understandable and, importantly for data scientists, more presentable.
- In general, charts make it easier to compare different sets of data, and the more information you can convey without making a chart more complex, the better. When your boss can get answers quickly without having to decipher your work, that is always a good sign.
Why pie charts? A “pros” reasoning.
- Pie charts are among the most common charts, and most people know the basics of reading one.
- These circular charts (donut charts included) are typically used to tell a story about the parts-to-whole aspect of a set of data. As a one-dimensional display of relative size, a pie chart lets you see at a glance whether one subset is larger than another.
Why not pie charts? A “cons” list.
To reiterate: as a one-dimensional display of relative size, pie charts simply show you proportions.
*Note: All of the following examples use made-up data sets.
- Having more than five mutually exclusive categories (i.e., the data does not overlap), especially ones with lengthy descriptions, makes the chart crowded and less reader-friendly; remember, we want to convey more information without adding complexity. If the subsets are not mutually exclusive, a workaround is to add another category for the overlap, but that just sets up a positive-feedback loop of adding ever more categories.
What are the statuses of shipments in each territory?
Status | North | East | South | West |
---|---|---|---|---|
Active & Open | 31 | 44 | 13 | 12 |
In Transit | 25 | 32 | 19 | 14 |
Completed | 35 | 10 | 22 | 26 |
[Pie chart: shipment statuses by territory](https://www.imgur.com/2BJ0dvm.png)
- A pie chart is only useful if it tells you the percentage for each “slice.” Sometimes it does not, and you are left guessing at the angles, which is especially hard when slices are similar in size. Even with an exact or approximate percentage in hand, you may still need further calculation to recover the actual values for the set or its subsets. That extra work on your part can easily be avoided.
We own an exotic animal sanctuary that can house up to 300 animals
of eight different species. Given that we are at full capacity and
have to bulk buy food for 100 animals at a time based on their diet,
how many different combinations can we group together?
Animal | Tally | Diet |
---|---|---|
Wolf | 44 | Carnivore |
Falcon | 35 | Carnivore |
Moose | 44 | Herbivore |
Elephant | 39 | Herbivore |
Red Fox | 48 | Carnivore |
Tiger | 21 | Carnivore |
Snapping Turtle | 27 | Herbivore |
Panda | 42 | Herbivore |
[Pie chart: sanctuary animals by species](https://www.imgur.com/DlAg4Uw.png)
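To make that extra work concrete, here is a minimal Python sketch using the made-up sanctuary tallies above. The function names `slice_angles` and `estimate_count` are illustrative, not from any library, and the 5-degree misreading is an assumed figure. The sketch converts tallies to slice angles, then shows how far a count estimate drifts when a reader misjudges an angle.

```python
# Made-up sanctuary tallies from the table above.
tallies = {
    "Wolf": 44, "Falcon": 35, "Moose": 44, "Elephant": 39,
    "Red Fox": 48, "Tiger": 21, "Snapping Turtle": 27, "Panda": 42,
}

def slice_angles(counts):
    """Return the pie-slice angle in degrees for each category."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}

def estimate_count(angle_degrees, total):
    """Recover a count from a slice angle read off the chart."""
    return total * angle_degrees / 360

angles = slice_angles(tallies)
total = sum(tallies.values())

# Wolf and Panda differ by only two animals, a small difference in arc.
gap = angles["Wolf"] - angles["Panda"]

# Assumed misreading: the reader is off by 5 degrees on the Wolf slice.
misread = estimate_count(angles["Wolf"] + 5, total)

print(f"Wolf slice: {angles['Wolf']:.1f} degrees")
print(f"Wolf vs. Panda gap: {gap:.1f} degrees")
print(f"Estimated wolves after a 5-degree misreading: {misread:.1f}")
```

The Wolf and Panda slices are only 2.4 degrees apart, and a 5-degree misreading turns 44 wolves into roughly 48, which is the kind of guesswork a labeled chart avoids.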
- In many real-world circumstances, data arrives incomplete, and the larger the set, the more often that happens. If the parts do not sum to a meaningful whole, a pie chart cannot represent the data, period. The total of a subsample is an artificial whole, and comparing each element to it is not meaningful in the least.
We just finished November’s sales data. Which quarter was our best
performing? How many orders were fulfilled this past year?
Quarter | Month | Product's Market Value | Orders (monthly) | Orders (quarterly cumulative) |
---|---|---|---|---|
Q1 | January | 30 | 52 | 52 |
Q1 | February | 33 | 49 | 101 |
Q1 | March | 32 | 51 | 152 |
Q2 | April | 28 | 68 | 68 |
Q2 | May | 25 | 78 | 140 |
Q2 | June | 26 | 71 | 211 |
Q3 | July | 29 | 63 | 63 |
Q3 | August | 31 | 51 | 114 |
Q3 | September | 27 | 72 | 186 |
Q4 | October | 34 | 47 | 47 |
Q4 | November | 28 | 55 | 102 |
Q4 | December | 21 | | 102 |
[Pie chart: orders by month and quarter](https://www.imgur.com/ACG1h2S.png)
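One way to guard against this is to check completeness before reaching for a pie chart. The sketch below is a minimal illustration in Python; `missing_parts` and `is_pie_eligible` are hypothetical helpers, and using `None` to mark December's not-yet-available order count is an assumption about how the gap would be stored.

```python
# Monthly orders from the made-up sales table; December is not in yet.
monthly_orders = {
    "January": 52, "February": 49, "March": 51,
    "April": 68, "May": 78, "June": 71,
    "July": 63, "August": 51, "September": 72,
    "October": 47, "November": 55, "December": None,
}

def missing_parts(parts):
    """List the categories that have no value yet."""
    return [name for name, value in parts.items() if value is None]

def is_pie_eligible(parts):
    """A pie chart needs every part of the whole to be known and non-negative."""
    return all(value is not None and value >= 0 for value in parts.values())

# Sum of the months we do have: a partial total, not the yearly whole.
known_total = sum(v for v in monthly_orders.values() if v is not None)

print(is_pie_eligible(monthly_orders))
print(missing_parts(monthly_orders))
print(known_total)
```

The partial total covers only eleven months, so any share computed against it is a share of an artificial whole rather than of the year.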
- Furthermore, people want to seem smarter and fancier, so they make a 3-dimensional version of the pie chart with differing shades, opacities, or pattern fills. There is no logical reason for these effects, nor any linear gradient scale behind them; they only add distraction and clutter. This happened in our second example above, where we did not take color blindness into account. Also consider that if you have to print your chart(s), you will not always have access to a color printer, and the slices may not be easily distinguishable in grayscale.
What is the typical time spent walking your dog every day for 10 weeks?
Week | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
---|---|---|---|---|---|---|---|
Week 1 | 22 | 24 | 28 | 24 | 25 | 26 | 26 |
Week 2 | 24 | 31 | 26 | 15 | 0 | 28 | 22 |
Week 3 | 14 | 22 | 12 | 16 | 17 | 19 | 12 |
Week 4 | 25 | 0 | 14 | 23 | 14 | 13 | 15 |
Week 5 | 12 | 30 | 15 | 15 | 17 | 19 | 30 |
Week 6 | 14 | 33 | 22 | 0 | 13 | 18 | 26 |
Week 7 | 29 | 25 | 26 | 15 | 18 | 24 | 18 |
Week 8 | 25 | 30 | 13 | 21 | 15 | 0 | 16 |
Week 9 | 15 | 11 | 22 | 14 | 22 | 19 | 25 |
Week 10 | 11 | 22 | 10 | 31 | 29 | 16 | 16 |
[Pie chart: daily dog-walking time over 10 weeks](https://www.imgur.com/ozJc643.png)
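A quick way to anticipate the grayscale problem is to compare the gray levels of the fill colors before printing. The sketch below uses the Rec. 601 luma weights (0.299, 0.587, 0.114); the palette and the minimum gap of 40 are assumptions chosen for illustration, not values taken from the charts above.

```python
from itertools import combinations

def luma(rgb):
    """Approximate gray level (0-255) of an RGB color, Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def hard_to_tell_apart(palette, min_gap=40):
    """Return pairs of colors whose gray levels differ by less than min_gap."""
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if abs(luma(palette[a]) - luma(palette[b])) < min_gap
    ]

# Hypothetical palette: distinct hues on screen, similar grays on paper.
palette = {
    "red": (255, 0, 0),
    "blue": (0, 100, 255),
    "yellow": (255, 255, 0),
}

for name, rgb in palette.items():
    print(f"{name}: gray level {luma(rgb):.0f}")
print(hard_to_tell_apart(palette))
```

In this made-up palette, red and blue are easy to tell apart in color but land close together in gray, so they are the pair the check flags.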
Ultimately, pie charts are rarely a good fit for the problems they are meant to address because they are poor at communicating the observations you want readers to draw from the data.
So then, what is an even better way than the pie chart?
Arguably, there is almost always an alternative visualization that displays the data more clearly and effectively than a pie chart, so I recommend always opting for that alternative. The more efficient charts require neither data labels nor multiple colors (unless there are different series); you should not need any extraneous numbers in your chart to make your point.
- If you are adamant about showing your figures in relation to a whole, and your category descriptions are on the lengthier side, try squaring the pie instead with a waffle chart or a tree map. A 10x10 grid of cells makes it possible to read values to the nearest whole percent. You can either give each subset its own square or combine them into one large square. In some programs the chart may default to a rectangle, which can again make the categories harder to compare, but it is still more accurate than a pie chart.
What are the statuses of shipments in each territory?
Status | North | East | South | West |
---|---|---|---|---|
Active & Open | 31 | 44 | 13 | 12 |
In Transit | 25 | 32 | 19 | 14 |
Completed | 35 | 10 | 22 | 26 |
[Waffle chart: shipment statuses by territory](https://www.imgur.com/NbSLBNx.png)
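To see how a 10x10 waffle grid maps to whole percents, here is a minimal text-only sketch in Python for the North territory column above. The largest-remainder rounding used to make the cells add up to exactly 100 is my assumption, as are the function names and the one-letter symbols; charting tools may round differently.

```python
def waffle_cells(counts, cells=100):
    """Allocate grid cells to categories using largest-remainder rounding."""
    total = sum(counts.values())
    exact = {name: cells * n / total for name, n in counts.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = cells - sum(alloc.values())
    # Hand remaining cells to the categories with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc

def render_waffle(alloc, symbols, width=10):
    """Draw the grid row by row, one symbol per cell."""
    flat = "".join(symbols[name] * n for name, n in alloc.items())
    return "\n".join(flat[i:i + width] for i in range(0, len(flat), width))

# North territory from the made-up shipments table.
north = {"Active & Open": 31, "In Transit": 25, "Completed": 35}
symbols = {"Active & Open": "A", "In Transit": "T", "Completed": "C"}

cells = waffle_cells(north)
print(cells)
print(render_waffle(cells, symbols))
```

Each printed character stands for one percent of North's 91 shipments, so a reader can count cells instead of estimating angles.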
- Bar charts are cleaner than waffle charts because you can efficiently compare each category to every other category using raw values rather than percentages. Exact numbers generally make a report more credible: anyone looking at it can see the sample size and judge whether the source is reliable. Given a sample population of 100 versus one of 100,000, which data set would you rather have as your source?
We own an exotic animal sanctuary that can house up to 300 animals
of eight different species. Given that we are at full capacity and
have to bulk buy food for 100 animals at a time based on their diet,
how many different combinations can we group together?
Animal | Tally | Diet |
---|---|---|
Wolf | 44 | Carnivore |
Falcon | 35 | Carnivore |
Moose | 44 | Herbivore |
Elephant | 39 | Herbivore |
Red Fox | 48 | Carnivore |
Tiger | 21 | Carnivore |
Snapping Turtle | 27 | Herbivore |
Panda | 42 | Herbivore |
[Bar chart: sanctuary animals by species](https://www.imgur.com/faf5MMH.png)
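A bar chart of raw tallies can be sketched even in plain text. The Python snippet below is a minimal illustration using the sanctuary table; `text_bar_chart` is a hypothetical helper, and drawing one `#` per animal is an arbitrary choice made for the example.

```python
# Made-up sanctuary tallies from the table above.
tallies = {
    "Wolf": 44, "Falcon": 35, "Moose": 44, "Elephant": 39,
    "Red Fox": 48, "Tiger": 21, "Snapping Turtle": 27, "Panda": 42,
}

def text_bar_chart(counts):
    """Return one line per category, sorted from largest to smallest."""
    width = max(len(name) for name in counts)
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # One '#' per animal, followed by the raw value as the label.
    return [f"{name:<{width}} {'#' * n} {n}" for name, n in ordered]

for line in text_bar_chart(tallies):
    print(line)
```

Because every bar starts from the same baseline and carries its raw count, the two-animal difference between Wolf and Panda is visible without any angle estimation.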
- Line charts are great for visualizing change over time. Because the data points are connected by a continuous “slope” line, we can see how a value develops. The time intervals between points should be regular (e.g., monthly), not random. Bonus points if you can plot multiple series on scaled y-axes to further analyze why the values may be trending the way they are.
We just finished November’s sales data. Which quarter was our best
performing? How many orders were fulfilled this past year?
Quarter | Month | Product's Market Value | Orders (monthly) | Orders (quarterly cumulative) |
---|---|---|---|---|
Q1 | January | 30 | 52 | 52 |
Q1 | February | 33 | 49 | 101 |
Q1 | March | 32 | 51 | 152 |
Q2 | April | 28 | 68 | 68 |
Q2 | May | 25 | 78 | 140 |
Q2 | June | 26 | 71 | 211 |
Q3 | July | 29 | 63 | 63 |
Q3 | August | 31 | 51 | 114 |
Q3 | September | 27 | 72 | 186 |
Q4 | October | 34 | 47 | 47 |
Q4 | November | 28 | 55 | 102 |
Q4 | December | 21 | | 102 |
[Line chart: market value and orders by month](https://www.imgur.com/oZp5fir.png)
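What a line chart lets the eye do can be approximated numerically as the change between consecutive points. Here is a minimal Python sketch over the monthly orders from the made-up sales table (January through November, on the assumption that December's orders are not in yet); `month_over_month` is a hypothetical helper name.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov"]
orders = [52, 49, 51, 68, 78, 71, 63, 51, 72, 47, 55]

def month_over_month(values):
    """Change from each point to the next, i.e. the slope of each line segment."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

changes = month_over_month(orders)

# Label each segment with the month it ends in.
for month, change in zip(months[1:], changes):
    print(f"{month}: {change:+d}")

steepest_rise = max(zip(changes, months[1:]))
steepest_drop = min(zip(changes, months[1:]))
print("Steepest rise ends in", steepest_rise[1])
print("Steepest drop ends in", steepest_drop[1])
```

This only works because the points are evenly spaced by month; with irregular intervals, the differences would no longer be comparable slopes.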
- Box (and whisker) plots go a little further by showing us the descriptive statistics of the data, grouping the ordered numerical values by their quartiles. We can see the degree of dispersion (i.e., the spread of the distribution). The range is shown by the top (maximum) and bottom (minimum) horizontal lines joined to the box by the vertical “whiskers,” often excluding any outliers that would otherwise skew our data. The box spans the interquartile range, the middle 50% of our data between the 1st and 3rd quartiles, with the 2nd quartile marked inside it. The 2nd quartile (i.e., the median) is useful when you want a “representative” measure of centrality, since the mean would be skewed by outliers.
What is the typical time spent walking your dog every day for 10 weeks?
Week | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
---|---|---|---|---|---|---|---|
Week 1 | 22 | 24 | 28 | 24 | 25 | 26 | 26 |
Week 2 | 24 | 31 | 26 | 15 | 0 | 28 | 22 |
Week 3 | 14 | 22 | 12 | 16 | 17 | 19 | 12 |
Week 4 | 25 | 0 | 14 | 23 | 14 | 13 | 15 |
Week 5 | 12 | 30 | 15 | 15 | 17 | 19 | 30 |
Week 6 | 14 | 33 | 22 | 0 | 13 | 18 | 26 |
Week 7 | 29 | 25 | 26 | 15 | 18 | 24 | 18 |
Week 8 | 25 | 30 | 13 | 21 | 15 | 0 | 16 |
Week 9 | 15 | 11 | 22 | 14 | 22 | 19 | 25 |
Week 10 | 11 | 22 | 10 | 31 | 29 | 16 | 16 |
[Box plot: daily dog-walking time over 10 weeks](https://www.imgur.com/b3vKyKu.png)
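The numbers behind a single box can be computed with Python's standard `statistics` module. The sketch below is a minimal illustration for the Monday column of the made-up table; the `box_plot_summary` helper, the "inclusive" quartile method, and the 1.5 x IQR rule for flagging outliers are assumptions, and charting tools may use different conventions.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers for one box in a box plot."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
        "mean": statistics.mean(ordered),
    }

# Monday column from the made-up dog-walking table (weeks 1 to 10).
monday = [24, 31, 22, 0, 30, 33, 25, 30, 11, 22]
summary = box_plot_summary(monday)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the single Monday with a value of 0 is flagged as an outlier, and it pulls the mean (22.8) below the median (24.5), which is why the median is the more representative figure here.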