Beautiful Box Plots in Excel 2013
Box plots are widely used among data scientists and statisticians. They're useful because they show variation both between and within data series. R, Python's matplotlib, and many other charting libraries support box plots right out of the...box, but Excel does not. In Excel 2013, with a little bit of imagination you can create nice looking box plots without writing any code. (If you are looking for a more comprehensive reference for charts and graphs in Excel, I recommend this book.)
Read this post to find out how to create box plots that look like this:
Here is a workbook that has the finished product if you don’t want to follow along.
You’ll need to start with a table containing the data you want to plot. I am using the data from the Michelson-Morley experiment:
A box plot shows the median of each data series as a line, with a “box” whose top edge is the third quartile and whose bottom edge is the first quartile. Often we draw “whiskers” at the top and bottom representing the extreme values of each series.
If we create an auxiliary data containing this data and follow my advice from my Error Bars in Excel post, we can create a nice looking box plot.
Step 1: Calculate Quartiles and Extremes.
Create another table with the following rows for each series: min, q1, q2, q3, max. These will be the primary data in your box plot. Min and max are easy – use the =MIN() and =MAX() formulas on each data series (represented as columns A – E in my example). To compute Q1-Q3 use the QUARTILE.INC() function. (INC means “inclusive”. QUARTILE.EXC() would work fine if that’s what you want.) Enter the formulas for the first series and then “fill right”:
Step 2: Calculate box and whisker edges
We are going to create a stacked column chart with error bars, and “hide” the bottommost column in the stack to make the chart look like a box plot. Therefore we have to calculate the tops and bottoms of our boxes and whiskers:
- The bottom of each box is Q1.
- The ‘middle’ of each box is Q2 (the median). Since this is a stacked column chart, we actually want to compute Q2 – Q1.
- The top of each box is Q3. Since we want to represent this as a “slice” in the stacked column chart, we want Q3 – Q2.
- The error going “down” in the chart is Q2 – min, since the whiskers start at the median.
- The error going “up” is max – Q2. Compute these five quantities as rows and you’ll have this:
Step 3: Create a stacked column chart.
Go to the INSERT tab and select a stacked column chart:
Now right click on the blank chart, choose Select Data Range and select the “box lo, box mid, mix hi range” as your data:
Step 4: Make the chart look like a Box Plot.
This is simple: the bottom bar (the blue ones in my example) need to go away. So right click on a blue bar and change both the outline and fill to nothing.
Step 5: Add Whiskers.
Follow the steps in my celebrated “Add Error Bars” post. Click on the “+” next to the chart, select Error Bars. Choose Series 2 (which corresponds to the median). Click on “More options” in the Error Bars flyout menu next to the “+”. In the task pane on the right, for Error Amount choose Custom and then click the Specify Value button:
For “Positive Error Value” select the “err up” row and for “Negative Error Value” select “err down”. Both rows contain positive values, and that is totally fine. Here’s what mine looks like:
That’s it! You can of course customize the other bars as desired.