A box plot or box and whisker plot can be created using Excel - you just need to know a couple of tricks. First, the box can be created using stacked column charts. Second, the whisker can be created using y-error bars. Excel provides built-in functions that you will need to calculate the quartiles used for the "box" part of the box and whisker plot. You can jump right in by downloading the free Box Plot Template below, but I also suggest you read through the information below which describes how the functions and calculations used to create a box and whisker plot.

Update 7/2/15: A Box and Whisker chart is one of the new built-in chart types in Excel 2016! (Read about it).

categories: Statistics

Box Plot Template

for Excel
Screenshots: 1 2
For: Excel 2007 or later & Excel for iPad/iPhone

Template Details

View Agreement
(not for distribution or resale)

"No installation, no macros - just a simple spreadsheet" - by

Description

Create a box plot quickly and easily. Enter your data into the Data sheet and the chart in the Plot worksheet will update automatically.

Limitation: This template shows only the maximum or minimum outliers, if there are any. Normal convention for box plots is to show all outliers. To show all outliers, try Jon Peltier's Chart Utility add-in.

Regarding Negative Values: Using bar charts do display the interquartile range limits the technique described below to displaying positive values (or at least Q1 must be positive). There are a couple ways around this problem and both of these alternate methods are included as additional worksheets in the file. (1) You can shift the data so that it is positive before creating the box plot and (2) you can avoid the use of bar charts and display Q1, Q3, and the Median using series markers instead.

Creating a Box and Whisker Plot

Box plots are very useful data visualization tools for depicting a number of different summary statistics and especially for graphically comparing multiple data sets. It is much easier to create these plots in Excel if you know how to structure your data. You can take a look at the template as an example.

Fig 1. An example box and whisker plot from the Box Plot Template showing the IQR, whiskers, and max/min outliers.

Creating the Box

The box part of a box and whisker plot represents the central 50% of the data or the Interquartile Range (IQR). The lower edge of the box plot is the first quartile or 25th percentile. The upper edge of the box plot is the third quartile or 75th percentile. You may want to check out my article on percentiles for more details about how percentiles are calculated.

• Find the first quartile, Q1, using =QUARTILE(range,1) or =PERCENTILE(range,0.25)
• Find the median, Q2, using =MEDIAN(range) or QUARTILE(range,2) or =PERCENTILE(range,0.5)
• Find the third quartile, Q3, using =QUARTILE(range,3) or =PERCENTILE(range,0.75)
• Calculate the interquartile range (IQR) as Q3-Q1

The location of the median line relative to the first and third quartiles indicates the amount of skewness or asymmetry in the data. If the distribution is symmetric, the median will be exactly in the middle. if the median is closer to Q3, the distribution is negatively skewed (or "skewed to the left" meaning the left tail of the distribution is longer). If the median is closer to Q1, the distribution is positively skewed.

The plot in Excel is created using a stacked column chart with 3 series. The first series (bottom column) is Q1 and the border and area properties are set to none so that the column is not visible in the chart. The second series is Q2-Q1. The third series is Q3-Q2. These two series, stacked together make up the interquartile range. The area property is set to none for these two series to create just the outline for the box.

Creating the Whisker

The whiskers in a plot represent the tails of the distribution. The whiskers can be created using error bars in Excel. Because of the ease of calculation, the convention for the length of the whisker that I have used in the box plot template comes from [1]:

• The upper whisker starts at Q3 and extends upward to Q3+1.5(IQR) or the maximum value, whichever is lower.
• The lower whisker starts at Q1 and extends downward to Q1-1.5(IQR) or the minimum value, whichever is greater.

Another common convention is that instead of extending the whisker to a calculated value of Q3+1.5(IQR), the whisker is extended to the last data point that is less than or equal to Q3+1.5(IQR), and similarly with the lower whisker.

In the box plot template, the whiskers are created by adding Y-error bars to series 1 (Q1) and series 3 (Q3-Q2).

Outliers

The biggest problem with creating a box and whisker plot in Excel is how to show the outliers - the points that fall outside of the range depicted by the box and whiskers. The reason this is a problem is because you don't necessarily know how many outliers there will be. Therefore, instead of showing a point for each outlier, the box plot template above plots only the max and min values if they are outliers. For reference, the number of upper and lower outliers is given in the table to indicate if there are more outliers than just the max or min.

Variations

I haven't found a good way to create the following box plot variations in Excel, so I won't describe them in detail, but I've included them here for your information. Most good statistical software like MINITAB® will be able to include these additional features.

Variable-Width Box Plot

When you are comparing samples with different sample sizes, it is common to represent the sample size by the width of the box.

Notched Box Plot

A notched box plot is used to show the length of the confidence interval for the median.

• Pareto Chart - This template helps you perform a pareto analysis to analyze most significant factors.
• Control Chart - Create an X-bar, R or S Chart for process control.
• Dot Plot - How to create a dot plot in a spreadsheet without a chart object.

References

• Box Plots at wikipedia.com - This is actually a really good article.
• [1] A. Mitra, Fundamentals of Quality Control and Improvement, 2nd ed., Prentice Hall: New Jersey, 1998.
• [2] S. B. Vardeman, Statistics for Engineering Problem Solving, PWS Publishing Company: Boston, 1994.