# Box and Whisker Plot Template

A **box plot** or **box and whisker plot** can be created using Excel - you just need to know a couple of tricks. First, the *box* can be created using stacked column charts. Second, the *whisker* can be created using y-error bars. Excel provides the built-in functions that you will need to calculate the quartiles used for the "box" part of the box and whisker plot. You can jump right in by downloading the free **Box Plot Template** below, but I also suggest you read through the information below, which describes the functions and calculations used to create a box and whisker plot.

## Box Plot Template

for Excel (**Excel 2003 or later**)

#### Template Details

(not for distribution or resale)

*"No installation, no macros - just a simple spreadsheet" - by Jon Wittwer*

### Description

**Create a box plot quickly and easily**. Enter your data into the Data sheet and the chart in the Plot worksheet will update automatically.

**Limitation**: This template shows only the maximum or minimum outliers, if there are any. Normal convention for box plots is to show all outliers. To show all outliers, try Jon Peltier's Chart Utility add-in.

**Regarding Negative Values:** Using bar charts to display the interquartile range limits the technique described below to displaying positive values (or at least Q1 must be positive). There are a couple of ways around this problem, and both of these alternate methods are included as additional worksheets in the file. (1) You can shift the data so that it is positive before creating the box plot, and (2) you can avoid the use of bar charts and display Q1, Q3, and the Median using series markers instead.

## Creating a Box and Whisker Plot

Box plots are very useful data visualization tools for depicting a number of different summary statistics and especially for graphically comparing multiple data sets. It is much easier to create these plots in Excel if you know how to structure your data. You can take a look at the template as an example.

### Creating the Box

The box part of a box and whisker plot represents the central 50% of the data or the **Interquartile Range** (IQR). The lower edge of the box plot is the **first quartile** or 25th percentile. The upper edge of the box plot is the **third quartile** or 75th percentile. You may want to check out my article on percentiles for more details about how percentiles are calculated.

- Find the *first quartile*, Q_{1}, using =QUARTILE(*range*,1) or =PERCENTILE(*range*,0.25)
- Find the *median*, Q_{2}, using =MEDIAN(*range*) or =QUARTILE(*range*,2) or =PERCENTILE(*range*,0.5)
- Find the *third quartile*, Q_{3}, using =QUARTILE(*range*,3) or =PERCENTILE(*range*,0.75)
- Calculate the *interquartile range* (IQR) as Q_{3}-Q_{1}
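If you want to check the spreadsheet results outside of Excel, the same calculations can be sketched in Python. This is only a sketch: it assumes that Excel's QUARTILE and PERCENTILE functions use inclusive linear interpolation, which is what `statistics.quantiles` computes with `method="inclusive"`, and the sample data is made up for illustration.

```python
from statistics import quantiles

# Hypothetical sample data, not taken from the template.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]

# n=4 asks for quartiles and returns the three cut points Q1, Q2, Q3.
# method="inclusive" is assumed to correspond to =QUARTILE(range, k).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)  # 4.25 6.5 8.75 4.5
```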

The location of the median line relative to the first and third quartiles indicates the amount of **skewness** or **asymmetry** in the data. If the distribution is symmetric, the median will be exactly in the middle. If the median is closer to Q3, the distribution is negatively skewed (or "skewed to the left", meaning the left tail of the distribution is longer). If the median is closer to Q1, the distribution is positively skewed (or "skewed to the right").
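The rule above can be written as a small check. The function name `skew_direction` is hypothetical, and the comparison only looks at the box, so treat it as a rough indicator rather than a formal skewness statistic.

```python
def skew_direction(q1, q2, q3):
    """Describe skewness from where the median sits inside the box."""
    upper = q3 - q2  # distance from the median up to Q3
    lower = q2 - q1  # distance from the median down to Q1
    if upper < lower:
        return "negative"  # median closer to Q3: longer left tail
    if upper > lower:
        return "positive"  # median closer to Q1: longer right tail
    return "symmetric"

print(skew_direction(4.0, 7.0, 8.0))  # negative
```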

The plot in Excel is created using a stacked column chart with 3 series. The first series (bottom column) is Q1, and its *border* and *area* properties are set to *none* so that the column is not visible in the chart. The second series is Q2-Q1. The third series is Q3-Q2. These two series, stacked together, make up the interquartile range. The *area* property is set to none for these two series to create just the outline for the box.
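To make the stacking concrete, here is a minimal sketch of the three column heights for one data set. The function name `stacked_series` and the quartile values are illustrative only.

```python
def stacked_series(q1, q2, q3):
    """Return the heights of the three stacked columns, bottom to top."""
    return [
        q1,       # series 1: invisible spacer that lifts the box up to Q1
        q2 - q1,  # series 2: lower half of the box (Q1 to median)
        q3 - q2,  # series 3: upper half of the box (median to Q3)
    ]

print(stacked_series(4.25, 6.5, 8.75))  # [4.25, 2.25, 2.25]
```

The three heights always add up to Q3, which is why the top of the stack lands on the third quartile.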

### Creating the Whisker

The **whiskers** in a plot represent the tails of the distribution. The whiskers can be created using error bars in Excel. Because of the ease of calculation, the convention for the length of the whisker that I have used in the box plot template comes from [1]:

- The upper whisker starts at Q_{3} and extends upward to Q_{3}+1.5(IQR) or the maximum value, whichever is lower.
- The lower whisker starts at Q_{1} and extends downward to Q_{1}-1.5(IQR) or the minimum value, whichever is greater.
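The two rules above amount to a `min` and a `max`. A minimal sketch, with a hypothetical function name and made-up input values:

```python
def whisker_ends(q1, q3, data_min, data_max):
    """Whisker end points: calculated limit or data extreme, whichever is nearer the box."""
    iqr = q3 - q1
    upper = min(q3 + 1.5 * iqr, data_max)  # whichever is lower
    lower = max(q1 - 1.5 * iqr, data_min)  # whichever is greater
    return lower, upper

print(whisker_ends(4.25, 8.75, 2, 25))  # (2, 15.5)
```

In this example the maximum value (25) lies beyond Q3+1.5(IQR) = 15.5, so the upper whisker stops at 15.5, while the lower whisker stops at the minimum value.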

Another common convention is that instead of extending the whisker to a calculated value of Q_{3}+1.5(IQR), the whisker is extended to the *last data point that is less than or equal to* Q_{3}+1.5(IQR), and similarly with the lower whisker.
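This alternate convention needs the data itself, not just the minimum and maximum. The sketch below is not what the template does; the function name and sample data are hypothetical.

```python
def whisker_ends_at_data(data, q1, q3):
    """Whisker end points snapped to the most extreme data points inside the limits."""
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr
    upper = max(x for x in data if x <= upper_limit)  # last point at or below the limit
    lower = min(x for x in data if x >= lower_limit)  # first point at or above the limit
    return lower, upper

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]
print(whisker_ends_at_data(data, 4.25, 8.75))  # (2, 10)
```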

In the box plot template, the whiskers are created by adding Y-error bars to series 1 (Q1) and series 3 (Q3-Q2).
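Error bars are specified as lengths, not end points, so the whisker ends have to be converted. The sketch below assumes a "minus" error bar on series 1 (whose top sits at Q1) and a "plus" error bar on series 3 (whose top sits at Q3); the direction settings are my assumption here, and the function name is hypothetical.

```python
def error_bar_lengths(q1, q3, lower_end, upper_end):
    """Convert whisker end points into error bar lengths."""
    minus_on_series_1 = q1 - lower_end  # hangs down from the top of the Q1 column
    plus_on_series_3 = upper_end - q3   # rises from the top of the stacked box
    return minus_on_series_1, plus_on_series_3

print(error_bar_lengths(4.25, 8.75, 2, 15.5))  # (2.25, 6.75)
```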

### Outliers

The biggest problem with creating a box and whisker plot in Excel is how to show the **outliers** - the points that fall outside of the range depicted by the box and whiskers. This is a problem because you don't necessarily know how many outliers there will be. Therefore, instead of showing a point for each outlier, the box plot template above plots only the max and min values if they are outliers. For reference, the number of upper and lower outliers is given in the table to indicate if there are more outliers than just the max or min.
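The outlier counts and the max and min outliers described above can be sketched as follows. The function name, the dictionary keys, and the sample data are all hypothetical; the template itself does this with worksheet formulas.

```python
def outlier_summary(data, lower_end, upper_end):
    """Count points beyond the whiskers and report only the most extreme ones."""
    upper = [x for x in data if x > upper_end]
    lower = [x for x in data if x < lower_end]
    return {
        "upper_count": len(upper),
        "lower_count": len(lower),
        "max_outlier": max(upper) if upper else None,  # plotted only if it exists
        "min_outlier": min(lower) if lower else None,
    }

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]
summary = outlier_summary(data, 2, 15.5)
print(summary)
# {'upper_count': 1, 'lower_count': 0, 'max_outlier': 25, 'min_outlier': None}
```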

## Variations

I haven't found a good way to create the following box plot variations in Excel, so I won't describe them in detail, but I've included them here for your information. Most good statistical software like MINITAB® will be able to include these additional features.

### Variable-Width Box Plot

### Notched Box Plot

A **notched box plot** is used to show the length of the confidence interval for the median.

## Other Statistics Spreadsheets

- Pareto Chart - This template helps you perform a pareto analysis to analyze most significant factors.
- Control Chart - Create an X-bar, R or S Chart for process control.
- Dot Plot - How to create a dot plot in a spreadsheet without a chart object.

### References

- Box Plots at *wikipedia.com* - This is actually a really good article.
- [1] A. Mitra, *Fundamentals of Quality Control and Improvement*, 2nd ed., Prentice Hall: New Jersey, 1998.
- [2] S. B. Vardeman, *Statistics for Engineering Problem Solving*, PWS Publishing Company: Boston, 1994.