A box plot, also known as a “box-and-whisker plot,” is a graph that shows the distribution and variation of a single quantitative variable in a set of data. A box plot prominently features the quartiles, range and outliers of the data and can be drawn vertically or horizontally. The bottom and top (or left and right, when drawn horizontally) of the box are the first and third quartiles respectively (expressed as the 25th and 75th percentiles in the image to the right). The line through the box is the second quartile, also known as the “median.” The “whiskers” extend from either end of the box to the minimum and maximum of the data; when outliers are drawn separately, the whiskers stop at the most extreme values that are not outliers. Outliers are represented by single points.
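To make the pieces of a box plot concrete, here is a minimal sketch that computes the numbers a box plot draws. It assumes the common convention that a point is an outlier when it lies more than 1.5 times the interquartile range beyond the box; this post does not say which rule its image uses, so treat that threshold, the function name `box_plot_summary` and the sample data as illustrative assumptions.

```python
import statistics

def box_plot_summary(values, k=1.5):
    """Return the values a box plot displays.

    Assumption: outliers are points more than k * IQR beyond the quartiles
    (k = 1.5 is a common convention, not something stated in this post).
    """
    data = sorted(values)
    # Quartiles: the edges of the box (q1, q3) and the line through it (median).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers reach the most extreme points that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Made-up sample data for illustration only.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With this sample, the box runs from 4 to 8 with the median at 6, the whiskers reach 2 and 9, and 30 is drawn as a single outlier point.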

The benefits of box plots are:

  • Large data – The clarity of a box plot’s information is not lost with larger sets of data.
  • Variation – Gives a simplified visual representation of the variation of a set of data.
  • Outliers – Clearly shows the outliers of a set of data.

Wikipedia has a simple overview of box plots along with summaries of many variations and alternate forms of box plots. A detailed guide to making a box plot can be found here.

Note that, while a box plot conveniently summarizes a set of data, most exact values are lost, so a box plot is unsuitable for analyzing a set of data in great detail. Also, when most of the data is highly concentrated, the box is narrow and points away from it are more likely to be shown as outliers. It is important to distinguish a genuine outlier from a point that is flagged only because the rest of the data is highly concentrated, as discussed in the article “How Significant Is a Boxplot Outlier?”
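The effect of concentration can be illustrated with a small sketch. It again assumes the 1.5 times interquartile range rule for flagging outliers, which is a common convention rather than something this post specifies; the helper name `flagged_points` and both data sets are made up for the example.

```python
import statistics

def flagged_points(values, k=1.5):
    """Points a box plot would draw as outliers under the k * IQR rule (assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Two made-up data sets that both contain the value 13.
spread_out = [1, 3, 5, 7, 9, 11, 13, 15, 20]
concentrated = [9, 9, 10, 10, 10, 10, 11, 11, 13]

print(flagged_points(spread_out))    # nothing is flagged
print(flagged_points(concentrated))  # 13 is flagged
```

In the spread-out set the value 13 sits inside the box, while in the concentrated set the same value is flagged, because the box there spans only 10 to 11 and the cutoffs are correspondingly tight.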

