A box plot, also known as a “box-and-whisker plot,” is a graph that visually conveys the distribution and variation of a single quantitative variable in a set of data, plotted along a numeric axis. A box plot prominently features the quartiles, range and outliers of a set of data and can be drawn vertically or horizontally. In a vertical plot, the bottom and top of the box are the first and third quartiles respectively; in a horizontal plot, they are the left and right edges (expressed as the 25th and 75th percentiles in the image to the right). The line through the box is the second quartile, also known as the “median.” The “whiskers” extend from either end of the box to the minimum and maximum of the set of data or, when outliers are drawn separately, to the most extreme values that are not outliers. Outliers are represented by single points.
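The values a box plot displays can be computed directly from the data. The following is a minimal sketch, not a definitive method: the function name `box_plot_summary` and the sample values are hypothetical, the quartiles use the "inclusive" method of Python's `statistics.quantiles` (other quartile conventions give slightly different results), and outliers are assumed to be points lying more than 1.5 times the interquartile range beyond the box, a common convention that the text above does not specify.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Return the values a box plot draws (hypothetical helper).

    Assumption: a point is an outlier when it lies more than
    k * IQR below the first quartile or above the third quartile.
    """
    values = sorted(data)
    # Three cut points that split the data into four equal parts.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the length of the box
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up illustration data
summary = box_plot_summary(sample)
print(summary)
```

For this sample the box spans 4 to 8 with the median at 6, the whiskers reach 2 and 9, and 30 is drawn as a single outlier point.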
The benefits of box plots are:
- Large data – The clarity of a box plot’s information is not lost with larger sets of data.
- Variation – Gives a simplified visual representation of the variation of a set of data.
- Outliers – Clearly shows the outliers of a set of data.
While a box plot conveniently summarizes a set of data, most exact values are lost, so a box plot is unsuitable for analyzing a set of data in great detail. Also, the more highly concentrated the data is, the more likely points are to be shown as outliers on a box plot. It is important to distinguish a genuine outlier from a point that is flagged only because the data is highly concentrated, as discussed in the article “How Significant Is a Boxplot Outlier?”
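The effect of concentration on outliers can be illustrated with a small sketch. It assumes the common convention that a point is an outlier when it lies more than 1.5 times the interquartile range beyond the box; the function name `flagged_points` and both data sets are made up for illustration. The two sets share the same minimum, median and maximum, but only the concentrated one has its extremes flagged.

```python
import statistics

def flagged_points(data, k=1.5):
    """List the points a box plot would draw as outliers (hypothetical helper).

    Assumption: the k * IQR rule with quartiles from the "inclusive" method.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]

# Same minimum (10), median (18) and maximum (26) in both sets.
spread_out = [10, 12, 14, 16, 18, 20, 22, 24, 26]
concentrated = [10, 16, 17, 18, 18, 18, 19, 20, 26]

print(flagged_points(spread_out))    # [] because the box is wide (14 to 22)
print(flagged_points(concentrated))  # [10, 26] because the box is narrow (17 to 19)
```

Under these assumptions, the values 10 and 26 are ordinary in the first set and outliers in the second, even though the values themselves have not changed.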