A side-by-side box plot is a useful tool for visually comparing two data sets. Box plots work well on large data sets that are too disorderly to be displayed using other plots, but they may also be used on neat data sets. Side-by-side box plots present, for each level of a categorical variable, all of the information that a single box plot does. A box plot summarizes the data in five numbers:
- The Median
- The Lower Quartile
- The Upper Quartile
- The Minimum
- The Maximum
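As a rough illustration, the five numbers listed above can be computed with Python's standard library. This is only a sketch: the function name `five_number_summary` and the sample data are made up for the example, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" is an assumption; other conventions differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # hypothetical sample data
summary = five_number_summary(scores)
print(summary)
```

A box plot draws the box from the lower quartile to the upper quartile, marks the median inside it, and extends whiskers toward the minimum and maximum.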
Side-by-side box plots are useful for comparing fundamental information about two data sets, such as their median values and the range of values the data cover. They provide a targeted summary of the data. One disadvantage, however, is that box plots tend to obscure some details of the data, such as aspects of the shape of the distribution. For example, a box plot cannot show that data are bimodal, and it gives only a rough indication of skew.
On its own, a box plot can deal with only one quantitative variable. Side-by-side box plots, however, can be applied to data sets with one quantitative and one categorical variable, which makes them especially useful for many real-world statistical problems.
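One possible way to build such a plot is to split the quantitative values by category and draw one box per category on a shared axis. The sketch below assumes the data arrive as (category, value) pairs and that matplotlib is available for the drawing step; the names `group_by_category` and `plot_side_by_side` and the sample records are hypothetical.

```python
from collections import defaultdict

def group_by_category(records):
    """Collect the quantitative values for each level of the categorical variable."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return dict(groups)

def plot_side_by_side(groups):
    """Draw one box per category on a shared axis (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so grouping works without it

    names = sorted(groups)
    fig, ax = plt.subplots()
    ax.boxplot([groups[name] for name in names])  # one data set per box
    ax.set_xticklabels(names)
    ax.set_ylabel("value")
    return fig

# Hypothetical records: (category, quantitative value)
records = [("A", 61), ("B", 72), ("A", 58), ("B", 80), ("A", 66), ("B", 75)]
groups = group_by_category(records)
print(groups)

# To draw and save the plot:
# plot_side_by_side(groups).savefig("boxplots.png")
```

Because the boxes share one vertical axis, their medians and spreads can be compared directly.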
A general overview of some of the advantages of side-by-side box plots (called parallel box plots in that source) can be found here under the section “Parallel box-plots”.
Section D on this page gives good information on how to compare side-by-side box plots visually.
In some variations of the box plot, the width of each box indicates the number of instances in its category, as shown here under “Variations on Box Plots”.
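A small sketch of how such widths might be chosen follows. It assumes one common convention, in which box width is proportional to the square root of the group size; the source does not specify a scaling rule, and the function name `box_widths`, the `max_width` parameter, and the sample groups are made up for illustration.

```python
import math

def box_widths(groups, max_width=0.8):
    """Scale each box width by sqrt(group size); the largest group gets max_width."""
    largest = max(len(values) for values in groups.values())
    return {
        name: max_width * math.sqrt(len(values) / largest)
        for name, values in groups.items()
    }

# Hypothetical groups of different sizes
groups = {"small": [3, 5, 8, 9], "large": list(range(16))}
widths = box_widths(groups)
print(widths)
```

With matplotlib, the resulting values could be passed, in the same order as the data sets, to the `widths` argument of `boxplot`.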