Confidence Interval for Two Independent Proportions

A confidence interval for two independent proportions is a tool used in inferential statistics to estimate the difference in proportions* between two independent populations. The two populations must be distinguished by a single binary categorical variable, e.g. sex (male or female), but not blood type (A, B, AB, or O). Because the outcome being measured is also binary, this type of confidence interval involves two binary categorical variables: one that defines the groups and one that defines success or failure.

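The calculation for a confidence interval for two independent proportions is fairly straightforward, with a little hiccup at the standard error (the standard deviation of the sampling distribution). The equation, where \hat{q} = 1 - \hat{p} for each sample and z^* is the standard normal critical value for the chosen confidence level, is: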

CI = \hat{p}_1 - \hat{p}_2 \pm z^*\sqrt{\frac{ \hat{p}_1\hat{q}_1 }{ n_1} +\frac{\hat{p}_2\hat{q}_2}{ n_2}}

To help remember the standard error calculation, recall that the variances of independent random variables add. Each sample proportion has variance \hat{p}\hat{q}/n, so adding the two gives the variance of the difference, and the standard error is just the square root of that combined variance.
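As a rough illustration, the formula can be sketched in Python using only the standard library. The function name `two_proportion_ci`, its parameters, and the sample counts are hypothetical and chosen purely for this example. The sketch assumes samples large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Interval for p1 - p2 from success counts x1, x2 and sample sizes n1, n2.

    Hypothetical helper for illustration; the direction is always p1 - p2.
    """
    p1, p2 = x1 / n1, x2 / n2          # sample proportions
    q1, q2 = 1 - p1, 1 - p2            # complements (q-hat)
    # z* is the two-sided critical value of the standard normal distribution
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # the variances add; the standard error is the square root of their sum
    se = sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Made-up data: 60 successes out of 100 in sample 1, 45 out of 100 in sample 2
lower, upper = two_proportion_ci(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

With these made-up counts the interval lies entirely above zero, which suggests that the first population has the larger proportion.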

Last, but most important, is the proper interpretation of this type of confidence interval. A confidence interval for two independent proportions is interpreted the same way as a single-proportion confidence interval, except that there is an additional factor of direction. Without specifying the direction in which the difference of the proportions was taken (\hat{p}_1-\hat{p}_2 vs. \hat{p}_2-\hat{p}_1), the interpretation is nearly worthless, because reversing the direction flips the sign of both endpoints. Here is a good example of how to interpret this type of confidence interval. The linked example shows a situation where the confidence interval straddles a zero difference, a case that is particularly hard to interpret. Misinterpretation is the most common way confidence intervals are misused. The second half of this article gives a few examples of poor interpretations.
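To see why direction matters, here is a small Python sketch that computes the interval in both directions. The helper `diff_ci`, the sample proportions, and the sample sizes are all hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(p_a, n_a, p_b, n_b, confidence=0.95):
    """Interval for p_a - p_b (hypothetical helper; direction is a minus b)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z_star * se, diff + z_star * se


def straddles_zero(interval):
    """True when a zero difference lies inside the interval."""
    lo, hi = interval
    return lo <= 0 <= hi


# Made-up sample proportions: 0.60 (n = 100) and 0.45 (n = 100)
ci_12 = diff_ci(0.60, 100, 0.45, 100)   # direction: p1 - p2
ci_21 = diff_ci(0.45, 100, 0.60, 100)   # direction: p2 - p1

print("p1 - p2:", ci_12, "straddles zero:", straddles_zero(ci_12))
print("p2 - p1:", ci_21, "straddles zero:", straddles_zero(ci_21))
```

The two intervals are mirror images of each other around zero, so a reported interval only tells the reader which population has the larger proportion once the direction of subtraction is stated. Whether the interval straddles zero does not depend on the direction.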

*Remember that in statistics the term proportion implies a binary categorical variable (success/failure).
