Confidence Interval for Two Independent Proportions

A confidence interval for two independent proportions gives an estimated range of values that is likely to include the difference between the two true population proportions of a categorical variable. For example, medical researchers may want to compare the proportions of men and women who suffered from heart attacks. The procedure for obtaining such an interval is based on the sample proportions, \hat{p}_1 and \hat{p}_2, taken from their respective populations. The width of the confidence interval tells us how uncertain we are about the difference in the proportions. A very wide interval may indicate that more data should be collected before anything definite can be said about the parameter. The intervals are usually calculated for a chosen confidence level (the most common are 95%, 99% and 90%), depending on how confident the researcher wants to be; for the same data, a higher confidence level gives a wider interval. Calculation of the confidence interval consists of three steps: conditions, calculation and conclusion.

1. Conditions

  • Randomization Condition: the data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment
  • 10% Condition: if the data in each group are sampled without replacement, the samples should not exceed 10% of the population of the respective group
  • Independent Groups Assumption: the two groups must be independent of each other
  • Success/Failure Condition: both groups should be large enough that at least 10 successes and 10 failures are observed in each group. Since we don’t know the true population proportions, we use the actual numbers of successes and failures counted in our samples, which is why the results are whole numbers. This condition is checked using the following inequalities:

n_1 \hat{p}_1 \geq 10          n_2 \hat{p}_2 \geq 10

n_1 \hat{q}_1 \geq 10          n_2 \hat{q}_2 \geq 10,

where \hat{q}_1 = 1- \hat{p}_1 and \hat{q}_2 = 1- \hat{p}_2
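The Success/Failure Condition above can be checked with a few lines of Python. This is only a minimal sketch: the function name `success_failure_ok` and the counts in the usage lines are hypothetical, chosen for illustration, and the other three conditions depend on how the data were collected, so they cannot be verified from the counts alone.

```python
def success_failure_ok(successes_1, n_1, successes_2, n_2, minimum=10):
    """Return True if each group has at least `minimum` successes and failures.

    The number of successes in a group equals n * p_hat and the number of
    failures equals n * q_hat, so the observed counts can be compared directly.
    """
    counts = [
        successes_1,        # n_1 * p_hat_1
        n_1 - successes_1,  # n_1 * q_hat_1
        successes_2,        # n_2 * p_hat_2
        n_2 - successes_2,  # n_2 * q_hat_2
    ]
    return all(count >= minimum for count in counts)


# Hypothetical counts, for illustration only.
print(success_failure_ok(45, 120, 30, 110))  # every count is at least 10
print(success_failure_ok(4, 50, 30, 110))    # group 1 has only 4 successes
```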

2. Calculation


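With the conditions met, the interval is the difference between the sample proportions plus or minus a margin of error, which is the critical z-score multiplied by the standard error of the difference:

(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{ \frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2} }

The critical z-score z^* depends on the specified confidence level: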

  • 90%     z = 1.645
  • 95%     z = 1.96
  • 99%     z = 2.576
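As a rough illustration of the calculation step, the sketch below computes the interval in Python. It assumes the standard large-sample interval without continuity correction, that is, the difference in sample proportions plus or minus the critical z-score times the standard error. The function name `two_proportion_interval` and the sample counts are hypothetical; the z-scores are the ones listed above.

```python
import math

# Critical z-scores for the confidence levels listed above.
Z_CRITICAL = {90: 1.645, 95: 1.96, 99: 2.576}


def two_proportion_interval(successes_1, n_1, successes_2, n_2, level=95):
    """Return (lower, upper) bounds for p_1 - p_2, with no continuity correction."""
    p_hat_1 = successes_1 / n_1
    p_hat_2 = successes_2 / n_2
    # Standard error of the difference between two independent sample proportions.
    standard_error = math.sqrt(
        p_hat_1 * (1 - p_hat_1) / n_1 + p_hat_2 * (1 - p_hat_2) / n_2
    )
    margin = Z_CRITICAL[level] * standard_error
    difference = p_hat_1 - p_hat_2  # group 2 is subtracted from group 1
    return difference - margin, difference + margin


# Hypothetical data: 50 successes out of 100 in group 1, 40 out of 100 in group 2.
lower, upper = two_proportion_interval(50, 100, 40, 100, level=95)
print(f"95% interval for p_1 - p_2: ({lower:.4f}, {upper:.4f})")
```

With these made-up counts the interval includes zero, so the data would not rule out equal population proportions at the 95% level.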

3. Conclusion

We interpret an interval calculated at a 95% confidence level as follows: we are 95% confident that the interval contains the true difference between the two population proportions. The conclusion should also state which proportion was subtracted from which, so that the direction of the difference is clear.

For an overview of the different confidence intervals and hypothesis tests, click here for a Statistics Glossary. This PowerPoint presentation gives a detailed description of the steps for this confidence interval, including the derivation of the interval and an example at the end. In the conclusion for this example, we may also add that the result was obtained by subtracting the population proportion of women who exercise from that of men. Finally, click here to find an online calculator for a 95% confidence interval for the difference between two population proportions (use the interval with no continuity correction). If you are interested in the continuity correction, click here to access the paper by R. Newcombe.

