# Confidence Interval for Two Independent Means

A confidence interval for two independent means is a range of values, calculated from two samples, that is likely to contain the difference between two true population means. It is used when there is one binary categorical variable (which splits the data into two groups) and one quantitative variable. For example, a company considering replacing some of its packaging workers with machines might compare the mean time it takes a human worker to package an item with the mean time it takes a machine to package an item. After performing a two-sample $t$-test, a confidence interval can be calculated to estimate the true difference between the two population means (in the example, the difference between the true mean packaging time for human workers and the true mean packaging time for machines).

Typically, confidence intervals are calculated at confidence levels of 90%, 95%, or 99%. If we calculate at a 95% confidence level, for example, then we are 95% confident that the difference between the true population means lies within our interval. There are three steps to finding a confidence interval for two independent means:

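1. Check conditions.
    - Random/10%: Check that both samples were collected randomly and that the observations are independent, both within each sample and between the two samples. Also check that neither sample makes up more than 10% of its population.
    - Check one of the following:
        - Nearly normal: Check that both sets of data appear roughly normal (unimodal and symmetric, without strong skew or outliers) using histograms or QQ plots.
        - Large samples: Check that both sample sizes are greater than 40.
2. Calculate the interval:
    - $\left((\bar{y}_1-\bar{y}_2)-t_{df}^*\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},\ (\bar{y}_1-\bar{y}_2)+t_{df}^*\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right)$
    - Here $\bar{y}_1$ and $\bar{y}_2$ are the sample means, $s_1$ and $s_2$ the sample standard deviations, $n_1$ and $n_2$ the sample sizes, and $t_{df}^*$ the critical value for the chosen confidence level. In this situation the degrees of freedom ($df$) calculation (the Welch–Satterthwaite approximation) is complicated, so we usually just use a computer.
3. Conclusion: State the interval in context, in a complete sentence, including the direction of the difference (which mean was subtracted from which).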
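As a rough illustration of step 2, here is a minimal Python sketch that computes the interval from two lists of data. It assumes the unpooled standard error from the formula above and the Welch–Satterthwaite approximation for $df$ that statistical software commonly uses. The function names (`t_cdf`, `t_critical`, `welch_df`, `two_mean_ci`) are hypothetical, and because only the standard library is used, $t_{df}^*$ is found by numerically integrating the $t$ density and bisecting, rather than by calling a statistics package. It does not check the conditions in step 1.

```python
import math
import statistics


def t_cdf(x, df):
    """CDF of Student's t with (possibly non-integer) df, via Simpson's rule."""
    if x < 0:
        return 1.0 - t_cdf(-x, df)
    # Normalizing constant of the t density.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    power = -(df + 1) / 2
    steps = max(2, 2 * math.ceil(50 * x))  # even count, step width <= 0.01
    h = x / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (1 + (i * h) ** 2 / df) ** power
    # The density is symmetric, so P(T <= x) = 0.5 + integral from 0 to x.
    return 0.5 + c * total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t* such that P(-t* < T < t*) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains t*
        lo, hi = hi, 2 * hi
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def two_mean_ci(sample1, sample2, confidence=0.95):
    """Interval for mu1 - mu2 (sample1 minus sample2); returns (lower, upper, df)."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
    df = welch_df(s1, n1, s2, n2)
    margin = t_critical(confidence, df) * se
    return diff - margin, diff + margin, df
```

For example, `two_mean_ci([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], 0.95)` has a difference of sample means of $-1$, a standard error of about 1, and a $df$ of about 8, so it returns roughly $(-3.306, 1.306)$ along with that $df$. The order of the arguments sets the direction of the difference, which is what the conclusion in step 3 should state.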

It is important not to misinterpret the purpose or definition of a confidence interval. Another way to interpret a 90% confidence interval is: if we repeated the procedure many times with new samples, about 9 out of 10 of the resulting confidence intervals would contain the true difference between the population means. A confidence interval does not predict where future confidence intervals will fall. It is also not an estimate of where the difference between the means of other samples most often lies, and it says nothing about individual cases within the samples or the population.
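The repeated-sampling interpretation can be checked with a short simulation. The Python sketch below is self-contained (standard library only). The population means, standard deviations, sample sizes, number of repetitions, seed, and helper names (`t_cdf`, `t_critical`, `welch_interval`, `coverage`) are all illustrative assumptions, and $t_{df}^*$ is computed by numerical integration rather than from a statistics library. It draws many pairs of samples from two known normal populations, builds a 90% interval from each pair using the unpooled formula, and reports the fraction of intervals that contain the true difference.

```python
import math
import random
import statistics


def t_cdf(x, df):
    """CDF of Student's t via Simpson's rule (x >= 0 only in this sketch)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    steps = max(2, 2 * math.ceil(50 * x))  # even count, step width <= 0.01
    h = x / steps
    total = sum(
        (1 if i in (0, steps) else 4 if i % 2 else 2) * (1 + (i * h) ** 2 / df) ** (-(df + 1) / 2)
        for i in range(steps + 1)
    )
    return 0.5 + c * total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        lo, hi = hi, 2 * hi
    for _ in range(40):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < target else (lo, mid)
    return (lo + hi) / 2


def welch_interval(a, b, confidence):
    """Unpooled two-sample t interval for mean(a) - mean(b)."""
    v1, v2 = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    margin = t_critical(confidence, df) * math.sqrt(v1 + v2)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff - margin, diff + margin


def coverage(reps=300, confidence=0.90, seed=1):
    """Fraction of simulated intervals that contain the true difference."""
    rng = random.Random(seed)
    mu1, sd1, n1 = 10.0, 2.0, 15  # hypothetical population 1
    mu2, sd2, n2 = 8.0, 3.0, 20   # hypothetical population 2
    true_diff = mu1 - mu2
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(mu1, sd1) for _ in range(n1)]
        b = [rng.gauss(mu2, sd2) for _ in range(n2)]
        lo, hi = welch_interval(a, b, confidence)
        hits += lo <= true_diff <= hi  # True counts as 1
    return hits / reps
```

Calling `coverage()` should typically give a proportion close to 0.90, that is, about 9 out of 10 intervals capturing the true difference. Any single interval either contains the true difference or it does not; the 90% describes the long-run behavior of the method. Increasing `reps` makes the estimate more stable at the cost of run time.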

The informative leaflet “Common errors in the interpretation of survey data,” distributed by the State of Queensland’s Office of Economic and Statistical Research, covers some other ways that confidence intervals are misused. Wikipedia discusses alternatives to confidence intervals, which may be what is actually wanted when a confidence interval is being misused. For more tips on using confidence intervals, see the University of Minnesota’s page, “List of Tools for Confidence Intervals.”
