# Confidence interval for two independent means

A confidence interval for two independent means is a method of statistical inference for estimating the difference between the mean values of two populations, and hence for judging whether such a difference exists at all. Because it compares two populations, a (binary) categorical variable must divide the global population, assigning each sample case (and, indeed, each member of the global population) to one of the two groups. A quantitative variable is then measured in each of the two populations, and the confidence interval captures, at a stated level of confidence, the difference between the two population means.
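To make the setup concrete, the following minimal Python sketch shows how a binary categorical variable splits the cases into two samples of a quantitative variable. It assumes the data arrive as (category, value) pairs; the records, the category labels `"A"` and `"B"`, and the function name `split_by_group` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each case has a binary categorical variable (the group
# label) and a quantitative variable (the measured value).
records = [
    ("A", 12.1), ("B", 10.4), ("A", 13.0), ("B", 9.8),
    ("A", 11.7), ("B", 10.9), ("A", 12.5), ("B", 11.2),
]

def split_by_group(records):
    """Separate cases into one list of quantitative values per category."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two categories")
    return dict(groups)

samples = split_by_group(records)
y1, y2 = samples["A"], samples["B"]  # "first" and "second" samples
print(y1)  # [12.1, 13.0, 11.7, 12.5]
print(y2)  # [10.4, 9.8, 10.9, 11.2]
```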

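As always, certain conditions must be met for the inference to be valid. In this case they are the random condition (each sample is randomly drawn from its population), the 10% condition (each sample is less than 10% of its population), and the “nearly normal” condition (each sample's distribution is roughly unimodal and symmetric, a requirement that relaxes as the sample size grows). All of these conditions must hold for both samples. In addition, the two samples must be independent of each other, which is what “independent means” refers to.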

The confidence interval itself is defined as:

$\bigg((\bar{y_1}-\bar{y_2}) - t_{df}^{*}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},(\bar{y_1}-\bar{y_2}) + t_{df}^{*}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\bigg)$,
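The formula translates directly into code. Below is a minimal Python sketch using only the standard library; the function name `two_mean_interval` and the sample values are hypothetical, and `t_star` is passed in as a placeholder rather than computed, since finding the critical value is a separate step.

```python
import math
import statistics

def two_mean_interval(y1, y2, t_star):
    """Return (low, high) for mean(population 1) - mean(population 2).

    t_star is the critical value t*_df, supplied by the caller
    (e.g. from statistical software); it is not computed here.
    """
    diff = statistics.mean(y1) - statistics.mean(y2)
    # statistics.variance uses the n - 1 denominator, matching s_1^2 and s_2^2.
    se = math.sqrt(statistics.variance(y1) / len(y1)
                   + statistics.variance(y2) / len(y2))
    margin = t_star * se
    return diff - margin, diff + margin

# Hypothetical samples; t_star=2.45 is only a placeholder value.
y1 = [12.1, 13.0, 11.7, 12.5]
y2 = [10.4, 9.8, 10.9, 11.2]
low, high = two_mean_interval(y1, y2, t_star=2.45)
print(f"({low:.3f}, {high:.3f})")
```

The interval is symmetric about $\bar{y_1}-\bar{y_2}$; only its width depends on $t_{df}^{*}$.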

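where $\bar{y_1}$ and $\bar{y_2}$ are the sample means, $s_1$ and $s_2$ the sample standard deviations, and $n_1$ and $n_2$ the sample sizes, all computed directly from the two samples. The critical value $t_{df}^{*}$ is best obtained from software such as SPSS, because even its number of degrees of freedom is given by a complicated formula (and is usually not a whole number).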
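For reference, the usual choice for this “complicated formula” is the Welch–Satterthwaite approximation, which, as far as we know, is also what underlies the “equal variances not assumed” row of SPSS's independent-samples t test output:

$$df = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2+\frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}$$

The Python sketch below computes this $df$ and then finds $t_{df}^{*}$. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would do the second step; to stay dependency-free, this sketch instead integrates the t density numerically (Simpson's rule) and bisects for the quantile. That is an illustrative substitute, not how SPSS computes it. The function names and summary statistics are hypothetical.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (usually not an integer)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (df may be fractional).

    Integrates the density over [0, |x|] with Simpson's rule (steps must be even)
    and uses the distribution's symmetry about 0.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """t* such that the central `confidence` area lies between -t* and t*."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical summary statistics for the two samples.
df = welch_df(s1=2.0, n1=10, s2=3.0, n2=12)
t_star = t_critical(0.95, df)
print(f"df = {df:.2f}, t* = {t_star:.3f}")
```

When the two samples have equal standard deviations and equal sizes, the formula reduces to $n_1 + n_2 - 2$; in general it falls between the smaller of $n_1 - 1$ and $n_2 - 1$ and $n_1 + n_2 - 2$.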

The conclusion should be worded carefully to avoid confusion. One template is: “We are ___% confident that the mean of [the first population] minus the mean of [the second population] is captured in this interval.” Pay particular attention to which population is the “first” and which is the “second”, so that it is obvious which mean is greater; swapping them negates both endpoints and reverses their order. (In a given data set, which population has the greater mean is usually an important research question, or at least a key observation.) If the interval contains 0, the data do not provide evidence of a difference between the two means at that confidence level.
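As an illustration of the template, here is a minimal Python sketch that turns an interval into a conclusion sentence and states which mean appears greater. The function name `conclusion`, the group labels, and the interval endpoints are hypothetical; the endpoints are assumed to describe “first minus second”.

```python
def conclusion(low, high, confidence, first, second):
    """Phrase a CI for mean(first) - mean(second); labels are supplied by the caller."""
    pct = round(confidence * 100)
    sentence = (f"We are {pct}% confident that the mean of {first} minus the mean of "
                f"{second} is between {low:.2f} and {high:.2f}.")
    if low > 0:
        sentence += f" The interval lies above 0, so {first} appears to have the greater mean."
    elif high < 0:
        sentence += f" The interval lies below 0, so {second} appears to have the greater mean."
    else:
        sentence += " The interval contains 0, so these data do not show a difference at this level."
    return sentence

print(conclusion(0.74, 2.76, 0.95, "group A", "group B"))
# Swapping which population is "first" negates and swaps the endpoints,
# but the substantive conclusion is unchanged.
print(conclusion(-2.76, -0.74, 0.95, "group B", "group A"))
```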