Are there two formulas under the null hypothesis for the test statistic when comparing proportions of two samples?
When comparing the proportions of two independent samples of sizes $n_A$ and $n_B$ (with random variables $X_A^i$ and $X_B^i$ following Bernoulli laws $B(p_A)$ and $B(p_B)$) using a test statistic, under the null hypothesis $H_0$ that $p_A=p_B$,
I have seen two distinct formulas in the literature for the behaviour of the test statistic under $H_0$.
Some use:

- $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p_A(1-p_A)/n_A + p_B(1-p_B)/n_B}}\rightarrow N(0,1)$

- $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$, with $p=\frac{n_A\bar{X}_A+n_B\bar{X}_B}{n_A+n_B}$

Is one of them correct? Is one of them wrong?
Now consider the second formula, $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$.

In this context, is it true to state that, in the general case, $\bar{X}_A-\bar{X}_B\rightarrow N\left(p_A-p_B,\; p_A(1-p_A)/n_A + p_B(1-p_B)/n_B\right)$?
(The variance part looks incompatible to me with the formula for $T$, but maybe there is a trick?)

Any comment?
Yes, at least two. As you indicate, the two main versions involve:
(a) Using the null hypothesis, thus assuming proportions are equal, and pooling all of the data in both groups to get a single standard error for the test statistic.
(b) Using two separate estimates of variance, one from each sample, to get the standard error.
One advantage of method (b) is that test results and confidence intervals agree. (That is, the CI for the difference in proportions includes $0$ exactly when $H_0$ is not rejected.)
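To make the two versions concrete, here is a minimal sketch (in Python rather than R, purely for illustration) computing both test statistics for the 55-of-150 vs. 71-of-200 counts used in the worked example below. With samples this large, the pooled and unpooled z values barely differ:

```python
import math

# Counts from the worked example: 55/150 successes vs 71/200 successes
x_a, n_a = 55, 150
x_b, n_b = 71, 200
p_a, p_b = x_a / n_a, x_b / n_b

# (a) pooled: under H0, estimate a single common proportion and use it
#     for both groups in the standard error
p_pool = (x_a + x_b) / (n_a + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z_pooled = (p_a - p_b) / se_pooled

# (b) unpooled: each sample contributes its own variance estimate
se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_unpooled = (p_a - p_b) / se_unpooled

print(round(z_pooled, 5), round(z_unpooled, 5))  # 0.22503 0.22482
```

The two denominators estimate the same quantity when $p_A=p_B$, which is why both statistics are approximately standard normal under $H_0$.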
Other differences center on various methods of continuity correction. IMHO, continuity correction almost always makes the test too 'conservative' (too reluctant to reject $H_0,$ hence reduced power) unless sample sizes are very small (say, below 100). Also, various software implementations use slightly different rounding conventions. (In computing by hand, premature rounding of intermediate results can cause surprisingly large errors.)
Another similar method is a chi-squared test on a $2\times 2$ contingency table with rows Yes and No and columns Gp1 and Gp2.
I have found the procedure `prop.test` in R (which is roughly equivalent to a chi-squared test) to be a convenient and reliable method (using the parameter `cor=F` to suppress continuity correction). If you are given the option to estimate variances separately (not to pool), do so. Also, if you are given the option to omit the Yates correction, do so for samples of more than 100.
As an example, suppose Gp 1 has 55 successes in 150 and Gp 2 has 71 successes in 200. Then `prop.test` in R gives the following output.
    prop.test(c(55,71), c(150,200), cor=F)

            2-sample test for equality of proportions
            without continuity correction

    data:  c(55, 71) out of c(150, 200)
    X-squared = 0.050637, df = 1, p-value = 0.822
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.09004438  0.11337772
    sample estimates:
       prop 1    prop 2
    0.3666667 0.3550000
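For readers working outside R, the same numbers can be reproduced in Python (assuming `scipy` is installed); `chi2_contingency` with `correction=False` plays the role of `prop.test(..., cor=F)`:

```python
from scipy.stats import chi2_contingency

# Same 2x2 table: rows are groups, columns are successes/failures
table = [[55, 95],
         [71, 129]]
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 6), round(p, 3))  # 0.050637 0.822
```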
The $2\times 2$ table for `chisq.test` in R gives the following output.
    succ = c(55, 71);  tot = c(150, 200)
    fail = tot - succ
    TBL = rbind(succ, fail);  TBL
         [,1] [,2]
    succ   55   71
    fail   95  129

    chisq.test(TBL, cor=F)

            Pearson's Chi-squared test

    data:  TBL
    X-squared = 0.050637, df = 1, p-value = 0.822
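The identical X-squared values are no accident: for a $2\times 2$ table, the Pearson chi-squared statistic is exactly the square of the pooled z statistic, and its p-value matches the two-sided normal p-value. A quick check in plain Python (nothing beyond the standard `math` module):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

x_a, n_a, x_b, n_b = 55, 150, 71, 200
p_pool = (x_a + x_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (x_a / n_a - x_b / n_b) / se

print(round(z ** 2, 6))                      # 0.050637, the X-squared above
print(round(2 * (1 - norm_cdf(abs(z))), 3))  # 0.822, the same p-value
```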
Notes:

- The P-value is the same for both procedures. The null hypothesis that the proportions are equal is not rejected at the 5% level.

- The CI in `prop.test` includes $0.$
- `prop.test` permits one-sided tests (by use of the parameter `alt="gr"` or `alt="less"`), while the chi-squared test is inherently two-sided.
- If counts are too small, computer software may warn that the test statistic does not have the expected distribution, so that the displayed P-value may not be trustworthy. In that case, one can use Fisher's Exact Test (`fisher.test` in R) instead of `chisq.test`. Another alternative is to use simulation (with the parameter `sim=T`) to simulate a more accurate P-value.
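For completeness, here is what Fisher's Exact Test looks like on the same table in Python's `scipy` (assumed available; in R the equivalent call would be `fisher.test(TBL)`):

```python
from scipy.stats import fisher_exact

table = [[55, 95],
         [71, 129]]
odds_ratio, p = fisher_exact(table, alternative='two-sided')
# The sample odds ratio is (55*129)/(95*71); with counts this large the
# exact p-value is close to the chi-squared p-value, and H0 is not rejected
print(round(odds_ratio, 4), round(p, 3))
```

With small counts, the exact test is the safer default; here it simply confirms the chi-squared result.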