Are there two formulas under the null hypothesis for the test statistic when comparing the proportions of two samples?

When comparing the proportions of two independent samples of sizes $n_A$ and $n_B$ (with random variables $X_A^i$ and $X_B^i$ following Bernoulli distributions $B(p_A)$ and $B(p_B)$) using a test statistic, and dealing with the null hypothesis $H_0$ that $p_A=p_B$,

I have seen the following two distinct formulas in the literature for the behaviour of the test statistic under $H_0$.

Some use:

  • $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p_A(1-p_A)/n_A + p_B(1-p_B)/n_B}}\rightarrow N(0,1)$

  • $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$, with $p=\frac{n_A\bar{X}_A+n_B\bar{X}_B}{n_A+n_B}$

  1. Is one of them correct? Is one of them wrong?

  2. Now, consider the second formula, $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$:

In this context, is it true to state that, in the general case, $\bar{X}_A-\bar{X}_B\rightarrow N\left(p_A-p_B,\; p_A(1-p_A)/n_A + p_B(1-p_B)/n_B\right)$?

(The variance part looks incompatible to me with the formula for $T$, but maybe there is a trick?)
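As a sanity check on that "general case" statement, here is a small Python simulation (the values of $p_A$, $p_B$, $n_A$, $n_B$ are hypothetical, chosen only for illustration) comparing the empirical mean and variance of $\bar{X}_A-\bar{X}_B$ with $p_A-p_B$ and $p_A(1-p_A)/n_A + p_B(1-p_B)/n_B$:

```python
import random
import statistics

# Hypothetical parameters, not from the post -- just to illustrate.
pA, pB = 0.3, 0.5
nA, nB = 200, 250
reps = 10000

random.seed(1)
diffs = []
for _ in range(reps):
    # One Bernoulli sample per group, then the difference of sample means.
    xbarA = sum(random.random() < pA for _ in range(nA)) / nA
    xbarB = sum(random.random() < pB for _ in range(nB)) / nB
    diffs.append(xbarA - xbarB)

# Theoretical mean and variance from the "general case" statement.
mean_theory = pA - pB                     # -0.2
var_theory = pA*(1-pA)/nA + pB*(1-pB)/nB  # 0.00205

print(statistics.mean(diffs))      # close to -0.2
print(statistics.variance(diffs))  # close to 0.00205
```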

Any comments?


Yes, at least two. As you indicate, the two main versions involve:

(a) Using the null hypothesis, thus assuming proportions are equal, and pooling all of the data in both groups to get a single standard error for the test statistic.

(b) Using two separate estimates of variance, one from each sample, to get the standard error.

One advantage of method (b) is that test results and confidence intervals agree: the CI for the difference in proportions includes $0$ exactly when $H_0$ is not rejected.
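To make (a) and (b) concrete, here is a minimal Python sketch computing both versions of the statistic (the counts are the same hypothetical ones used in the worked example later in this answer); when the two sample proportions are close, the pooled and unpooled standard errors nearly coincide:

```python
import math

# Hypothetical counts: successes and sample sizes for groups A and B.
xA, nA = 55, 150
xB, nB = 71, 200
pA_hat, pB_hat = xA / nA, xB / nB

# (a) Pooled: estimate a single p under H0 and pool the variance.
p_pool = (xA + xB) / (nA + nB)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1/nA + 1/nB))

# (b) Unpooled: a separate variance estimate from each sample.
se_unpooled = math.sqrt(pA_hat*(1-pA_hat)/nA + pB_hat*(1-pB_hat)/nB)

z_pooled = (pA_hat - pB_hat) / se_pooled
z_unpooled = (pA_hat - pB_hat) / se_unpooled
print(z_pooled, z_unpooled)  # both about 0.225 for these counts
```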

Other differences center on various methods of continuity correction. IMHO continuity correction almost always leads to too 'conservative' a test (too reluctant to reject $H_0,$ and thus reduced power), unless sample sizes are very small (say, below 100). Also, various software implementations have slightly different rounding conventions. (In computing by hand, premature intermediate rounding can result in surprisingly large errors.)
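To see why the correction is conservative, here is a rough Python illustration on a $2\times 2$ table (the counts are the hypothetical ones from the example later in this answer): the Yates correction, as implemented in R's chisq.test, shrinks each $|O-E|$ by up to $0.5$ before squaring, which reduces the statistic sharply when the cell deviations are small:

```python
# Example 2x2 table of counts (rows: success/failure, columns: groups).
obs = [[55, 71],
       [95, 129]]

row = [sum(r) for r in obs]        # row totals: [126, 224]
col = [sum(c) for c in zip(*obs)]  # column totals: [150, 200]
n = sum(row)

# Expected counts under independence: E_ij = row_i * col_j / n.
exp = [[ri * cj / n for cj in col] for ri in row]

chi2 = sum((o - e)**2 / e
           for ro, re in zip(obs, exp) for o, e in zip(ro, re))
chi2_yates = sum(max(abs(o - e) - 0.5, 0)**2 / e
                 for ro, re in zip(obs, exp) for o, e in zip(ro, re))

print(chi2)        # about 0.0506 (each |O-E| is exactly 1 here)
print(chi2_yates)  # about 0.0127 -- smaller statistic, larger p-value
```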

Another similar method is a chi-squared test on a $2\times 2$ contingency table with rows Yes and No and columns Gp1 and Gp2.

I have found the procedure prop.test in R (which is roughly equivalent to a chi-squared test) to be a convenient and reliable method (using the parameter correct=FALSE to suppress continuity correction). If you are given the option to estimate variances separately (not to pool), do so. Also, if you are given the option to omit Yates' correction, do so for samples of more than 100.

As an example, suppose Gp 1 has 55 Successes in 150, and Gp 2 has 71 Successes in 200. Then 'prop.test' in R gives the following output.

prop.test(c(55,71), c(150,200), correct=FALSE)

        2-sample test for equality of proportions 
        without continuity correction

data:  c(55, 71) out of c(150, 200)
X-squared = 0.050637, df = 1, p-value = 0.822
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.09004438  0.11337772
sample estimates:
   prop 1    prop 2 
0.3666667 0.3550000 
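As a cross-check (a re-derivation by hand, not part of prop.test), the pooled $z$ statistic from the second formula in the question, squared, reproduces the X-squared value above: for a $2\times 2$ table the two tests are equivalent. A quick Python version, with the two-sided normal p-value via math.erfc:

```python
import math

# Same counts as the prop.test example above.
xA, nA, xB, nB = 55, 150, 71, 200

# Pooled z statistic under H0: p_A = p_B.
p_pool = (xA + xB) / (nA + nB)
se = math.sqrt(p_pool * (1 - p_pool) * (1/nA + 1/nB))
z = (xA/nA - xB/nB) / se

# Two-sided tail of the standard normal: 2*(1 - Phi(|z|)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(z**2)     # about 0.050637, matching X-squared
print(p_value)  # about 0.822, matching the reported p-value
```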

Building the $2\times 2$ table and running chisq.test in R gives the following output.

succ = c(55, 71);  tot = c(150,200)
fail = tot - succ
TBL = rbind(succ, fail);  TBL
     [,1] [,2]
succ   55   71
fail   95  129

chisq.test(TBL, correct=FALSE)

        Pearson's Chi-squared test

data:  TBL
X-squared = 0.050637, df = 1, p-value = 0.822

Notes:

  • The P-value is the same for both procedures. The null hypothesis that the proportions are equal is not rejected at the 5% level.

  • The CI in 'prop.test' includes $0.$

  • The prop.test permits one-sided tests (via the parameter alternative="greater" or alternative="less"), while the chi-squared test is inherently two-sided.

  • If counts are too small, computer software may warn that the test statistic does not have the expected distribution, so that the displayed P-value may not be trustworthy. In that case, one can use Fisher's Exact Test (fisher.test in R) instead of chisq.test. Another alternative is simulation (with the parameter simulate.p.value=TRUE) to obtain a more accurate P-value.
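The simulation idea can be sketched outside R as well. Here is a hypothetical pure-Python version (not prop.test's or chisq.test's exact algorithm) that estimates the two-sided p-value by shuffling group labels, conditioning on the margins as R's simulation does. Note that for the large counts of this example the simulated p-value comes out noticeably larger than the asymptotic 0.822, because the permutation distribution is discrete and the observed statistic sits very near its minimum:

```python
import random

# Pooled data for the worked example: 126 successes among 350 subjects,
# split into groups of sizes 150 and 200.
xA, nA, xB, nB = 55, 150, 71, 200
labels = [1] * (xA + xB) + [0] * (nA + nB - xA - xB)

# Integer-valued statistic |sA*nB - sB*nA|, monotone in |pA_hat - pB_hat|,
# so ties are compared exactly (no floating-point trouble).
def stat(sA, sB):
    return abs(sA * nB - sB * nA)

observed = stat(xA, xB)

random.seed(1)
reps = 4000
extreme = 0
for _ in range(reps):
    random.shuffle(labels)
    sA = sum(labels[:nA])     # successes falling in group A
    sB = (xA + xB) - sA       # remaining successes are in group B
    if stat(sA, sB) >= observed:
        extreme += 1

p_sim = extreme / reps
print(p_sim)  # around 0.91 here: larger than the asymptotic 0.822
              # because of the discreteness of the permutation distribution
```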