How to perform two-sample one-tailed t-test with numpy/scipy
In R, it is possible to perform a two-sample one-tailed t-test simply by using
> A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846)
> B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880)
> t.test(A, B, alternative="greater")
Welch Two Sample t-test
data: A and B
t = -0.4189, df = 6.409, p-value = 0.6555
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-1.029916 Inf
sample estimates:
mean of x mean of y
0.9954942 1.1798523
In the Python world, scipy provides a similar function, ttest_ind, but it can only do two-tailed t-tests. The closest information on the topic I found is this link, but it seems to be more a discussion of the policy of implementing one-tailed vs. two-tailed tests in scipy.
Therefore, my question is: does anyone know of any examples or instructions on how to perform the one-tailed version of the test using numpy/scipy?
Solution 1:
From your mailing list link:
because the one-sided tests can be backed out from the two-sided tests. (With symmetric distributions one-sided p-value is just half of the two-sided pvalue)
It goes on to say that scipy always gives the test statistic as signed. This means that given p and t values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and t > 0, and of a less-than test when p/2 < alpha and t < 0.
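The decision rule above can be sketched in a few lines. This is only an illustration: the helper name `reject_one_sided` is made up here (it is not part of scipy), and it assumes the default pooled-variance `ttest_ind` call on the data from the question.

```python
import numpy as np
from scipy import stats


def reject_one_sided(x, y, alternative, alpha=0.05):
    """Hypothetical helper: apply the 'p/2 and sign of t' rule to a two-tailed test."""
    t_stat, p_two_sided = stats.ttest_ind(x, y)  # two-tailed result, signed t
    if alternative == "greater":
        # Reject H0 in favour of mean(x) > mean(y)
        return bool(p_two_sided / 2 < alpha and t_stat > 0)
    if alternative == "less":
        # Reject H0 in favour of mean(x) < mean(y)
        return bool(p_two_sided / 2 < alpha and t_stat < 0)
    raise ValueError("alternative must be 'greater' or 'less'")


A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

print(reject_one_sided(A, B, "greater"))
print(reject_one_sided(A, B, "less"))
```

For this data neither one-sided null hypothesis is rejected at alpha = 0.05, since p/2 is well above the threshold.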
Solution 2:
After trying to add some insights as comments to the accepted answer but not being able to properly write them down due to general restrictions upon comments, I decided to put my two cents in as a full answer.
First let's formulate our investigative question properly. The data we are investigating is
A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
with the sample means
A.mean() = 0.99549419
B.mean() = 1.1798523
I assume that since the mean of B is obviously greater than the mean of A, you would like to check if this result is statistically significant.
So we have the Null Hypothesis
H0: A >= B
that we would like to reject in favor of the Alternative Hypothesis
H1: B > A
Now when you call scipy.stats.ttest_ind(x, y), this performs a hypothesis test on the value of x.mean()-y.mean(), which means that in order to get positive values throughout the calculation (which simplifies all considerations) we have to call
stats.ttest_ind(B,A)
instead of stats.ttest_ind(A,B). We get as an answer
t-value = 0.42210654140239207
p-value = 0.68406235191764142
and since according to the documentation this is the output for a two-tailed t-test, we must divide p by 2 for our one-tailed test. So depending on the significance level alpha you have chosen, you need
p/2 < alpha
in order to reject the null hypothesis H0. For alpha=0.05 this is clearly not the case, so you cannot reject H0.
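As a rough check, the numbers in this answer can be reproduced with the sketch below. It assumes the default pooled-variance test (`equal_var=True`), which is what yields the t-value quoted here; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

# Two-tailed test on B.mean() - A.mean(); B comes first so that t is positive.
t_value, p_two_sided = stats.ttest_ind(B, A)

# Halving is appropriate here because the sign of t agrees with H1: B > A.
p_one_sided = p_two_sided / 2

alpha = 0.05
print("t-value:", t_value)
print("two-tailed p:", p_two_sided)
print("one-tailed p:", p_one_sided)
print("reject H0:", p_one_sided < alpha)
```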
An alternative way to decide if you reject H0 without having to do any algebra on t or p is by looking at the t-value and comparing it with the critical t-value t_crit at the desired level of confidence (e.g. 95%) for the number of degrees of freedom df that applies to your problem. Since we have
df = sample_size_1 + sample_size_2 - 2 = 8
we get from a statistical table like this one that
t_crit(df=8, confidence_level=95%) = 1.860
We clearly have
t < t_crit
so we obtain again the same result, namely that we cannot reject H0.
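Instead of a printed table, the critical value can also be looked up with `scipy.stats.t.ppf`. The sketch below assumes a one-tailed test at the 95% level with df = 8, and reuses the t-value reported earlier in this answer.

```python
from scipy import stats

df = 6 + 4 - 2          # sample_size_1 + sample_size_2 - 2
confidence_level = 0.95

# One-tailed critical value: the 95th percentile of Student's t with df degrees of freedom.
t_crit = stats.t.ppf(confidence_level, df)

t_value = 0.42210654140239207  # t-value from stats.ttest_ind(B, A) above

print("t_crit:", round(t_crit, 3))   # about 1.860, in line with the table value
print("reject H0:", t_value > t_crit)
```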
Solution 3:
When the null hypothesis is H0: P1>=P2 and the alternative hypothesis is Ha: P1<P2, you test it in Python by writing ttest_ind(P2,P1). (Notice that P2 comes first.)
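import numpy as np
from scipy import stats
# Two samples of 400 draws each: means 3 and 6, standard deviation 2
first = np.random.normal(3,2,400)
second = np.random.normal(6,2,400)
stats.ttest_ind(first, second, axis=0, equal_var=True)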
You will get the result like below
Ttest_indResult(statistic=-20.442436213923845,pvalue=5.0999336686332285e-75)
In Python, when statistic < 0, your real p-value is actually real_pvalue = 1-output_pvalue/2 = 1-5.0999336686332285e-75/2, which is essentially 1. As your p-value is larger than 0.05, you cannot reject the null hypothesis that 6>=3. When statistic > 0, the real z score is actually equal to -statistic, and the real p-value is equal to pvalue/2.
Ivc's answer should say that when (1-p/2) < alpha and t < 0, you can reject the less-than hypothesis.
Solution 4:
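import numpy as np
from scipy.stats import ttest_ind

def t_test(x, y, alternative='both-sided'):
    # Welch's two-sided test; the one-sided p-value is derived from it below
    _, double_p = ttest_ind(x, y, equal_var=False)
    if alternative == 'both-sided':
        pval = double_p
    elif alternative == 'greater':
        # Halve p when the observed difference points the hypothesized way
        if np.mean(x) > np.mean(y):
            pval = double_p/2.
        else:
            pval = 1.0 - double_p/2.
    elif alternative == 'less':
        if np.mean(x) < np.mean(y):
            pval = double_p/2.
        else:
            pval = 1.0 - double_p/2.
    else:
        raise ValueError("alternative must be 'both-sided', 'greater' or 'less'")
    return pval
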
A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]
print(t_test(A,B,alternative='greater'))
0.6555098817758839
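For comparison, and assuming SciPy 1.6.0 or newer is installed, `ttest_ind` itself accepts an `alternative` argument, so a wrapper like the one above may not be needed. A minimal sketch with Welch's test, which should agree with the R output in the question:

```python
from scipy import stats

A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]

# alternative="greater" tests H1: mean(A) > mean(B); requires SciPy >= 1.6.0 (assumed here).
result = stats.ttest_ind(A, B, equal_var=False, alternative="greater")

print("t:", result.statistic)
print("one-tailed p:", result.pvalue)
```

On older SciPy versions the `alternative` keyword is not available, and the two-sided result has to be converted by hand as shown in the answers above.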