Solution 1:

Several important things are not clear to me from your statement of the Question. So I will make some assumptions that I hope will be helpful toward a solution. Even if not exactly correct, my assumptions may point you in the right direction. At the very least, I hope my Answer gives you the opportunity to explain your situation more fully.

  • In what follows I am relying heavily on the following sentence in your Question: "I wanted to estimate the study size required to have an 75 % probability of detecting such a difference between, at a 1 % significance level."

  • Not so much on your attempted answer.

It seems you want a 'power and sample size' computation for a paired t test (one-sample t test on differences) and that you are testing $H_0: \delta = 0$ against $H_a: \delta > 0,$ where $\delta$ is the average decrease in hours of sleep in the population from drinking coffee.

You assume the population SD is $\sigma = 3.5,$ and you want a test at the 1% level that rejects $H_0$ with probability 75% when the specific alternative $H_a: \delta = 1.5$ is true. The question is how many subjects $n$ you need in order to detect this loss of sleep $d$ from drinking coffee.
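As a rough benchmark (my addition, pretending $\sigma$ is known so that a normal-theory z test applies instead of the t test), the required sample size is approximately

$$n \approx \left(\frac{(z_{0.99} + z_{0.75})\,\sigma}{\delta}\right)^2 = \left(\frac{(2.326 + 0.674)(3.5)}{1.5}\right)^2 \approx 49.$$

Because $\sigma$ must be estimated from the data, the t test needs a few more subjects than this approximation suggests.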

For clarity, let's look at one fictitious sample of $n = 40$ differences $d_i,$ as follows:

d = rnorm(40, 1.4, 3.5)
summary(d)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-6.351450  0.001869  1.851865  1.722092  4.005881  6.983483 
 length(d);  sd(d)
[1] 40            # sample size
[1] 3.103335      # sample SD

Then the proposed t test (in R) gives the following output. Because the P-value is $0.0006 < 0.01 = 1\%,$ the test rejects $H_0$ at the 1% level.

t.test(d, mu = 0, alt="gr")

        One Sample t-test

data:  d
t = 3.5096, df = 39, p-value = 0.0005744
alternative hypothesis: 
  true mean is greater than 0
95 percent confidence interval:
  0.8953566       Inf
sample estimates:
mean of x 
 1.722092 

Of course, this one fictitious dataset cannot provide evidence that $n = 40$ is enough subjects. For one thing, our data have sample SD about $3.1,$ which happens to be smaller than the assumed population $\sigma = 3.5.$

However, one can look at 100,000 such datasets to see what rejection rate we get. Is it the required 75%? A little experimentation shows that to get a power of 75% we need about $n = 52.$

set.seed(2022)
pv = replicate(10^5, t.test(rnorm(52,1.5, 3.5), 
                mu=0, alt="gr")$p.val)
mean(pv <= .01)
[1] 0.75495
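
The 'little experimentation' mentioned above can be sketched as a small search over candidate sample sizes. This is only an illustration: the helper name `sim_power`, the grid of candidate $n$ values, and the reduced number of replications (chosen for speed) are my assumptions, so the estimates are noisier than those from a run with 100,000 replications.

```r
# Estimate the power of the one-sided, one-sample t test by simulation.
# sim_power is a hypothetical helper, not part of any package.
sim_power = function(n, reps = 2000, delta = 1.5, sigma = 3.5, alpha = 0.01) {
  pv = replicate(reps, t.test(rnorm(n, delta, sigma),
                              mu = 0, alt = "gr")$p.val)
  mean(pv <= alpha)   # proportion of rejections at level alpha
}

set.seed(2022)
n.try = c(40, 46, 52, 58)        # candidate sample sizes (assumed grid)
pw = sapply(n.try, sim_power)    # simulated power at each candidate n
round(rbind(n = n.try, power = pw), 3)
```

The smallest $n$ in the grid whose simulated power reaches 0.75 is the candidate to check again with a larger number of replications.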

Notes: (1) The numerical vector pv contains the P-values from 100,000 simulated t tests, the logical vector pv <= .01 contains 100,000 TRUEs and FALSEs, and its mean is its proportion of TRUEs (rejections).

(2) Many mathematical statistics texts give an exact formula, involving non-central t distributions, to find power for a given sample size. Also, there are online calculators and software packages that do exact 'power and sample size' computations.

(3) Your attempted Answer seems to use the formula for a confidence interval; I don't see how that leads to what you say you want.

(4) Usually coffee leads to loss of sleep, so I think you need a one-sided procedure. (A two-sided test would require more observations than a one-sided test.)
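
As a sketch of the exact computation mentioned in Note (2): for this one-sided, one-sample t test, the test statistic under the alternative has a non-central t distribution with $n - 1$ degrees of freedom and non-centrality parameter $\sqrt{n}\,\delta/\sigma,$ so the power can be found with R's `pt` and `qt`. The helper name `exact_power` is my own; `power.t.test` is in R's stats package and solves for $n$ when `n` is left unspecified. The values $\delta = 1.5,$ $\sigma = 3.5,$ and $\alpha = 0.01$ are the assumptions stated earlier.

```r
# Exact power of the one-sided, one-sample t test at level alpha.
# exact_power is a hypothetical helper name.
exact_power = function(n, delta = 1.5, sigma = 3.5, alpha = 0.01) {
  crit = qt(1 - alpha, df = n - 1)     # critical value under H0
  ncp  = sqrt(n) * delta / sigma       # non-centrality parameter under Ha
  pt(crit, df = n - 1, ncp = ncp, lower.tail = FALSE)
}
exact_power(c(40, 52))

# Solve for n directly (n is returned as a non-integer; round up).
res = power.t.test(delta = 1.5, sd = 3.5, sig.level = 0.01, power = 0.75,
                   type = "one.sample", alternative = "one.sided")
ceiling(res$n)
```

Both results should agree closely with the simulated power of about 75% at $n = 52.$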