Solution 1:

I'm assuming that the author is using the gamma distribution as a conjugate prior for a Poisson distribution. The distribution $\Gamma(\alpha=0.001,\beta=0.001)$ does indeed have most of its mass very close to $0$, but it also has a very heavy tail, so its mean is $\alpha/\beta = 1$. This observation, however, is unrelated to its vagueness. It is vague in the sense that as soon as you update it based on your first empirical observation, the posterior distribution will tell you that whatever data point you observed is a very typical one. Put another way, it reflects a belief that is very weakly held and easily molded by exposure to new information.
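Both claims are easy to check numerically. Here's a minimal sketch, assuming SciPy is available (note that SciPy parameterizes the gamma by shape and scale, so scale $= 1/\beta$):

```python
from scipy.stats import gamma

# Gamma(alpha=0.001, beta=0.001) in SciPy's shape/scale parameterization:
prior = gamma(a=0.001, scale=1 / 0.001)

print(prior.mean())       # 1.0  (= alpha / beta)
print(prior.cdf(1e-10))   # ~0.97: almost all of the mass sits extremely close to 0
```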

Let's say that you're trying to estimate the average number of calls that come into a call center per hour, modeled as a Poisson distribution with rate $\lambda$. $\Gamma(\alpha=0.001,\beta=0.001)$ reflects your prior belief about the value of $\lambda$. In your first hour of observation, $50$ calls come in, so you perform a Bayesian update and derive $\Gamma(\alpha=50.001, \beta=1.001)$ as your posterior. This posterior distribution has a mean of $\frac{50.001}{1.001} \approx 50$. So, now that you have actual data, you've almost completely thrown away your old prejudices and updated your beliefs to match your empirical observations.
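The conjugate update itself is just arithmetic on the prior's parameters: for observed counts $x_1,\dots,x_n$, the posterior is $\Gamma(\alpha + \sum_i x_i,\ \beta + n)$. A minimal sketch of the example above:

```python
# Conjugate Gamma-Poisson update: prior Gamma(alpha, beta), observe counts,
# posterior is Gamma(alpha + sum(x), beta + n).

alpha, beta = 0.001, 0.001   # the vague prior
observations = [50]          # 50 calls in the first hour

alpha_post = alpha + sum(observations)
beta_post = beta + len(observations)

print(alpha_post, beta_post)    # 50.001, 1.001
print(alpha_post / beta_post)   # posterior mean ~ 49.95
```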

It's quite common to use $\Gamma(\alpha=0,\beta=0)$ as a prior. That distribution doesn't even make mathematical sense: its PDF contains the term $0^0$, and regardless of whether you decide that $0^0=0$ or $0^0=1$, the total area under the distribution curve comes out to $0$ or $\infty$ respectively, not $1$. Nonetheless, that doesn't stop us from using it as a prior: we'll get a sensible posterior as soon as we observe our first data point. A prior of this sort is called an *improper* prior. Some authors use *improper* and *vague* interchangeably.
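To see why, note that the same update rule applied to $\Gamma(\alpha=0,\beta=0)$ in the call-center example gives

$$\Gamma(\alpha = 0 + 50,\ \beta = 0 + 1) = \Gamma(50, 1),$$

a perfectly proper posterior whose mean is exactly $50$, rather than approximately $50$.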

Solution 2:

Gamma priors are also used as the conjugate prior for a precision (inverse variance), as in a standard Bayesian regression model. In this case the prior can be considered vague when the variance parameter governs a reasonable number of parameters/observations (more than 8 is usually enough for this prior to be vague). This would usually be fine for the error of your model, since that variance parameter governs the error of all $N$ observations, and $N$ is usually large enough for the prior to be effectively vague. In this case the parameters of the conditional Gamma distribution are not really affected by the addition of $0.001$, as they will be reasonably large already.
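To make this concrete, here's a minimal sketch of the conditional conjugate update for a precision parameter $\tau \sim \Gamma(\alpha, \beta)$ in a normal model; all numbers are made up purely for illustration:

```python
# Conditional conjugate update for a precision: given N residuals with
# sum of squares SS, the posterior is Gamma(alpha + N/2, beta + SS/2).

alpha, beta = 0.001, 0.001
N = 1000
sum_sq_residuals = 950.0     # hypothetical sum of squared residuals

alpha_post = alpha + N / 2                    # 500.001 -- the 0.001 is negligible
beta_post = beta + sum_sq_residuals / 2       # 475.001 -- likewise

print(alpha_post / beta_post)   # ~1.05: the posterior is essentially data-driven
```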

However, you are right that this prior is not completely vague, and this is more noticeable in other situations. Take a mixed effects model where you have $J$ groups whose levels are governed by a variance parameter. In this case $J$ is not necessarily large, and the parameters of the conditional Gamma distribution are therefore likewise small. They are consequently affected both by the large mass at $0$ and by the addition of $0.001$ to each parameter of your conditional distribution.
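Here's the same kind of update with a small $J$ (again with hypothetical numbers); the posterior is now visibly sensitive to the choice of hyperparameter $\epsilon$ in $\Gamma(\epsilon, \epsilon)$:

```python
# Conditional update for the group-level precision with only J groups.
# Unlike the large-N case above, epsilon is no longer negligible relative
# to J/2 and SS/2, so the posterior mean shifts noticeably with it.

J = 6
sum_sq_effects = 0.3   # hypothetical: the group effects look very similar

for eps in (0.001, 0.01, 0.1, 1.0):
    alpha_post = eps + J / 2
    beta_post = eps + sum_sq_effects / 2
    print(eps, round(alpha_post / beta_post, 1))   # ~19.9, 18.8, 12.4, 3.5
```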

However, this isn't really a problem as long as you make the Gamma hyperparameters small enough. What is a problem is that your posterior is now affected by the mass at $0$: the posterior then has a tendency to somewhat match the prior. There are several solutions to this, discussed by Gelman (2006), who does a much better job than I can of describing both the problem and the solutions; if you want to know more, I suggest reading that paper. In practice, however, many statisticians continue to use these priors, because unless the variance parameter governs only a small number of parameters/observations, the choice of prior will not noticeably affect the results.
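For reference, one of the alternatives Gelman (2006) discusses is placing a weakly informative prior, such as a half-Cauchy, directly on the group-level standard deviation. A minimal sketch of what that prior looks like, assuming SciPy (the scale of $5$ is purely illustrative):

```python
from scipy.stats import halfcauchy

# Half-Cauchy prior on the group-level standard deviation. Its density is
# finite and nonzero at 0 and has a heavy right tail, so it neither piles
# mass onto zero nor rules out small values of the standard deviation.
prior = halfcauchy(scale=5)

for sd in (0.0, 1.0, 5.0, 25.0):
    print(sd, prior.pdf(sd))   # pdf(0) = 2/(5*pi) ~ 0.127, decaying slowly
```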