Efficient computation of $E\left[\left(1+X_1+\cdots+X_n\right)^{-1}\right]$ with $(X_i)$ independent Bernoulli with varying parameter

Solution 1:

An approach is through generating functions. For every nonnegative random variable $S$, $$ E\left(\frac1{1+S}\right)=\int_0^1E(t^S)\,\mathrm{d}t. $$ If $S=X_1+\cdots+X_n$ and the random variables $X_i$ are independent, $E(t^S)$ is the product of the $E(t^{X_i})$. If furthermore $X_i$ is Bernoulli with parameter $p_i$, $$ E\left(\frac1{1+S}\right)=\int_0^1\prod_{i=1}^n(1-p_i+p_it)\,\mathrm{d}t. $$

This is an exact formula; I do not know how best to use it directly to compute the LHS efficiently. Of course one can expand the integrand on the RHS, getting a sum of $2^n$ terms indexed by the subsets $I$ of $\{1,2,\ldots,n\}$: $$ E\left(\frac1{1+S}\right)=\sum_I\frac1{|I|+1}\prod_{i\in I}p_i\cdot\prod_{j\notin I}(1-p_j). $$

But it might be more useful to notice that $$ \prod_{i=1}^n(1-p_i+p_it)=\sum_{k=0}^n(-1)^k\sigma_k(\mathbf{p})(1-t)^k, $$ where $\sigma_0(\mathbf{p})=1$ and $(\sigma_k(\mathbf{p}))_{1\le k\le n}$ are the elementary symmetric polynomials of the family $\mathbf{p}=\{p_i\}_i$ (to see this, write $1-p_i+p_it=1-p_i(1-t)$). Integrating with respect to $t$, one gets $$ E\left(\frac1{1+S}\right)=\sum_{k=0}^n(-1)^k\frac{\sigma_k(\mathbf{p})}{k+1}. $$ The computational burden is thus reduced to the determination of the sequence $(\sigma_k(\mathbf{p}))_{1\le k\le n}$.
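To make the reduction concrete, here is a minimal sketch (not part of the original answer; the function name and the Monte Carlo check are illustrative) that computes $(\sigma_k(\mathbf{p}))_k$ in $O(n^2)$ operations by expanding $\prod_i(1+p_ix)$ one factor at a time, then evaluates the alternating sum:

```python
import random

def expectation_inverse(p):
    """E[1/(1+S)] for S a sum of independent Bernoulli(p_i) variables."""
    n = len(p)
    # sigma[k] holds sigma_k of the p_i processed so far (sigma_0 = 1).
    sigma = [1.0] + [0.0] * n
    for i, pi in enumerate(p):
        # Multiply the current polynomial by (1 + p_i x),
        # updating the highest degrees first so values are not overwritten.
        for k in range(i + 1, 0, -1):
            sigma[k] += pi * sigma[k - 1]
    # Alternating sum: sum_k (-1)^k sigma_k / (k+1).
    return sum((-1) ** k * s / (k + 1) for k, s in enumerate(sigma))

# Sanity check against a Monte Carlo estimate.
p = [random.random() for _ in range(15)]
mc = sum(1.0 / (1 + sum(random.random() < pi for pi in p))
         for _ in range(200_000)) / 200_000
print(expectation_inverse(p), mc)
```

For large $n$ the alternating sum can lose digits to cancellation; in that regime one may prefer to integrate the product form $\int_0^1\prod_i(1-p_i+p_it)\,\mathrm{d}t$ numerically instead.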

Note 1: The last formula is an integrated version of the algebraic identity stating that, for every family $\mathbf{x}=\{x_i\}_i$ of zeroes and ones, $$ \frac1{1+\sigma_1(\mathbf{x})}=\sum_{k\ge0}(-1)^k\frac{\sigma_k(\mathbf{x})}{k+1}, $$ truncated at $k=n$ since, when at most $n$ values of $x_i$ are nonzero, $\sigma_k(\mathbf{x})=0$ for every $k\ge n+1$. To prove the algebraic identity, note that, for every $k\ge0$, $$ \sigma_1(\mathbf{x})\sigma_k(\mathbf{x})=k\sigma_k(\mathbf{x})+(k+1)\sigma_{k+1}(\mathbf{x}), $$ and compute the product of $1+\sigma_1(\mathbf{x})$ with the series on the RHS. To apply this identity to our setting, introduce $\mathbf{X}=\{X_i\}_i$ and note that, by independence, for every $k\ge0$, $$ E(\sigma_k(\mathbf{X}))=\sigma_k(\mathbf{p}). $$

Note 2: More generally, for every suitable complex number $z$, $$ \frac1{z+\sigma_1(\mathbf{x})}=\sum_{k\ge0}(-1)^k\frac{\Gamma(k+1)\Gamma(z)}{\Gamma(k+1+z)}\sigma_k(\mathbf{x}). $$
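To see where Note 2 comes from (a short verification, not spelled out in the original, stated here for real $z>0$): since each $x_i$ is $0$ or $1$, $\sigma_k(\mathbf{x})=\binom{s}{k}$ with $s=\sigma_1(\mathbf{x})$, and the Beta integral gives $$ \frac{\Gamma(k+1)\Gamma(z)}{\Gamma(k+1+z)}=\int_0^1 t^k(1-t)^{z-1}\,\mathrm{d}t, $$ so that, using $\sum_{k\ge0}(-1)^k\binom{s}{k}t^k=(1-t)^s$, $$ \sum_{k\ge0}(-1)^k\frac{\Gamma(k+1)\Gamma(z)}{\Gamma(k+1+z)}\sigma_k(\mathbf{x})=\int_0^1(1-t)^{s+z-1}\,\mathrm{d}t=\frac1{z+s}. $$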

Solution 2:

Regarding approximations for large $N$ (say, $N>10$), though perhaps this is not what you are looking for... A general approach to approximating the expectation of a function of a random variable, $y=g(x)$, is to make a Taylor expansion around the mean of $x$ and take expectations.

$$ y = g(\mu_x) + g'(\mu_x) (x-\mu_x) + \frac{1}{2}g''(\mu_x) (x-\mu_x)^2 +\cdots \Rightarrow$$

$$ \mu_y = g(\mu_x) + \frac{1}{2!}g''(\mu_x) \; m_{2,x} + \frac{1}{3!} g'''(\mu_x) \; m_{3,x} + \cdots \tag 1$$

where $m_{k,x}$ is the $k$-th central moment of $x$ (the first-order term vanishes because $E(x-\mu_x)=0$).

In our case, $g(s)=1/(1+s)$, so that $g''(s)=2/(1+s)^3$, and by independence $m_{2}=\operatorname{Var}\left(\sum_i x_i\right)=\sum_i \sigma^2_i$. The second-order approximation (taking the first two terms in $(1)$) then gives us:

$$ E(y)= E \left[ \frac{1}{1+\sum_i x_i} \right] \approx y_0 + y_0^3 \sum_i \sigma^2_i$$

where $$ y_0 = \frac{1}{1 + \sum_i E(x_i)} = \frac{1}{1 + \sum_i p_i}$$

(that alone would be the first-order approximation) and

$$\sigma^2_i = p_i (1-p_i)$$

Experimentally, I get a relative error that rarely exceeds $0.01$, with random (uniform) $p_i$ and $N=15$; with $N=30$, it is about $0.001$.
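A minimal sketch (not from the original answer; names are illustrative) that reproduces this kind of experiment, using the exact value from Solution 1 as the baseline:

```python
import random

def exact(p):
    # Exact E[1/(1+S)] via elementary symmetric polynomials (Solution 1).
    sigma = [1.0] + [0.0] * len(p)
    for i, pi in enumerate(p):
        for k in range(i + 1, 0, -1):
            sigma[k] += pi * sigma[k - 1]
    return sum((-1) ** k * s / (k + 1) for k, s in enumerate(sigma))

def second_order(p):
    # y0 + y0^3 * sum of variances, i.e. (1) truncated after the second term.
    y0 = 1.0 / (1.0 + sum(p))
    return y0 + y0 ** 3 * sum(pi * (1.0 - pi) for pi in p)

for N in (15, 30):
    errs = sorted(abs(second_order(p) - exact(p)) / exact(p)
                  for p in ([random.random() for _ in range(N)]
                            for _ in range(1000)))
    print(N, "median:", errs[500], "99th pct:", errs[989])
```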