Which GLM(M) to use for proportional data?

If you know proportions and their denominators (in your case, "number of a certain type of hatches per number of pupae"), then a binomial response is (in my opinion) the most principled/sensible thing to do.
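
To illustrate why the denominators matter, here is a minimal sketch in plain Python. The counts are made up for illustration (they are not from the question), and the model is the simplest possible one: a single common probability with no covariates. It compares the binomial maximum-likelihood estimate, which pools successes and trials, with the naive average of the raw proportions, which ignores how many pupae each replicate had.

```python
import math

# Hypothetical data: (hatches of the type of interest, pupae) per replicate.
counts = [(0, 12), (1, 40), (0, 8), (3, 95), (0, 20)]


def binomial_loglik(p, data):
    """Log-likelihood of one common probability p (intercept-only binomial model)."""
    return sum(
        math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)
        for k, n in data
    )


successes = sum(k for k, n in counts)
trials = sum(n for k, n in counts)

# Binomial MLE: each replicate is weighted by its denominator.
p_hat = successes / trials

# Naive alternative: average the raw proportions, ignoring denominators.
naive = sum(k / n for k, n in counts) / len(counts)

print(f"pooled (binomial MLE): {p_hat:.4f}")
print(f"mean of proportions:   {naive:.4f}")
print(f"log-lik at MLE:   {binomial_loglik(p_hat, counts):.3f}")
print(f"log-lik at naive: {binomial_loglik(naive, counts):.3f}")
```

In a real analysis you would fit a binomial GLM(M) with covariates rather than a single pooled probability; the point here is only that the binomial likelihood uses the denominators directly.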

  • Lots of zeros are expected when the mean proportion is low; it's still possible that you need a zero-inflated binomial model, but unlikely (Warton 2005).
  • When the mean of a binomial is low, a Poisson model with an offset (the log of the denominator, under the usual log link) gives nearly identical results (see here). More precisely, the probability should be low everywhere: if a few combinations of covariates lead to higher probabilities, that could mess things up.
  • As always, you should check for overdispersion after fitting the model and, if necessary, do something appropriate (quasi-likelihood, observation-level random effects, a beta-binomial model, ...).
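
The three points above can be checked numerically with a rough sketch, again in plain Python. All numbers (n = 50 pupae, p = 0.01 or 0.4, the example counts) are assumptions chosen for illustration, and `pearson_dispersion` is a hypothetical helper name, not a function from any package.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def pois_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n, p):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p)."""
    return max(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(n + 1))


n = 50  # hypothetical number of pupae per replicate

# 1. Zeros are expected under a plain binomial when p is low.
p_zero = binom_pmf(0, n, 0.01)

# 2. Poisson tracks the binomial closely for low p, less so for high p.
gap_low = max_pmf_gap(n, 0.01)
gap_high = max_pmf_gap(n, 0.4)


def pearson_dispersion(data, fitted_p, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi2 = sum(
        (k - m * q) ** 2 / (m * q * (1 - q)) for (k, m), q in zip(data, fitted_p)
    )
    return chi2 / (len(data) - n_params)


# 3. Dispersion check for an intercept-only fit to hypothetical counts.
counts = [(0, 12), (1, 40), (0, 8), (3, 95), (0, 20)]
p_hat = sum(k for k, m in counts) / sum(m for k, m in counts)
dispersion = pearson_dispersion(counts, [p_hat] * len(counts), n_params=1)

print(f"P(zero hatches | n=50, p=0.01) = {p_zero:.3f}")
print(f"max pmf gap, p=0.01: {gap_low:.4f}; p=0.4: {gap_high:.4f}")
print(f"Pearson dispersion: {dispersion:.3f}")
```

A dispersion ratio well above 1 is the usual sign of overdispersion, although with expected counts this small the Pearson statistic is only a rough guide.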

Warton, David I. “Many Zeros Do Not Mean Zero Inflation: Comparing the Goodness-of-Fit of Parametric Models to Multivariate Abundance Data.” Environmetrics 16 (2005): 275–89. https://doi.org/10.1002/env.702.