Generalized Linear Models

Generalized linear models currently support estimation using the one-parameter exponential families.

See Module Reference for commands and arguments.

Examples

Detailed examples can be found here:

Technical Documentation

The statistical model for each observation i is assumed to be

Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i) and \mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta),

where g is the link function and F_{EDM}(\cdot|\theta,\phi,w) is a distribution from the family of exponential dispersion models (EDM) with natural parameter \theta, scale parameter \phi and weight w. Its density is given by

f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,.
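As a concrete instance of this density, the Poisson distribution can be written in EDM form with \theta = \log(\mu), b(\theta) = e^\theta, \phi = w = 1 and c(y,\phi,w) = 1/y!. The following is a minimal sketch in plain Python; the helper names edm_density and poisson_pmf are made up for illustration and are not part of the library.

```python
import math

def edm_density(y, theta, phi, w, b, c):
    """Evaluate c(y, phi, w) * exp(w * (y * theta - b(theta)) / phi)."""
    return c(y, phi, w) * math.exp(w * (y * theta - b(theta)) / phi)

def poisson_pmf(y, mu):
    """Textbook Poisson probability mass function."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

mu = 2.5
theta = math.log(mu)   # natural parameter of the Poisson family
b = math.exp           # b(theta) = e^theta

def c(y, phi, w):
    """Normalizing term of the Poisson family (valid for phi = w = 1)."""
    return 1.0 / math.factorial(y)

# The two columns printed for each y should agree up to rounding.
for y in range(6):
    print(y, edm_density(y, theta, 1.0, 1.0, b, c), poisson_pmf(y, mu))
```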

It follows that \mu = b'(\theta) and Var[Y|x]=\frac{\phi}{w}b''(\theta). Inverting the first equation gives the natural parameter as a function of the expected value, \theta(\mu), such that

Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)

with v(\mu) = b''(\theta(\mu)). A GLM is therefore said to be determined by the link function g and the variance function v(\mu) alone (together with x, of course).
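The relations \mu = b'(\theta) and Var[Y|x] = \frac{\phi}{w} b''(\theta) can be checked numerically. The sketch below does this for the Binomial row of the table further down (\phi = w = 1), comparing finite-difference derivatives of b with the mean and variance computed directly from the probability mass function. The values n = 10 and p = 0.3 are arbitrary choices for illustration.

```python
import math

n, p = 10, 0.3
theta = math.log(p / (1 - p))   # natural parameter of B(n, p)

def b(t):
    """b(theta) = n * log(1 + e^theta) for the binomial family."""
    return n * math.log1p(math.exp(t))

# Central finite differences for b'(theta) and b''(theta).
h = 1e-4
b1 = (b(theta + h) - b(theta - h)) / (2 * h)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2

# Exact mean and variance from the binomial probability mass function.
pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(b1, mean)                        # both approximate n * p
print(b2, var, mean - mean ** 2 / n)   # all approximate n * p * (1 - p)
```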

Note that while \phi is the same for every observation y_i and therefore does not influence the estimation of \beta, the weights w_i might be different for every y_i such that the estimation of \beta depends on them.
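To illustrate this, consider a Poisson model with an intercept only, so that every observation shares the same \theta. The log-likelihood implied by the density above is \sum_i w_i (y_i\theta - e^\theta)/\phi up to a constant. The sketch below maximizes it by Newton steps; the function fit_intercept_poisson and the data are hypothetical and serve only as an illustration.

```python
import math

def fit_intercept_poisson(y, w, phi, n_iter=50):
    """Maximize sum_i w_i * (y_i * theta - exp(theta)) / phi over theta."""
    theta = 0.0
    for _ in range(n_iter):
        mu = math.exp(theta)
        grad = sum(wi * (yi - mu) for yi, wi in zip(y, w)) / phi
        hess = -sum(wi * mu for wi in w) / phi
        theta -= grad / hess   # Newton step; phi cancels in the ratio
    return math.exp(theta)     # fitted mean mu = b'(theta)

y = [1, 0, 4, 2, 7]
equal_w = [1.0] * 5
other_w = [1.0, 1.0, 1.0, 1.0, 5.0]

print(fit_intercept_poisson(y, equal_w, phi=1.0))  # plain mean of y
print(fit_intercept_poisson(y, equal_w, phi=4.0))  # same fit with another phi
print(fit_intercept_poisson(y, other_w, phi=1.0))  # weighted mean of y
```

Changing \phi rescales the gradient and the Hessian by the same factor, so the fitted mean is unchanged, whereas changing the weights moves it.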

| Distribution | Domain | \mu = E[Y \mid x] | v(\mu) | \theta(\mu) | b(\theta) | \phi |
|---|---|---|---|---|---|---|
| Binomial B(n,p) | 0,1,\ldots,n | np | \mu-\frac{\mu^2}{n} | \log\frac{p}{1-p} | n\log(1+e^\theta) | 1 |
| Poisson P(\mu) | 0,1,\ldots,\infty | \mu | \mu | \log(\mu) | e^\theta | 1 |
| Neg. Binom. NB(\mu,\alpha) | 0,1,\ldots,\infty | \mu | \mu+\alpha\mu^2 | \log(\frac{\alpha\mu}{1+\alpha\mu}) | -\frac{1}{\alpha}\log(1-e^\theta) | 1 |
| Gaussian/Normal N(\mu,\sigma^2) | (-\infty,\infty) | \mu | 1 | \mu | \frac{1}{2}\theta^2 | \sigma^2 |
| Gamma G(\mu,\nu) | (0,\infty) | \mu | \mu^2 | -\frac{1}{\mu} | -\log(-\theta) | \frac{1}{\nu} |
| Inv. Gauss. IG(\mu,\sigma^2) | (0,\infty) | \mu | \mu^3 | -\frac{1}{2\mu^2} | -\sqrt{-2\theta} | \sigma^2 |
| Tweedie p\geq 1 | depends on p | \mu | \mu^p | \frac{\mu^{1-p}}{1-p} | \frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha} | \phi |

The Tweedie distribution has special cases for p=0,1,2 not listed in the table and uses \alpha=\frac{p-2}{p-1}.
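The entries of the table can be cross-checked against the identities \mu = b'(\theta(\mu)) and v(\mu) = b''(\theta(\mu)). The following sketch does so for a few rows using finite differences; the helper names tweedie_family and derivatives, the evaluation point \mu = 2 and the power p = 1.5 are assumptions made for illustration.

```python
import math

def tweedie_family(p):
    """Return theta(mu), b(theta), v(mu) for a Tweedie power p outside {0, 1, 2}."""
    alpha = (p - 2) / (p - 1)
    return (
        lambda mu: mu ** (1 - p) / (1 - p),
        lambda t: (alpha - 1) / alpha * (t / (alpha - 1)) ** alpha,
        lambda mu: mu ** p,
    )

# Each entry holds (theta(mu), b(theta), v(mu)) as listed in the table.
families = {
    "Poisson": (math.log, math.exp, lambda mu: mu),
    "Gaussian": (lambda mu: mu, lambda t: 0.5 * t ** 2, lambda mu: 1.0),
    "Gamma": (lambda mu: -1 / mu, lambda t: -math.log(-t), lambda mu: mu ** 2),
    "Inv. Gauss.": (
        lambda mu: -1 / (2 * mu ** 2),
        lambda t: -math.sqrt(-2 * t),
        lambda mu: mu ** 3,
    ),
    "Tweedie p=1.5": tweedie_family(1.5),
}

def derivatives(b, theta, h=1e-4):
    """Central finite differences for b'(theta) and b''(theta)."""
    b1 = (b(theta + h) - b(theta - h)) / (2 * h)
    b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2
    return b1, b2

mu = 2.0
for name, (theta_of_mu, b, v) in families.items():
    b1, b2 = derivatives(b, theta_of_mu(mu))
    print(f"{name:14s} b'={b1:.5f} mu={mu:.5f}  b''={b2:.5f} v(mu)={v(mu):.5f}")
```

With p = 3 the Tweedie expressions give \alpha = \frac{1}{2} and reduce to the Inv. Gauss. row, which makes a convenient sanity check of the \alpha abbreviation.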

Correspondence of mathematical variables to code:

  • Y and y are coded as endog, the variable one wants to model
  • x is coded as exog, the covariates, also called explanatory variables
  • \beta is coded as params, the parameters one wants to estimate
  • \mu is coded as mu, the expectation (conditional on x) of Y
  • g is coded as the link argument to the class Family
  • \phi is coded as scale, the dispersion parameter of the EDM
  • w is not yet supported (i.e. w=1); in the future it might be var_weights
  • p is coded as var_power for the power of the variance function v(\mu) of the Tweedie distribution, see table
  • \alpha is either
    • Negative Binomial: the ancillary parameter alpha, see table
    • Tweedie: an abbreviation for \frac{p-2}{p-1} of the power p of the variance function, see table

References

  • Gill, Jeff. 2000. Generalized Linear Models: A Unified Approach. SAGE QASS Series.
  • Green, PJ. 1984. “Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives.” Journal of the Royal Statistical Society, Series B, 46, 149-192.
  • Hardin, J.W. and Hilbe, J.M. 2007. Generalized Linear Models and Extensions. 2nd ed. Stata Press, College Station, TX.
  • McCullagh, P. and Nelder, J.A. 1989. Generalized Linear Models. 2nd ed. Chapman & Hall, Boca Raton.

Module Reference

Model Class

GLM(endog, exog[, family, offset, exposure, ...]): Generalized Linear Models class.

Results Class

GLMResults(model, params, ...[, cov_type, ...]): Class to contain GLM results.

Families

The distribution families currently implemented are

  • Family(link, variance): The parent class for one-parameter exponential families.
  • Binomial([link]): Binomial exponential family distribution.
  • Gamma([link]): Gamma exponential family distribution.
  • Gaussian([link]): Gaussian exponential family distribution.
  • InverseGaussian([link]): InverseGaussian exponential family.
  • NegativeBinomial([link, alpha]): Negative Binomial exponential family.
  • Poisson([link]): Poisson exponential family.
  • Tweedie([link, var_power, link_power]): Tweedie family.