Generalized linear models currently support estimation using the one-parameter exponential families.
See Module Reference for commands and arguments.
# Load modules and data
In [1]: import statsmodels.api as sm
In [2]: data = sm.datasets.scotland.load()
In [3]: data.exog = sm.add_constant(data.exog)
# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
In [5]: gamma_results = gamma_model.fit()
In [6]: print(gamma_results.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:          inverse_power   Scale:                0.00358428317349
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Thu, 09 Feb 2017   Deviance:                     0.087389
Time:                        01:13:33   Pearson chi2:                   0.0860
No. Iterations:                     4
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0178      0.011     -1.548      0.122      -0.040       0.005
x1          4.962e-05   1.62e-05      3.060      0.002    1.78e-05    8.14e-05
x2             0.0020      0.001      3.824      0.000       0.001       0.003
x3         -7.181e-05   2.71e-05     -2.648      0.008      -0.000   -1.87e-05
x4             0.0001   4.06e-05      2.757      0.006    3.23e-05       0.000
x5         -1.468e-07   1.24e-07     -1.187      0.235   -3.89e-07    9.56e-08
x6            -0.0005      0.000     -2.159      0.031      -0.001   -4.78e-05
x7         -2.427e-06   7.46e-07     -3.253      0.001   -3.89e-06   -9.65e-07
==============================================================================
Detailed examples can be found here:
The statistical model for each observation i is assumed to be
Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i) and \mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta),
where g is the link function and F_{EDM}(\cdot|\theta,\phi,w) is a distribution of the family of exponential dispersion models (EDM) with natural parameter \theta, scale parameter \phi and weight w. Its density is given by
f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,.
It follows that \mu = b'(\theta) and Var[Y|x]=\frac{\phi}{w}b''(\theta). Inverting the first equation gives the natural parameter as a function of the expected value, \theta(\mu), such that
Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)
with v(\mu) = b''(\theta(\mu)). A GLM is therefore said to be determined by the link function g and the variance function v(\mu) alone (together with the regressors x, of course).
Note that while \phi is the same for every observation y_i and therefore does not influence the estimation of \beta, the weights w_i might be different for every y_i such that the estimation of \beta depends on them.
Distribution | Domain | \mu=E[Y|x] | v(\mu) | \theta(\mu) | b(\theta) | \phi |
---|---|---|---|---|---|---|
Binomial B(n,p) | 0,1,\ldots,n | np | \mu-\frac{\mu^2}{n} | \log\frac{p}{1-p} | n\log(1+e^\theta) | 1 |
Poisson P(\mu) | 0,1,\ldots,\infty | \mu | \mu | \log(\mu) | e^\theta | 1 |
Neg. Binom. NB(\mu,\alpha) | 0,1,\ldots,\infty | \mu | \mu+\alpha\mu^2 | \log(\frac{\alpha\mu}{1+\alpha\mu}) | -\frac{1}{\alpha}\log(1-e^\theta) | 1 |
Gaussian/Normal N(\mu,\sigma^2) | (-\infty,\infty) | \mu | 1 | \mu | \frac{1}{2}\theta^2 | \sigma^2 |
Gamma G(\mu,\nu) | (0,\infty) | \mu | \mu^2 | -\frac{1}{\mu} | -\log(-\theta) | \frac{1}{\nu} |
Inv. Gauss. IG(\mu,\sigma^2) | (0,\infty) | \mu | \mu^3 | -\frac{1}{2\mu^2} | -\sqrt{-2\theta} | \sigma^2 |
Tweedie p\geq 1 | depends on p | \mu | \mu^p | \frac{\mu^{1-p}}{1-p} | \frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha} | \phi |
The Tweedie distribution has special cases for p=0,1,2 not listed in the table and uses \alpha=\frac{p-2}{p-1}.
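The identities \mu = b'(\theta) and v(\mu) = b''(\theta(\mu)) can be spot-checked numerically for individual rows of the table. The sketch below does so for four rows using central finite differences. The helper names, the test value of \mu and the tolerances are arbitrary choices made for this example, not part of any library.

```python
import math


def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# name -> (theta(mu), b(theta), v(mu)), copied from the table rows.
rows = {
    "Poisson": (lambda mu: math.log(mu), lambda t: math.exp(t), lambda mu: mu),
    "Gaussian": (lambda mu: mu, lambda t: 0.5 * t ** 2, lambda mu: 1.0),
    "Gamma": (lambda mu: -1.0 / mu, lambda t: -math.log(-t), lambda mu: mu ** 2),
    "Inv. Gauss.": (
        lambda mu: -1.0 / (2.0 * mu ** 2),
        lambda t: -math.sqrt(-2.0 * t),
        lambda mu: mu ** 3,
    ),
}

mu = 1.7  # arbitrary positive test value
checks = {}
for name, (theta_of_mu, b, v) in rows.items():
    theta = theta_of_mu(mu)
    mean_ok = math.isclose(first_derivative(b, theta), mu, rel_tol=1e-6)
    var_ok = math.isclose(second_derivative(b, theta), v(mu), rel_tol=1e-4)
    checks[name] = (mean_ok, var_ok)
    print(name, mean_ok, var_ok)
```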
Correspondence of mathematical variables to code:
GLMResults(model, params, ...[, cov_type, ...]) | Class to contain GLM results. |
The distribution families currently implemented are
Family(link, variance) | The parent class for one-parameter exponential families. |
Binomial([link]) | Binomial exponential family distribution. |
Gamma([link]) | Gamma exponential family distribution. |
Gaussian([link]) | Gaussian exponential family distribution. |
InverseGaussian([link]) | InverseGaussian exponential family. |
NegativeBinomial([link, alpha]) | Negative Binomial exponential family. |
Poisson([link]) | Poisson exponential family. |
Tweedie([link, var_power, link_power]) | Tweedie family. |
The link functions currently implemented are the following. Not all link functions are available for each distribution family. The list of available link functions can be obtained by
>>> sm.families.family.<familyname>.links
Link | A generic link function for one-parameter exponential family. |
CDFLink([dbn]) | Uses the CDF of a scipy.stats distribution |
CLogLog | The complementary log-log transform |
Log | The log transform |
Logit | The logit transform |
NegativeBinomial([alpha]) | The negative binomial link function |
Power([power]) | The power transform |
cauchy() | The Cauchy (standard Cauchy CDF) transform |
cloglog | The CLogLog transform link function. |
identity() | The identity transform |
inverse_power() | The inverse transform |
inverse_squared() | The inverse squared transform |
log | The log transform |
logit | The logit transform |
nbinom([alpha]) | The negative binomial link function. |
probit([dbn]) | The probit (standard normal CDF) transform |