Generalized Linear Model

Bayesian Generalized Linear Model implementation.

Implementation of Bayesian GLMs using a mixture of Gaussians posterior approximation with the reparameterization trick and variational inference. See [1] for the posterior mixture idea, and [2] for the inference scheme.

[1] Gershman, S., Hoffman, M., & Blei, D. “Nonparametric variational inference”. Proceedings of the International Conference on Machine Learning. 2012.
[2] Kingma, D. P., & Welling, M. “Auto-encoding variational Bayes”. Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.
class revrand.glm.GeneralizedLinearModel(likelihood=Gaussian(var=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), basis=LinearBasis(onescol=True, regularizer=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), K=10, maxiter=3000, batch_size=10, updater=None, nsamples=50, nstarts=500, random_state=None)

Bayesian Generalized linear model (GLM).

This provides a scikit-learn compatible interface for the glm module.

Parameters:
  • likelihood (Object) – A likelihood object, see the likelihoods module.
  • basis (Basis) – A basis object, see the basis_functions module.
  • K (int, optional) – Number of diagonal Gaussian components to use to approximate the posterior distribution.
  • maxiter (int, optional) – Maximum number of stochastic gradient iterations to run.
  • batch_size (int, optional) – Number of observations to use per SGD batch.
  • updater (SGDUpdater, optional) – The SGD learning rate updating algorithm to use; by default this is Adam. See revrand.optimize.sgd for different options.
  • nsamples (int, optional) – Number of samples for sampling the expected likelihood and expected likelihood gradients.
  • nstarts (int, optional) – If any parameters have distributions as initial values, this determines how many random candidate starts should be evaluated before commencing optimisation at the best candidate.
  • random_state (None, int or RandomState, optional) – Random seed.

Notes

This approximates the posterior distribution over the weights with a mixture of Gaussians:

\[\mathbf{w} \sim \frac{1}{K} \sum^K_{k=1} \mathcal{N}(\mathbf{m_k}, \boldsymbol{\Psi}_k)\]

where,

\[\boldsymbol{\Psi}_k = \text{diag}([\Psi_{k,1}, \ldots, \Psi_{k,D}]).\]

This is so arbitrary likelihoods can be used with this algorithm, while still maintaining flexible and tractable non-Gaussian posteriors. Additionally, this has the benefit of a reduced number of parameters to optimise (compared with full covariance Gaussians).

The main differences between this implementation and the GLM in [1] are:
  • We use diagonal mixtures, as opposed to isotropic.
  • We use auto encoding variational Bayes (AEVB) inference [2] with stochastic gradients.
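
As a rough illustration of the posterior approximation described above, the following sketch draws weight samples from an equal-weight mixture of diagonal Gaussians using the reparameterization w = m_k + sqrt(Psi_k) * eps, with eps ~ N(0, I). Writing the sample this way keeps it a deterministic function of m_k and Psi_k, which is what allows stochastic gradients to be taken with respect to them in AEVB [2]. The sketch uses only the Python standard library; the function name sample_mixture_weights and the example values are hypothetical and are not part of revrand.

```python
import math
import random


def sample_mixture_weights(means, variances, nsamples, rng):
    """Draw weight vectors from an equal-weight mixture of diagonal Gaussians.

    means[k][d] holds m_k and variances[k][d] holds the diagonal of Psi_k.
    (Hypothetical helper for illustration, not a revrand function.)
    """
    K = len(means)
    samples = []
    for _ in range(nsamples):
        k = rng.randrange(K)  # each component is chosen with probability 1/K
        # Reparameterization: w = m_k + sqrt(Psi_k) * eps, eps ~ N(0, I)
        w = [m + math.sqrt(v) * rng.gauss(0.0, 1.0)
             for m, v in zip(means[k], variances[k])]
        samples.append(w)
    return samples


rng = random.Random(0)
means = [[-1.0, 0.5], [2.0, 0.0]]      # K = 2 components, D = 2 weights
variances = [[0.1, 0.2], [0.3, 0.05]]  # diagonal entries of each Psi_k
ws = sample_mixture_weights(means, variances, nsamples=1000, rng=rng)
```

Each diagonal component needs only 2D parameters (D means and D variances), compared with D + D(D + 1)/2 for a full covariance Gaussian.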

This uses the Python logging module to display learning status. To view these messages, include something like,

import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

in your calling code.

fit(X, y, likelihood_args=())

Learn the parameters of a Bayesian generalized linear model (GLM).

Parameters:
  • X (ndarray) – (N, d) array input dataset (N samples, d dimensions).
  • y (ndarray) – (N,) array targets (N samples)
  • likelihood (Object) – A likelihood object, see the likelihoods module.
  • likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N.
predict(X, nsamples=200, likelihood_args=())

Predict target values from Bayesian generalized linear regression.

Parameters:
  • X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
  • nsamples (int, optional) – Number of samples for sampling the expected target values from the predictive distribution.
  • likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:

Ey – The expected value of y* for the query inputs X*, with shape (N*,).

Return type:

ndarray

predict_cdf(X, quantile, nsamples=200, likelihood_args=())

Predictive cumulative distribution function (CDF) of a Bayesian GLM.

Parameters:
  • X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
  • quantile (float) – The predictive probability, \(p(y^* \leq \text{quantile} | \mathbf{x}^*, \mathbf{X}, y)\).
  • nsamples (int, optional) – Number of samples to draw from the posterior for approximating the predictive CDF.
  • likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:

  • p (ndarray) – The probability of y* <= quantile for the query inputs X*, with shape (N*,).
  • p_min (ndarray) – The minimum sampled values of the predicted probability (same shape as p).
  • p_max (ndarray) – The maximum sampled values of the predicted probability (same shape as p).
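
To make the three return values concrete, here is a minimal sketch of a Monte Carlo estimate of the predictive CDF for a single query point. It assumes a Gaussian likelihood with known noise variance and a set of weight samples already drawn from the approximate posterior. The names mc_predictive_cdf and gaussian_cdf are hypothetical, and the library's own estimator may differ in detail.

```python
import math
import random


def gaussian_cdf(q, mean, var):
    """CDF of N(mean, var) evaluated at q."""
    return 0.5 * (1.0 + math.erf((q - mean) / math.sqrt(2.0 * var)))


def mc_predictive_cdf(phi, quantile, weight_samples, noise_var):
    """Monte Carlo estimate of p(y* <= quantile | x*) for one query point.

    phi is the basis vector of the query point and weight_samples are
    draws from the (approximate) posterior over weights.
    (Hypothetical helper for illustration, not a revrand function.)
    """
    ps = []
    for w in weight_samples:
        f = sum(wi * pi for wi, pi in zip(w, phi))  # latent function value
        ps.append(gaussian_cdf(quantile, f, noise_var))
    # Average over weight samples, plus the spread of the sampled values
    return sum(ps) / len(ps), min(ps), max(ps)


rng = random.Random(0)
# Hypothetical posterior samples: weights scattered around [1.0, 2.0]
weight_samples = [[rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1)]
                  for _ in range(200)]
p, p_min, p_max = mc_predictive_cdf([1.0, 0.5], 2.0, weight_samples, 0.25)
```

Here the latent mean is 1.0 * 1.0 + 2.0 * 0.5 = 2.0, so a quantile of 2.0 should give p close to 0.5, with p_min and p_max bracketing it.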

predict_interval(X, percentile, nsamples=200, likelihood_args=(), multiproc=True)

Predictive percentile interval (upper and lower quantiles).

Parameters:
  • X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
  • percentile (float) – The percentile confidence interval (e.g. 95%) to return.
  • nsamples (int, optional) – Number of samples for sampling the predictive percentiles.
  • likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
  • multiproc (bool, optional) – Use multiprocessing to parallelise this prediction computation.
Returns:

  • ql (ndarray) – The lower end point of the interval with shape (N*,)
  • qu (ndarray) – The upper end point of the interval with shape (N*,)
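
The sketch below shows one simple way to obtain such an interval for a single query point: draw samples from the predictive distribution and take empirical quantiles. This is a substitute technique chosen for illustration, not necessarily how predict_interval computes ql and qu. It assumes a Gaussian likelihood and that the percentile is given as a fraction in (0, 1); the name mc_predictive_interval is hypothetical.

```python
import random


def mc_predictive_interval(phi, percentile, weight_samples, noise_var, rng):
    """Central predictive interval for one query point.

    percentile is assumed to be a fraction in (0, 1), e.g. 0.95.
    (Hypothetical helper for illustration, not a revrand function.)
    """
    ys = []
    for w in weight_samples:
        f = sum(wi * pi for wi, pi in zip(w, phi))  # latent function value
        ys.append(rng.gauss(f, noise_var ** 0.5))   # y* ~ N(f, noise_var)
    ys.sort()
    tail = (1.0 - percentile) / 2.0  # probability mass left in each tail
    n = len(ys)
    ql = ys[round(tail * (n - 1))]
    qu = ys[round((1.0 - tail) * (n - 1))]
    return ql, qu


rng = random.Random(0)
# Hypothetical posterior samples: weights scattered around [1.0, 2.0]
weight_samples = [[rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1)]
                  for _ in range(2000)]
ql, qu = mc_predictive_interval([1.0, 0.5], 0.95, weight_samples, 0.25, rng)
```

With these example values the predictive distribution is centred near 2.0 with a standard deviation of about 0.51, so the 95% interval should be roughly 2.0 ± 1.0.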

predict_logpdf(X, y, nsamples=200, likelihood_args=())

Predictive log-probability density function of a Bayesian GLM.

Parameters:
  • X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
  • y (float or ndarray) – The test observations of shape (N*,) to evaluate under, \(\log p(y^* |\mathbf{x}^*, \mathbf{X}, y)\).
  • nsamples (int, optional) – Number of samples for sampling the log predictive distribution.
  • likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:

  • logp (ndarray) – The log probability of y* given X*, with shape (N*,).
  • logp_min (ndarray) – The minimum sampled values of the predicted log probability (same shape as logp).
  • logp_max (ndarray) – The maximum sampled values of the predicted log probability (same shape as logp).
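
As a hedged illustration of the quantity \(\log p(y^* | \mathbf{x}^*, \mathbf{X}, y)\), the following sketch estimates it for one query point as the log of the sample average of the likelihood over posterior weight samples, computed stably with the log-sum-exp trick. It assumes a Gaussian likelihood with known noise variance. The names are hypothetical, and the library may combine the per-sample values differently.

```python
import math


def gaussian_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def mc_predictive_logpdf(phi, y, weight_samples, noise_var):
    """Monte Carlo estimate of log p(y* | x*) for one query point.

    (Hypothetical helper for illustration, not a revrand function.)
    """
    logps = []
    for w in weight_samples:
        f = sum(wi * pi for wi, pi in zip(w, phi))  # latent function value
        logps.append(gaussian_logpdf(y, f, noise_var))
    # log of the sample mean of p(y* | w), shifted by the max for stability
    m = max(logps)
    logp = m + math.log(sum(math.exp(lp - m) for lp in logps) / len(logps))
    return logp, min(logps), max(logps)


# Hypothetical posterior samples for a two-weight model
weight_samples = [[1.0, 2.0], [0.9, 2.1], [1.1, 1.8]]
logp, logp_min, logp_max = mc_predictive_logpdf(
    [1.0, 0.5], 2.0, weight_samples, 0.25)
```

Because the estimate is a log of an average of densities, it always lies between the smallest and largest sampled log probabilities.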

predict_moments(X, nsamples=200, likelihood_args=())

Predictive moments, in particular mean and variance, of a Bayesian GLM.

This function uses Monte-Carlo sampling to evaluate the predictive mean and variance of a Bayesian GLM. The exact expressions evaluated are,

\[ \begin{align}\begin{aligned}\mathbb{E}[y^* | \mathbf{x^*}, \mathbf{X}, y] &= \int \mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) d\mathbf{w},\\\mathbb{V}[y^* | \mathbf{x^*}, \mathbf{X}, y] &= \int \left(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] - \mathbb{E}[y^* | \mathbf{x^*}, \mathbf{X}, y]\right)^2 p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) d\mathbf{w},\end{aligned}\end{align} \]

where \(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)]\) is the expected value of \(y^*\) from the likelihood, and \(p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi)\) is the posterior distribution over weights (as learned by fit). Here are a few concrete examples of how we can use these values,

  • Gaussian likelihood: these are just the predicted mean and variance, see revrand.regression.predict
  • Bernoulli likelihood: The expected value is the probability, \(p(y^* = 1)\), i.e. the probability of class one. The variance may not be so useful.
  • Poisson likelihood: The expected value is similar conceptually to the Gaussian case, and is also a continuous value. The median (50% quantile) from predict_interval is a discrete value. Again, the variance in this instance may not be so useful.
Parameters:
  • X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
  • nsamples (int, optional) – Number of samples for sampling the expected moments from the predictive distribution.
  • likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:

  • Ey (ndarray) – The expected value of y* for the query inputs X*, with shape (N*,).
  • Vy (ndarray) – The expected variance of y* (excluding likelihood noise terms) for the query inputs X*, with shape (N*,).
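
The Monte-Carlo estimates of the two integrals above can be sketched as follows for a single query point: the mean and the variance of \(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)]\) are taken over posterior weight samples. The example assumes a Bernoulli likelihood with a logistic link, so the expected value is the class-one probability. The names mc_predictive_moments and logistic are hypothetical and not part of revrand.

```python
import math
import random


def logistic(f):
    """Logistic link: maps a latent value to p(y* = 1)."""
    return 1.0 / (1.0 + math.exp(-f))


def mc_predictive_moments(phi, weight_samples, expected_y=logistic):
    """Monte Carlo estimates of E[y*] and V[y*] for one query point.

    expected_y maps the latent value w . phi to E[y* | w, phi].
    (Hypothetical helper for illustration, not a revrand function.)
    """
    eys = []
    for w in weight_samples:
        f = sum(wi * pi for wi, pi in zip(w, phi))  # latent function value
        eys.append(expected_y(f))
    Ey = sum(eys) / len(eys)
    # Spread of the expected value across weight samples; this excludes
    # the noise of the likelihood itself
    Vy = sum((e - Ey) ** 2 for e in eys) / len(eys)
    return Ey, Vy


rng = random.Random(0)
# Hypothetical posterior samples: weights scattered around [0.0, 0.0]
weight_samples = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)]
                  for _ in range(500)]
Ey, Vy = mc_predictive_moments([1.0, 1.0], weight_samples)
```

With weights centred on zero the latent value is symmetric about zero, so Ey should be close to 0.5; passing a different expected_y (for example the identity for a Gaussian likelihood) gives the corresponding moments for other likelihoods.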