Generalized Linear Model

Bayesian Generalized Linear Model implementation.

Implementation of Bayesian GLMs using a mixture of Gaussians posterior approximation with the reparameterization trick and variational inference. See [1] for the posterior mixture idea, and [2] for the inference scheme.

 [1] Gershman, S., Hoffman, M., & Blei, D. “Nonparametric variational inference”. Proceedings of the International Conference on Machine Learning (ICML). 2012.
 [2] Kingma, D. P., & Welling, M. “Auto-encoding variational Bayes”. Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.
class revrand.glm.GeneralizedLinearModel(likelihood=Gaussian(var=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), basis=LinearBasis(onescol=True, regularizer=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), K=10, maxiter=3000, batch_size=10, updater=None, nsamples=50, nstarts=500, random_state=None)

Bayesian generalized linear model (GLM).

This provides a scikit-learn compatible interface for the glm module.

Parameters:
• likelihood (Object) – A likelihood object, see the likelihoods module.
• basis (Basis) – A basis object, see the basis_functions module.
• K (int, optional) – Number of diagonal Gaussian components used to approximate the posterior distribution.
• maxiter (int, optional) – Maximum number of stochastic gradient iterations to run.
• batch_size (int, optional) – Number of observations to use per SGD batch.
• updater (SGDUpdater, optional) – The SGD learning rate updating algorithm to use. By default this is Adam. See revrand.optimize.sgd for other options.
• nsamples (int, optional) – Number of samples used to estimate the expected likelihood and the expected likelihood gradients.
• nstarts (int, optional) – If any parameters have distributions as initial values, this determines how many random candidate starts should be evaluated before commencing optimisation at the best candidate.
• random_state (None, int or RandomState, optional) – Random seed.

Notes

This approximates the posterior distribution over the weights with a mixture of Gaussians:

$\mathbf{w} \sim \frac{1}{K} \sum^K_{k=1} \mathcal{N}(\mathbf{m_k}, \boldsymbol{\Psi}_k)$

where,

$\boldsymbol{\Psi}_k = \text{diag}([\Psi_{k,1}, \ldots, \Psi_{k,D}]).$

This is so that arbitrary likelihoods can be used with this algorithm, while still maintaining flexible and tractable non-Gaussian posteriors. Additionally, this has the benefit of a reduced number of parameters to optimise (compared with full covariance Gaussians).
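To make the equal-weight mixture above concrete, here is a minimal standard-library sketch of drawing weight vectors from it: pick one of the K components uniformly, then sample each dimension independently using that component's diagonal variances. The function name `sample_mixture_weights` and all numbers are hypothetical illustrations and are not part of revrand.

```python
import random


def sample_mixture_weights(means, variances, rng):
    """Draw one weight vector from (1/K) * sum_k N(m_k, diag(psi_k))."""
    k = rng.randrange(len(means))  # every component has weight 1/K
    # Diagonal covariance: each dimension is sampled independently.
    return [rng.gauss(m, v ** 0.5) for m, v in zip(means[k], variances[k])]


rng = random.Random(0)
means = [[-2.0, 0.0], [2.0, 1.0]]      # K = 2 components, D = 2 weights
variances = [[0.1, 0.2], [0.3, 0.1]]   # diagonal entries of each Psi_k
draws = [sample_mixture_weights(means, variances, rng) for _ in range(5000)]

# The first weight is bimodal around -2 and +2, so its overall mean is near 0.
mean_w0 = sum(w[0] for w in draws) / len(draws)
```

With K components of D dimensions each, this parameterisation needs 2KD numbers (means and variances), rather than the D(D+1)/2 covariance entries per component that a full covariance Gaussian would require.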

The main differences between this implementation and the GLM in [1] are:
• We use diagonal mixtures, as opposed to isotropic.
• We use auto encoding variational Bayes (AEVB) inference [2] with stochastic gradients.
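The reparameterization trick used by AEVB [2] can be sketched for a single one-dimensional Gaussian component as follows. This is an illustration of the general idea under simplifying assumptions, not revrand's implementation: the function name `reparam_grad_wrt_mean` is hypothetical, and the toy objective f(w) = w² stands in for a log-likelihood.

```python
import random


def reparam_grad_wrt_mean(m, psi, nsamples, rng):
    """Monte Carlo estimate of d/dm E[f(w)] for w ~ N(m, psi), f(w) = w**2."""
    sigma = psi ** 0.5
    total = 0.0
    for _ in range(nsamples):
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on m or psi
        w = m + sigma * eps        # reparameterized draw from N(m, psi)
        total += 2.0 * w           # df/dw * dw/dm, where dw/dm = 1
    return total / nsamples


rng = random.Random(1)
grad = reparam_grad_wrt_mean(m=1.5, psi=0.25, nsamples=20000, rng=rng)
# The exact gradient is 2 * m = 3.0, because E[w**2] = m**2 + psi.
```

Because the randomness is moved into `eps`, the sample `w` is a deterministic function of the variational parameters, so gradients can pass through the sampling step and be averaged over a mini-batch of samples.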

This uses the Python logging module for displaying learning status. To view these messages, add something like the following to your script:

import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


fit(X, y, likelihood_args=())

Learn the parameters of a Bayesian generalized linear model (GLM).

Parameters:
• X (ndarray) – (N, d) array input dataset (N samples, d dimensions).
• y (ndarray) – (N,) array of targets (N samples).
• likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N.
predict(X, nsamples=200, likelihood_args=())

Predict target values from Bayesian generalized linear regression.

Parameters:
• X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
• nsamples (int, optional) – Number of samples for sampling the expected target values from the predictive distribution.
• likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:
• Ey (ndarray) – The expected value of y* for the query inputs, X*, of shape (N*,).
predict_cdf(X, quantile, nsamples=200, likelihood_args=())

Predictive cumulative distribution function (CDF) of a Bayesian GLM.

Parameters:
• X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
• quantile (float) – The value at which to evaluate the predictive probability, $$p(y^* \leq \text{quantile} | \mathbf{x}^*, \mathbf{X}, y)$$.
• nsamples (int, optional) – Number of samples for sampling the predictive CDF.
• likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:
• p (ndarray) – The probability of y* <= quantile for the query inputs, X*, of shape (N*,).
• p_min (ndarray) – The minimum sampled values of the predicted probability (same shape as p).
• p_max (ndarray) – The maximum sampled values of the predicted probability (same shape as p).
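The three returned quantities can be illustrated with a small Monte Carlo sketch for a single query point. It assumes a Gaussian likelihood with known noise standard deviation and a handful of posterior weight samples that are simply made up here. The names `gaussian_cdf` and `predict_cdf_mc` are hypothetical and are not revrand functions.

```python
import math


def gaussian_cdf(q, mean, std):
    """CDF of N(mean, std**2) evaluated at q."""
    return 0.5 * (1.0 + math.erf((q - mean) / (std * math.sqrt(2.0))))


def predict_cdf_mc(phi, quantile, weight_samples, noise_std):
    """Average p(y* <= quantile | w) over posterior weight samples."""
    probs = []
    for w in weight_samples:
        mean = sum(w_d * p_d for w_d, p_d in zip(w, phi))  # w . phi(x*)
        probs.append(gaussian_cdf(quantile, mean, noise_std))
    p = sum(probs) / len(probs)
    return p, min(probs), max(probs)


# Three made-up posterior samples of a 2-dimensional weight vector.
weight_samples = [[0.9, -0.1], [1.0, 0.0], [1.1, 0.1]]
p, p_min, p_max = predict_cdf_mc([1.0, 2.0], 1.0, weight_samples, noise_std=0.5)
```

Here `p` plays the role of the averaged predictive probability, while `p_min` and `p_max` show how much that probability varies across the individual weight samples.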
predict_interval(X, percentile, nsamples=200, likelihood_args=(), multiproc=True)

Predictive percentile interval (upper and lower quantiles).

Parameters:
• X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
• percentile (float) – The percentile confidence interval (e.g. 95%) to return.
• nsamples (int, optional) – Number of samples for sampling the predictive percentiles.
• likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
• multiproc (bool, optional) – Use multiprocessing to parallelise this prediction computation.
Returns:
• ql (ndarray) – The lower end point of the interval with shape (N*,).
• qu (ndarray) – The upper end point of the interval with shape (N*,).
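One way to obtain such an interval, sketched here for a single query point, is to invert a Monte Carlo estimate of the predictive CDF at the two tail probabilities. The sketch assumes a Gaussian likelihood, a percentile expressed as a fraction (0.95 rather than 95), and plain bisection as the root finder. These are illustrative choices and may differ from what revrand does internally, and all function names are hypothetical.

```python
import math


def gaussian_cdf(q, mean, std):
    """CDF of N(mean, std**2) evaluated at q."""
    return 0.5 * (1.0 + math.erf((q - mean) / (std * math.sqrt(2.0))))


def predictive_cdf(q, means, noise_std):
    """Monte Carlo predictive CDF, averaged over per-sample means w . phi(x*)."""
    return sum(gaussian_cdf(q, m, noise_std) for m in means) / len(means)


def quantile_by_bisection(target, means, noise_std, tol=1e-8):
    """Find q such that predictive_cdf(q) is approximately target."""
    lo = min(means) - 10.0 * noise_std
    hi = max(means) + 10.0 * noise_std
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictive_cdf(mid, means, noise_std) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def predict_interval_mc(means, noise_std, percentile):
    """Central interval: equal probability mass left in each tail."""
    tail = (1.0 - percentile) / 2.0
    ql = quantile_by_bisection(tail, means, noise_std)
    qu = quantile_by_bisection(1.0 - tail, means, noise_std)
    return ql, qu


# With a single sample at mean 0 and unit noise this is the standard normal.
ql, qu = predict_interval_mc([0.0], 1.0, 0.95)
```

For the standard normal case above, the end points land close to the familiar values of about -1.96 and 1.96.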
predict_logpdf(X, y, nsamples=200, likelihood_args=())

Predictive log-probability density function of a Bayesian GLM.

Parameters:
• X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
• y (float or ndarray) – The test observations of shape (N*,) to evaluate under $$\log p(y^* | \mathbf{x}^*, \mathbf{X}, y)$$.
• nsamples (int, optional) – Number of samples for sampling the log predictive distribution.
• likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:
• logp (ndarray) – The log probability of y* given X*, of shape (N*,).
• logp_min (ndarray) – The minimum sampled values of the predicted log probability (same shape as logp).
• logp_max (ndarray) – The maximum sampled values of the predicted log probability (same shape as logp).
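A minimal sketch of a Monte Carlo log predictive density for one test observation is given below. It assumes a Gaussian likelihood and takes the per-sample conditional means as given, then averages the densities in a numerically stable way (log-sum-exp). The names `gaussian_logpdf` and `predict_logpdf_mc` are hypothetical, and revrand's own computation may differ in detail.

```python
import math


def gaussian_logpdf(y, mean, std):
    """Log density of N(mean, std**2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((y - mean) / std) ** 2


def predict_logpdf_mc(y, means, noise_std):
    """log of (1/S) * sum_s p(y | w_s), plus the per-sample extremes."""
    logps = [gaussian_logpdf(y, m, noise_std) for m in means]
    top = max(logps)  # subtract the maximum before exponentiating for stability
    logp = top + math.log(sum(math.exp(lp - top) for lp in logps) / len(logps))
    return logp, min(logps), max(logps)


# Per-sample conditional means w_s . phi(x*) for three made-up weight samples.
logp, logp_min, logp_max = predict_logpdf_mc(0.0, [-1.0, 0.0, 1.0], 1.0)
```

Averaging is done on the density scale and the logarithm is taken afterwards, so `logp` always lies between `logp_min` and `logp_max`.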
predict_moments(X, nsamples=200, likelihood_args=())

Predictive moments, in particular mean and variance, of a Bayesian GLM.

This function uses Monte-Carlo sampling to evaluate the predictive mean and variance of a Bayesian GLM. The exact expressions evaluated are,

\begin{align}
\mathbb{E}[y^* | \mathbf{x}^*, \mathbf{X}, y] &= \int \mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] \, p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) \, d\mathbf{w},\\
\mathbb{V}[y^* | \mathbf{x}^*, \mathbf{X}, y] &= \int \left(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] - \mathbb{E}[y^* | \mathbf{x}^*, \mathbf{X}, y]\right)^2 p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) \, d\mathbf{w},
\end{align}

where $$\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)]$$ is the expected value of $$y^*$$ from the likelihood, and $$p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi)$$ is the posterior distribution over weights (learned by fit). Here are a few concrete examples of how these values can be used:

• Gaussian likelihood: these are just the predicted mean and variance, see revrand.regression.predict
• Bernoulli likelihood: The expected value is the probability, $$p(y^* = 1)$$, i.e. the probability of class one. The variance may not be so useful.
• Poisson likelihood: The expected value is similar conceptually to the Gaussian case, and is also a continuous value. The median (50% quantile) from predict_interval is a discrete value. Again, the variance in this instance may not be so useful.
Parameters:
• X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
• nsamples (int, optional) – Number of samples for sampling the expected moments from the predictive distribution.
• likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:
• Ey (ndarray) – The expected value of y* for the query inputs, X*, of shape (N*,).
• Vy (ndarray) – The expected variance of y* (excluding likelihood noise terms) for the query inputs, X*, of shape (N*,).
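The two integrals for the predictive mean and variance can be approximated by replacing the posterior with a finite set of weight samples. The sketch below does this for one query point under a Bernoulli likelihood with a logistic link, so the conditional expectation is a sigmoid of the linear predictor. The names `sigmoid` and `predict_moments_mc` and the sample values are hypothetical, and this is an illustration of the formulas rather than revrand's code.

```python
import math


def sigmoid(a):
    """Logistic function, E[y* | w, phi] for a Bernoulli likelihood."""
    return 1.0 / (1.0 + math.exp(-a))


def predict_moments_mc(phi, weight_samples, expected_y=sigmoid):
    """Monte Carlo estimates of E[y*] and V[y*] over posterior weight samples."""
    cond_means = [
        expected_y(sum(w_d * p_d for w_d, p_d in zip(w, phi)))
        for w in weight_samples
    ]
    ey = sum(cond_means) / len(cond_means)
    # Variance of the conditional mean only, so likelihood noise is excluded.
    vy = sum((c - ey) ** 2 for c in cond_means) / len(cond_means)
    return ey, vy


# Two made-up posterior samples that disagree about the sign of the first weight.
ey, vy = predict_moments_mc([1.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
```

In this example the two samples predict class-one probabilities of roughly 0.73 and 0.27, so `ey` is 0.5 and `vy` reflects the disagreement between the weight samples rather than any observation noise.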