Generalized Linear Model

Bayesian Generalized Linear Model implementation.

Implementation of Bayesian GLMs using a mixture of Gaussians posterior approximation with the reparameterization trick and variational inference. See [1] for the posterior mixture idea, and [2] for the inference scheme.

[1] Gershman, S., Hoffman, M., & Blei, D. “Nonparametric variational inference”. Proceedings of the International Conference on Machine Learning. 2012.

[2] Kingma, D. P., & Welling, M. “Auto-encoding variational Bayes”. Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.

class revrand.glm.GeneralizedLinearModel(likelihood=Gaussian(var=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), basis=LinearBasis(onescol=True, regularizer=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), K=10, maxiter=3000, batch_size=10, updater=None, nsamples=50, nstarts=500, random_state=None)

Bayesian Generalized linear model (GLM).

This provides a scikit-learn compatible interface for the glm module.

Parameters:

likelihood (Object) – A likelihood object, see the likelihoods module.
basis (Basis) – A basis object, see the basis_functions module.
K (int, optional) – Number of diagonal Gaussian components to use to approximate the posterior distribution.
maxiter (int, optional) – Maximum number of iterations of stochastic gradients to run.
batch_size (int, optional) – number of observations to use per SGD batch.
updater (SGDUpdater, optional) – The SGD learning rate updating algorithm to use, by default this is Adam. See revrand.optimize.sgd for different options.
nsamples (int, optional) – Number of samples for sampling the expected likelihood and expected likelihood gradients
nstarts (int, optional) – If any parameters have distributions as initial values, this determines how many random candidate starts should be evaluated before commencing optimisation at the best candidate.
random_state (None, int or RandomState, optional) – random seed
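The nstarts behaviour described above can be pictured as a simple random search over initial values. The following is only a sketch of that idea, not revrand's code: the names best_start, draw and objective are hypothetical, and it assumes that a lower objective value means a better candidate.

```python
import random


def best_start(objective, draw, nstarts, rng):
    """Draw nstarts random candidates and return the one with the lowest objective.

    Hypothetical helper: `draw` samples one candidate initial value and
    `objective` scores it (lower is assumed to be better).
    """
    candidates = [draw(rng) for _ in range(nstarts)]
    return min(candidates, key=objective)


rng = random.Random(0)
# Toy objective with its minimum at 3; initial values drawn uniformly on [0, 10].
x0 = best_start(lambda x: (x - 3.0) ** 2,
                lambda r: r.uniform(0.0, 10.0),
                nstarts=500, rng=rng)
```

Optimisation would then begin from x0 rather than from an arbitrary draw.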

Notes

This approximates the posterior distribution over the weights with a mixture of Gaussians:

\[\mathbf{w} \sim \frac{1}{K} \sum^K_{k=1} \mathcal{N}(\mathbf{m_k}, \boldsymbol{\Psi}_k)\]

where,

\[\boldsymbol{\Psi}_k = \text{diag}([\Psi_{k,1}, \ldots, \Psi_{k,D}]).\]

This is so arbitrary likelihoods can be used with this algorithm, while still maintaining flexible and tractable non-Gaussian posteriors. Additionally, this has the benefit of a reduced number of parameters to optimise (compared with full-covariance Gaussians).
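To make the mixture form and the parameter saving concrete, here is a minimal pure-Python sketch (not revrand code). The helpers sample_mixture and n_params are hypothetical names introduced for illustration; the sketch draws one weight vector from an equal-weight mixture of diagonal Gaussians and counts free parameters for diagonal versus full covariances.

```python
import math
import random


def sample_mixture(means, variances, rng):
    """Draw one weight vector from an equal-weight mixture of diagonal Gaussians."""
    k = rng.randrange(len(means))            # each component has probability 1/K
    return [rng.gauss(m, math.sqrt(v))       # dimensions are independent (diagonal Psi_k)
            for m, v in zip(means[k], variances[k])]


def n_params(K, D, diagonal=True):
    """Count mean and covariance parameters for K components over D weights."""
    cov = D if diagonal else D * (D + 1) // 2   # a symmetric matrix has D(D+1)/2 free entries
    return K * (D + cov)


rng = random.Random(0)
means = [[0.0, 1.0], [2.0, -1.0]]        # m_k for K = 2 components, D = 2 weights
variances = [[0.1, 0.2], [0.3, 0.1]]     # diagonals of Psi_k
w = sample_mixture(means, variances, rng)
```

With K = 10 and D = 100, n_params gives 2000 parameters in the diagonal case against 51500 with full covariances.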

The main differences between this implementation and the GLM in [1] are:

We use diagonal mixtures, as opposed to isotropic.
We use auto encoding variational Bayes (AEVB) inference [2] with stochastic gradients.
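The reparameterization trick behind AEVB [2] can be illustrated on a single Gaussian component. The sketch below is a toy under stated assumptions, not the library's implementation: the function reparam_gradients is a hypothetical name, the integrand is f(w) = w^2 rather than a likelihood, and plain Monte Carlo averaging is used. It writes w = m + sqrt(psi) * eps with eps ~ N(0, 1), so gradients with respect to m and psi pass through the sample.

```python
import math
import random


def reparam_gradients(f_prime, m, psi, nsamples, rng):
    """Monte Carlo gradients of E[f(w)], w ~ N(m, psi), with respect to m and psi."""
    s = math.sqrt(psi)
    grad_m = grad_psi = 0.0
    for _ in range(nsamples):
        eps = rng.gauss(0.0, 1.0)            # noise independent of (m, psi)
        w = m + s * eps                      # reparameterized sample
        df = f_prime(w)
        grad_m += df                         # chain rule: dw/dm = 1
        grad_psi += df * eps / (2.0 * s)     # chain rule: dw/dpsi = eps / (2 sqrt(psi))
    return grad_m / nsamples, grad_psi / nsamples


# Toy integrand f(w) = w**2, for which E[f(w)] = m**2 + psi exactly,
# so the true gradients are 2*m = 3 and 1.
rng = random.Random(0)
g_m, g_psi = reparam_gradients(lambda w: 2.0 * w, m=1.5, psi=0.25,
                               nsamples=20000, rng=rng)
```

The estimates g_m and g_psi should be close to the exact values noted in the comment, with noise that shrinks as nsamples grows.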

This uses the Python logging module to display learning status. To view these messages, include something like,

import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

in your calling code.

fit(X, y, likelihood_args=())

Learn the parameters of a Bayesian generalized linear model (GLM).

Parameters:

X (ndarray) – (N, d) array input dataset (N samples, d dimensions).
y (ndarray) – (N,) array targets (N samples).
likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N.

predict(X, nsamples=200, likelihood_args=())

Predict target values from Bayesian generalized linear regression.

Parameters:

X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
nsamples (int, optional) – Number of samples for sampling the expected target values from the predictive distribution.
likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.

Returns:

Ey (ndarray) – The expected value of y* for the query inputs, X* of shape (N*,).

predict_cdf(X, quantile, nsamples=200, likelihood_args=())

Predictive cumulative distribution function of a Bayesian GLM.

Parameters:

X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
quantile (float) – The value at which to evaluate the predictive CDF, \(p(y^* \leq \text{quantile} | \mathbf{x}^*, \mathbf{X}, y)\).
nsamples (int, optional) – Number of samples to draw from the posterior for sampling the predictive CDF.
likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.

Returns:

p (ndarray) – The probability of y* <= quantile for the query inputs, X* of shape (N*,).
p_min (ndarray) – The minimum sampled values of the predicted probability (same shape as p)
p_max (ndarray) – The maximum sampled values of the predicted probability (same shape as p)
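As an illustration of how p, p_min and p_max relate, the sketch below averages a likelihood CDF over posterior weight samples for a single query point. It is a pure-Python toy under assumptions, not revrand's implementation: a Gaussian likelihood with a fixed noise_var, a feature vector phi and the function names are all hypothetical.

```python
import math
import random


def gaussian_cdf(y, mean, var):
    """CDF of N(mean, var) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mean) / math.sqrt(2.0 * var)))


def predictive_cdf(phi, quantile, weight_samples, noise_var):
    """Average the likelihood CDF at `quantile` over posterior weight samples."""
    ps = []
    for w in weight_samples:
        f = sum(wi * xi for wi, xi in zip(w, phi))    # latent function w^T phi(x*)
        ps.append(gaussian_cdf(quantile, f, noise_var))
    return sum(ps) / len(ps), min(ps), max(ps)


rng = random.Random(0)
# Stand-in posterior samples; a fitted model would draw these from the mixture.
samples = [[rng.gauss(1.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(200)]
p, p_min, p_max = predictive_cdf([2.0, 5.0], 2.5, samples, noise_var=1.0)
```

The spread between p_min and p_max reflects how much the sampled weights disagree about the probability.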

predict_interval(X, percentile, nsamples=200, likelihood_args=(), multiproc=True)

Predictive percentile interval (upper and lower quantiles).

Parameters:

X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
percentile (float) – The percentile confidence interval (e.g. 95%) to return.
nsamples (int, optional) – Number of samples for sampling the predictive percentiles.
likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
multiproc (bool, optional) – Use multiprocessing to parallelise this prediction computation.

Returns:

ql (ndarray) – The lower end point of the interval with shape (N*,)
qu (ndarray) – The upper end point of the interval with shape (N*,)
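One way to obtain such an interval is to invert a predictive CDF numerically at the two tail probabilities. The sketch below assumes the percentile is expressed as a fraction in (0, 1), uses a standard normal CDF as a stand-in for the predictive CDF, and uses bisection as the root finder; invert_cdf and central_interval are hypothetical helpers, and the method revrand actually uses is not stated here.

```python
from statistics import NormalDist


def invert_cdf(cdf, p, lo, hi, tol=1e-8):
    """Find y with cdf(y) = p by bisection; cdf must be non-decreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_interval(cdf, percentile, lo, hi):
    """Lower and upper quantiles enclosing `percentile` of the probability mass."""
    alpha = (1.0 - percentile) / 2.0          # mass left in each tail
    return invert_cdf(cdf, alpha, lo, hi), invert_cdf(cdf, 1.0 - alpha, lo, hi)


cdf = NormalDist(mu=0.0, sigma=1.0).cdf       # stand-in for a predictive CDF
ql, qu = central_interval(cdf, 0.95, -10.0, 10.0)
```

For the standard normal stand-in, ql and qu come out close to -1.96 and 1.96, the familiar 95% bounds.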

predict_logpdf(X, y, nsamples=200, likelihood_args=())

Predictive log-probability density function of a Bayesian GLM.

Parameters:

X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
y (float or ndarray) – The test observations of shape (N*,) to evaluate under, \(\log p(y^* |\mathbf{x}^*, \mathbf{X}, y)\).
nsamples (int, optional) – Number of samples for sampling the log predictive distribution.
likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.

Returns:

logp (ndarray) – The log probability of y* given X* of shape (N*,).
logp_min (ndarray) – The minimum sampled values of the predicted log probability (same shape as logp)
logp_max (ndarray) – The maximum sampled values of the predicted log probability (same shape as logp)
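A common Monte Carlo estimator for a log predictive density is the log of the average likelihood over posterior samples, computed stably in log space. The sketch below shows that estimator for one query point with a Gaussian likelihood. It is an assumption-laden illustration, not revrand's code: the function names, phi and noise_var are hypothetical, and whether revrand averages in log space or probability space is not stated here.

```python
import math


def gaussian_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def log_mean_exp(values):
    """Compute log(mean(exp(values))) without underflow by factoring out the max."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def predictive_logpdf(phi, y, weight_samples, noise_var):
    """Log of the sample-averaged likelihood, plus the per-sample min and max."""
    logps = [gaussian_logpdf(y, sum(wi * xi for wi, xi in zip(w, phi)), noise_var)
             for w in weight_samples]
    return log_mean_exp(logps), min(logps), max(logps)


logp, logp_min, logp_max = predictive_logpdf(
    [1.0, 2.0], 0.5, [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]], noise_var=1.0)
```

Factoring out the maximum keeps the exponentials in range even when every per-sample log density is very negative.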

predict_moments(X, nsamples=200, likelihood_args=())

Predictive moments, in particular mean and variance, of a Bayesian GLM.

This function uses Monte-Carlo sampling to evaluate the predictive mean and variance of a Bayesian GLM. The exact expressions evaluated are,

\[ \begin{align}\begin{aligned}\mathbb{E}[y^* | \mathbf{x^*}, \mathbf{X}, y] &= \int \mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) d\mathbf{w},\\\mathbb{V}[y^* | \mathbf{x^*}, \mathbf{X}, y] &= \int \left(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] - \mathbb{E}[y^* | \mathbf{x^*}, \mathbf{X}, y]\right)^2 p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) d\mathbf{w},\end{aligned}\end{align} \]

where \(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)]\) is the expected value of \(y^*\) from the likelihood, and \(p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi)\) is the posterior distribution over weights (learned by fit). Here are a few concrete examples of how we can use these values,

Gaussian likelihood: these are just the predicted mean and variance, see revrand.regression.predict
Bernoulli likelihood: The expected value is the probability, \(p(y^* = 1)\), i.e. the probability of class one. The variance may not be so useful.
Poisson likelihood: The expected value is similar conceptually to the Gaussian case, and is also a continuous value. The median (50% quantile) from predict_interval is a discrete value. Again, the variance in this instance may not be so useful.
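The two integrals above can be approximated by replacing the posterior with a set of weight samples. The sketch below does this for one query point in pure Python. It is a toy under assumptions rather than the library's code: predictive_moments here is a standalone hypothetical function, phi is a feature vector, and inv_link stands for the likelihood's mapping from the latent function to the expected target (identity for a Gaussian, logistic sigmoid for a Bernoulli).

```python
import math
from statistics import fmean, pvariance


def predictive_moments(phi, weight_samples, inv_link=lambda f: f):
    """Monte Carlo mean and variance of E[y* | w, phi] over posterior samples."""
    ey_given_w = [inv_link(sum(wi * xi for wi, xi in zip(w, phi)))
                  for w in weight_samples]
    Ey = fmean(ey_given_w)                   # approximates the first integral
    Vy = pvariance(ey_given_w, mu=Ey)        # approximates the second integral
    return Ey, Vy


def sigmoid(f):
    """Logistic function, the usual Bernoulli inverse link."""
    return 1.0 / (1.0 + math.exp(-f))


samples = [[1.0], [3.0]]                     # two stand-in posterior weight samples
Ey, Vy = predictive_moments([2.0], samples)            # Gaussian case, identity link
p1, _ = predictive_moments([2.0], samples, sigmoid)    # Bernoulli case, p(y* = 1)
```

Note that Vy only measures the spread caused by weight uncertainty, which matches the "excluding likelihood noise terms" wording of the return value.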

Parameters:

X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
nsamples (int, optional) – Number of samples for sampling the expected moments from the predictive distribution.
likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.

Returns:

Ey (ndarray) – The expected value of y* for the query inputs, X* of shape (N*,).
Vy (ndarray) – The expected variance of y* (excluding likelihood noise terms) for the query inputs, X* of shape (N*,).
