Generalized Linear Model
Bayesian Generalized Linear Model implementation.
Implementation of Bayesian GLMs using a mixture of Gaussians posterior approximation with the reparameterization trick and variational inference. See [1] for the posterior mixture idea, and [2] for the inference scheme.
[1] Gershman, S., Hoffman, M., & Blei, D. “Nonparametric variational inference”. Proceedings of the International Conference on Machine Learning (ICML). 2012.
[2] Kingma, D. P., & Welling, M. “Auto-encoding variational Bayes”. Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.
class revrand.glm.GeneralizedLinearModel(likelihood=Gaussian(var=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), basis=LinearBasis(onescol=True, regularizer=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), K=10, maxiter=3000, batch_size=10, updater=None, nsamples=50, nstarts=500, random_state=None)
Bayesian generalized linear model (GLM).
This provides a scikit-learn compatible interface for the glm module.
Parameters: - likelihood (Object) – A likelihood object, see the likelihoods module.
- basis (Basis) – A basis object, see the basis_functions module.
- K (int, optional) – Number of diagonal Gaussian components to use to approximate the posterior distribution.
- maxiter (int, optional) – Maximum number of stochastic gradient iterations to run.
- batch_size (int, optional) – Number of observations to use per SGD batch.
- updater (SGDUpdater, optional) – The SGD learning rate updating algorithm to use; by default this is Adam. See revrand.optimize.sgd for different options.
- nsamples (int, optional) – Number of samples for sampling the expected likelihood and expected likelihood gradients.
- nstarts (int, optional) – If any parameters have distributions as initial values, this determines how many random candidate starts should be evaluated before commencing optimisation at the best candidate.
- random_state (None, int or RandomState, optional) – Random seed.
Notes
This approximates the posterior distribution over the weights with a mixture of Gaussians:
\[\mathbf{w} \sim \frac{1}{K} \sum^K_{k=1} \mathcal{N}(\mathbf{m}_k, \boldsymbol{\Psi}_k)\]
where,
\[\boldsymbol{\Psi}_k = \text{diag}([\Psi_{k,1}, \ldots, \Psi_{k,D}]).\]
This is so arbitrary likelihoods can be used with this algorithm, while still maintaining flexible and tractable non-Gaussian posteriors. Additionally, this has the benefit of a reduced number of parameters to optimise (compared with full-covariance Gaussians).
The main differences between this implementation and the GLM in [1] are:
- We use diagonal mixtures, as opposed to isotropic.
- We use auto-encoding variational Bayes (AEVB) inference [2] with stochastic gradients.
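As a minimal sketch of the posterior form described above (not the library's own code), drawing a weight vector from an equal-weight mixture of diagonal Gaussians might look like the following. The function name `sample_weights` and the toy means and variances are illustrative assumptions; only the standard library is used.

```python
import math
import random

def sample_weights(means, variances, rng):
    """Draw one weight vector from an equal-weight mixture of diagonal Gaussians."""
    k = rng.randrange(len(means))  # every component has mixture weight 1/K
    eps = [rng.gauss(0.0, 1.0) for _ in means[k]]
    # Reparameterization: w = m_k + sqrt(Psi_k) * eps, with eps ~ N(0, I)
    return [m + math.sqrt(v) * e for m, v, e in zip(means[k], variances[k], eps)]

rng = random.Random(0)
means = [[-2.0, 0.5], [2.0, 0.5]]      # hypothetical m_k for K=2 components, D=2 weights
variances = [[0.1, 0.2], [0.1, 0.2]]   # hypothetical diagonal entries of Psi_k
samples = [sample_weights(means, variances, rng) for _ in range(5000)]
mean_w0 = sum(w[0] for w in samples) / len(samples)
mean_w1 = sum(w[1] for w in samples) / len(samples)
```

Writing the sample as a deterministic function of the variational parameters and the noise `eps` is what lets AEVB-style inference differentiate Monte Carlo estimates with respect to the means and variances.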
This uses the Python logging module for displaying learning status. To view these messages, include something like

    import logging
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger(__name__)

in your calling code.
fit(X, y, likelihood_args=())
Learn the parameters of a Bayesian generalized linear model (GLM).
Parameters: - X (ndarray) – (N, d) array input dataset (N samples, d dimensions).
- y (ndarray) – (N,) array targets (N samples)
- likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N.
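To illustrate what "scalars or arrays of length N" can mean for likelihood_args, here is a hedged sketch with a hypothetical binomial log-likelihood whose number of trials `n` is a fixed, non-learnable argument. The helper names `binomial_loglike` and `total_loglike`, and the logistic link, are assumptions for illustration and are not taken from the library.

```python
import math

def binomial_loglike(y, f, n):
    """Log-probability of y successes in n trials, given latent value f (logistic link)."""
    p = 1.0 / (1.0 + math.exp(-f))
    log_binom = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_binom + y * math.log(p) + (n - y) * math.log(1.0 - p)

def total_loglike(y, f, likelihood_args=()):
    """Sum the log-likelihood over N observations, broadcasting scalar arguments."""
    N = len(y)
    # A scalar is shared by all observations; a length-N sequence is per observation.
    expanded = [a if hasattr(a, "__len__") else [a] * N for a in likelihood_args]
    return sum(
        binomial_loglike(y[i], f[i], *(a[i] for a in expanded)) for i in range(N)
    )

y = [3, 0, 7]
f = [0.0, -1.0, 1.5]
shared = total_loglike(y, f, likelihood_args=(10,))              # n = 10 for every row
per_obs = total_loglike(y, f, likelihood_args=([10, 10, 10],))   # n given per row
```

Both calls evaluate the same quantity; the second form is the one to use when the fixed argument differs between observations.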
predict(X, nsamples=200, likelihood_args=())
Predict target values from Bayesian generalized linear regression.
Parameters: - X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
- nsamples (int, optional) – Number of samples for sampling the expected target values from the predictive distribution.
- likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns: Ey – The expected value of y* for the query inputs, X* of shape (N*,).
Return type: ndarray
predict_cdf(X, quantile, nsamples=200, likelihood_args=())
Predictive cumulative distribution function of a Bayesian GLM.
Parameters: - X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
- quantile (float) – The value at which to evaluate the predictive CDF, i.e. the threshold in the predictive probability \(p(y^* \leq \text{quantile} | \mathbf{x}^*, \mathbf{X}, y)\).
- nsamples (int, optional) – Number of samples for sampling the predictive CDF.
- likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns: - p (ndarray) – The probability of y* <= quantile for the query inputs, X* of shape (N*,).
- p_min (ndarray) – The minimum sampled values of the predicted probability (same shape as p)
- p_max (ndarray) – The maximum sampled values of the predicted probability (same shape as p)
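The three return values can be understood through a small Monte Carlo sketch for a single query point. This assumes a Gaussian likelihood with known noise variance and uses stand-in posterior weight samples; `predict_cdf_sketch` and `normal_cdf` are hypothetical helpers, not library functions.

```python
import math
import random

def normal_cdf(x, mean, var):
    """CDF of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def predict_cdf_sketch(phi, quantile, weight_samples, noise_var):
    """Monte Carlo estimate of p(y* <= quantile) for one query point."""
    probs = []
    for w in weight_samples:
        f = sum(wi * pi for wi, pi in zip(w, phi))  # latent value w . phi(x*)
        probs.append(normal_cdf(quantile, f, noise_var))
    # Average over posterior samples, and report the sampled extremes
    return sum(probs) / len(probs), min(probs), max(probs)

rng = random.Random(1)
# Stand-in posterior samples: weights scattered around (1.0, 2.0)
weight_samples = [[rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1)] for _ in range(200)]
phi = [1.0, 0.5]  # basis expansion of x*, including a bias column
p, p_min, p_max = predict_cdf_sketch(phi, 2.0, weight_samples, noise_var=1.0)
```

Here the latent mean is close to 2.0, so evaluating the CDF at 2.0 gives a probability near one half, with p_min and p_max indicating how much that estimate varies across posterior samples.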
predict_interval(X, percentile, nsamples=200, likelihood_args=(), multiproc=True)
Predictive percentile interval (upper and lower quantiles).
Parameters: - X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
- percentile (float) – The percentile confidence interval (e.g. 95%) to return.
- nsamples (int, optional) – Number of samples for sampling the predictive percentiles.
- likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
- multiproc (bool, optional) – Use multiprocessing to parallelise this prediction computation.
Returns: - ql (ndarray) – The lower end point of the interval with shape (N*,)
- qu (ndarray) – The upper end point of the interval with shape (N*,)
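One plausible way to obtain such an interval, sketched here under stated assumptions rather than as the library's method, is to invert a Monte Carlo predictive CDF numerically. The sketch assumes a Gaussian likelihood, a percentile given as a fraction (0.95 rather than 95), and plain bisection as the root finder; all function names are hypothetical.

```python
import math
import random

def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def predictive_cdf(q, f_samples, noise_var):
    """Monte Carlo predictive CDF at q, averaged over latent function samples."""
    return sum(normal_cdf(q, f, noise_var) for f in f_samples) / len(f_samples)

def invert_cdf(target, f_samples, noise_var, lo=-1e3, hi=1e3, tol=1e-8):
    """Bisection search for q such that predictive_cdf(q) equals target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictive_cdf(mid, f_samples, noise_var) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def predict_interval_sketch(percentile, f_samples, noise_var):
    """Central interval: equal probability mass is left in each tail."""
    tail = (1.0 - percentile) / 2.0
    ql = invert_cdf(tail, f_samples, noise_var)
    qu = invert_cdf(1.0 - tail, f_samples, noise_var)
    return ql, qu

rng = random.Random(2)
f_samples = [rng.gauss(0.0, 0.05) for _ in range(100)]  # stand-in latent samples at x*
ql, qu = predict_interval_sketch(0.95, f_samples, noise_var=1.0)
```

Each query point needs its own root-finding problem, which suggests why an option to parallelise the computation is useful.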
predict_logpdf(X, y, nsamples=200, likelihood_args=())
Predictive log-probability density function of a Bayesian GLM.
Parameters: - X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
- y (float or ndarray) – The test observations of shape (N*,) to evaluate under, \(\log p(y^* |\mathbf{x}^*, \mathbf{X}, y)\).
- nsamples (int, optional) – Number of samples for sampling the log predictive distribution.
- likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns: - logp (ndarray) – The log probability of y* given X* of shape (N*,).
- logp_min (ndarray) – The minimum sampled values of the predicted log probability (same shape as logp).
- logp_max (ndarray) – The maximum sampled values of the predicted log probability (same shape as logp).
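A hedged sketch of one way to estimate the log predictive density at a single point follows. It assumes a Gaussian likelihood and uses a log-mean-exp over posterior samples; the library's own estimator may differ (for instance, by averaging per-sample log-likelihoods), and the helper names are hypothetical.

```python
import math
import random

def normal_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def predict_logpdf_sketch(y, f_samples, noise_var):
    """Estimate log p(y* | x*) from latent function samples at x*."""
    logps = [normal_logpdf(y, f, noise_var) for f in f_samples]
    top = max(logps)
    # log-mean-exp: log of the averaged likelihood, shifted by `top` for stability
    logp = top + math.log(sum(math.exp(lp - top) for lp in logps) / len(logps))
    return logp, min(logps), max(logps)

rng = random.Random(4)
f_samples = [rng.gauss(1.0, 0.2) for _ in range(200)]  # stand-in latent samples
logp, logp_min, logp_max = predict_logpdf_sketch(1.0, f_samples, noise_var=0.5)
```

Subtracting the largest log-likelihood before exponentiating avoids underflow when the test observation is far from most sampled predictions.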
predict_moments(X, nsamples=200, likelihood_args=())
Predictive moments, in particular mean and variance, of a Bayesian GLM.
This function uses Monte Carlo sampling to evaluate the predictive mean and variance of a Bayesian GLM. The exact expressions evaluated are,
\[ \begin{align}\begin{aligned}
\mathbb{E}[y^* | \mathbf{x}^*, \mathbf{X}, y] &= \int \mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) d\mathbf{w},\\
\mathbb{V}[y^* | \mathbf{x}^*, \mathbf{X}, y] &= \int \left(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)] - \mathbb{E}[y^* | \mathbf{x}^*, \mathbf{X}, y]\right)^2 p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi) d\mathbf{w},
\end{aligned}\end{align} \]
where \(\mathbb{E}[y^* | \mathbf{w}, \phi(\mathbf{x}^*)]\) is the expected value of \(y^*\) from the likelihood, and \(p(\mathbf{w} | \mathbf{y}, \boldsymbol\Phi)\) is the posterior distribution over weights (from learn). Here are a few concrete examples of how we can use these values:
- Gaussian likelihood: these are just the predicted mean and variance, see revrand.regression.predict.
- Bernoulli likelihood: The expected value is the probability, \(p(y^* = 1)\), i.e. the probability of class one. The variance may not be so useful.
- Poisson likelihood: The expected value is similar conceptually to the Gaussian case, and is also a continuous value. The median (50% quantile) from predict_interval is a discrete value. Again, the variance in this instance may not be so useful.
Parameters: - X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
- nsamples (int, optional) – Number of samples for sampling the expected moments from the predictive distribution.
- likelihood_args (sequence, optional) – sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns: - Ey (ndarray) – The expected value of y* for the query inputs, X* of shape (N*,).
- Vy (ndarray) – The expected variance of y* (excluding likelihood noise terms) for the query inputs, X* of shape (N*,).
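The two integrals above can be approximated by averaging over posterior weight samples. The following is a minimal sketch for a single query point, assuming stand-in posterior samples, an identity mean for the Gaussian case, and a logistic link for the Bernoulli case; `predict_moments_sketch` is a hypothetical helper, not the library's implementation.

```python
import math
import random

def predict_moments_sketch(phi, weight_samples, expected_y):
    """Monte Carlo mean and variance of E[y* | w, phi(x*)] over posterior samples."""
    # Push each posterior weight sample through the likelihood's expected value
    ey = [expected_y(sum(wi * pi for wi, pi in zip(w, phi))) for w in weight_samples]
    Ey = sum(ey) / len(ey)
    Vy = sum((e - Ey) ** 2 for e in ey) / len(ey)  # excludes likelihood noise
    return Ey, Vy

def identity(f):   # Gaussian likelihood: E[y | f] = f
    return f

def logistic(f):   # Bernoulli likelihood (assumed logistic link): E[y | f] = p(y = 1)
    return 1.0 / (1.0 + math.exp(-f))

rng = random.Random(3)
weight_samples = [[rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1)] for _ in range(200)]
phi = [1.0, 0.5]   # basis expansion of x*, including a bias column

Ey_gauss, Vy_gauss = predict_moments_sketch(phi, weight_samples, identity)
Ey_bern, Vy_bern = predict_moments_sketch(phi, weight_samples, logistic)
```

In the Bernoulli case the mean is directly usable as a class-one probability, whereas the variance only reflects uncertainty in the weights, which is consistent with the note that it may not be so useful.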