Generalized Linear Model
Bayesian Generalized Linear Model implementation.
Implementation of Bayesian GLMs using a mixture of Gaussians posterior approximation with the reparameterization trick and variational inference. See [1] for the posterior mixture idea, and [2] for the inference scheme.
[1] Gershman, S., Hoffman, M., & Blei, D. “Nonparametric variational inference”. Proceedings of the International Conference on Machine Learning (ICML). 2012.
[2] Kingma, D. P., & Welling, M. “Auto-encoding variational Bayes”. Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.

class revrand.glm.GeneralizedLinearModel(likelihood=Gaussian(var=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), basis=LinearBasis(onescol=True, regularizer=Parameter(value=1.0, bounds=Positive(upper=None), shape=())), K=10, maxiter=3000, batch_size=10, updater=None, nsamples=50, nstarts=500, random_state=None)
Bayesian generalized linear model (GLM).
This provides a scikit-learn compatible interface for the glm module.
Parameters:  likelihood (Object) – A likelihood object, see the likelihoods module.
 basis (Basis) – A basis object, see the basis_functions module.
 K (int, optional) – Number of diagonal Gaussian components used to approximate the posterior distribution.
 maxiter (int, optional) – Maximum number of stochastic gradient iterations to run.
 batch_size (int, optional) – Number of observations to use per SGD batch.
 updater (SGDUpdater, optional) – The SGD learning rate updating algorithm to use; by default this is Adam. See revrand.optimize.sgd for other options.
 nsamples (int, optional) – Number of samples used to estimate the expected likelihood and its gradients.
 nstarts (int, optional) – If any parameters have distributions as initial values, this determines how many random candidate starts should be evaluated before commencing optimisation at the best candidate.
 random_state (None, int or RandomState, optional) – Random seed.
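The nstarts parameter describes a "best of many random starts" initialisation. As a rough illustration of that idea only, the sketch below draws candidate initial values from a distribution, scores each with an objective, and keeps the lowest-loss one. The toy objective, the Gaussian initial distribution and the helper names (best_random_start, objective, sample_init) are hypothetical and are not revrand's API; in the real model the score would come from the variational objective.

```python
import numpy as np


def best_random_start(objective, sample_init, nstarts, rng):
    """Draw `nstarts` random candidates and return the one with the lowest loss."""
    candidates = [sample_init(rng) for _ in range(nstarts)]
    losses = [objective(c) for c in candidates]
    return candidates[int(np.argmin(losses))]


def objective(theta):
    # Hypothetical toy loss with its minimum at theta = (1, -1).
    return float(np.sum((theta - np.array([1.0, -1.0])) ** 2))


def sample_init(rng):
    # Hypothetical initial-value distribution: a broad Gaussian over both parameters.
    return rng.normal(0.0, 2.0, size=2)


rng = np.random.default_rng(42)
theta0 = best_random_start(objective, sample_init, nstarts=500, rng=rng)
print(theta0)  # optimisation would then start from this candidate
```

Evaluating candidates is cheap compared with a full optimisation run, which is why a few hundred starts can be affordable before committing to one.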
Notes
This approximates the posterior distribution over the weights with a mixture of Gaussians:
\[\mathbf{w} \sim \frac{1}{K} \sum^K_{k=1} \mathcal{N}(\mathbf{m}_k, \boldsymbol{\Psi}_k)\]
where
\[\boldsymbol{\Psi}_k = \text{diag}([\Psi_{k,1}, \ldots, \Psi_{k,D}]).\]
This allows arbitrary likelihoods to be used with this algorithm while still maintaining flexible and tractable non-Gaussian posteriors. Additionally, it reduces the number of parameters to optimise compared with full-covariance Gaussians.
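To make the mixture concrete, the sketch below draws weight samples from an equal-weight mixture of K diagonal Gaussians: pick a component uniformly, then apply the reparameterization \(\mathbf{w} = \mathbf{m}_k + \sqrt{\boldsymbol{\Psi}_k}\,\boldsymbol{\epsilon}\) with \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\). It also shows the parameter saving: each component needs 2D numbers (mean plus diagonal), rather than D + D(D+1)/2 for a full covariance. The function name and array layout are assumptions for illustration, not revrand internals.

```python
import numpy as np


def sample_mixture_weights(means, psis, nsamples, rng):
    """Draw weights from an equal-weight mixture of diagonal Gaussians.

    means, psis: (K, D) arrays of component means m_k and diagonal
    variances Psi_k. Returns an (nsamples, D) array of samples.
    """
    K, D = means.shape
    # Choose a component uniformly at random (each has weight 1/K).
    k = rng.integers(K, size=nsamples)
    # Reparameterization: w = m_k + sqrt(Psi_k) * eps with eps ~ N(0, I),
    # so each sample is a differentiable function of m_k and Psi_k.
    eps = rng.standard_normal((nsamples, D))
    return means[k] + np.sqrt(psis[k]) * eps


rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0], [2.0, 1.0]])   # K = 2 components, D = 2 weights
psis = np.array([[0.1, 0.2], [0.3, 0.1]])     # diagonal variances Psi_k
w = sample_mixture_weights(means, psis, nsamples=10000, rng=rng)
print(w.shape)         # (10000, 2)
print(w.mean(axis=0))  # close to the mixture mean [0.0, 0.5]

# Variational parameters per component: diagonal vs full covariance.
D = 100
print(2 * D, D + D * (D + 1) // 2)  # 200 5150
```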
 The main differences between this implementation and the GLM in [1] are:
 We use diagonal mixtures, as opposed to isotropic ones.
 We use auto-encoding variational Bayes (AEVB) inference [2] with stochastic gradients.
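The key step in AEVB [2] is to estimate the expected log-likelihood \(\mathbb{E}_q[\log p(\mathbf{y} \mid \mathbf{w})]\) and its gradient by sampling \(\mathbf{w} = \mathbf{m} + \sqrt{\boldsymbol{\Psi}}\,\boldsymbol{\epsilon}\), so gradients flow through the samples. The sketch below does this for a single diagonal component with a Gaussian likelihood. The Gaussian likelihood is chosen only because its closed form is available for checking; the model itself supports arbitrary likelihoods, and the helper name is hypothetical.

```python
import numpy as np


def mc_expected_loglike(y, Phi, m, psi, var, nsamples, rng):
    """Reparameterized Monte Carlo estimate of E_q[log N(y | Phi w, var)]
    for q(w) = N(m, diag(psi)), together with its gradient w.r.t. m."""
    eps = rng.standard_normal((nsamples, m.size))
    W = m + np.sqrt(psi) * eps             # (S, D) samples, differentiable in m and psi
    resid = y[None, :] - W @ Phi.T         # (S, N) residuals for each sample
    ll = -0.5 * (np.log(2 * np.pi * var) + resid ** 2 / var).sum(axis=1)
    # d ll / d w = Phi^T resid / var, and dW/dm = I, so average over samples.
    grad_m = (resid @ Phi).mean(axis=0) / var
    return ll.mean(), grad_m


rng = np.random.default_rng(1)
N, D = 20, 3
Phi = rng.standard_normal((N, D))
y = Phi @ np.array([1.0, -0.5, 2.0]) + 0.3 * rng.standard_normal(N)
m, psi, var = np.zeros(D), np.full(D, 0.05), 0.5

est, grad = mc_expected_loglike(y, Phi, m, psi, var, nsamples=50000, rng=rng)

# Closed form, available here only because the likelihood is Gaussian.
resid = y - Phi @ m
exact = -0.5 * (N * np.log(2 * np.pi * var)
                + (resid ** 2 + (Phi ** 2) @ psi).sum() / var)
exact_grad = Phi.T @ resid / var
print(est, exact)
```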
This uses the Python logging module to display learning status. To view these messages, put something like the following in your calling code:
    import logging
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger(__name__)

fit(X, y, likelihood_args=())
Learn the parameters of a Bayesian generalized linear model (GLM).
Parameters:  X (ndarray) – (N, d) array input dataset (N samples, d dimensions).
 y (ndarray) – (N,) array targets (N samples).
 likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N.
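Learning is driven by stochastic gradients, controlled by the maxiter, batch_size and updater constructor arguments (Adam by default). The loop below is a generic sketch of minibatch Adam, not revrand's implementation: for brevity it minimises a least-squares loss rather than the variational objective, and the learning rate and other Adam constants are assumed textbook defaults.

```python
import numpy as np


def adam_sgd(grad, theta, data_size, maxiter=3000, batch_size=10,
             lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, rng=None):
    """Minibatch stochastic gradient descent with Adam updates.

    grad(theta, idx) returns the gradient of the loss on the rows `idx`.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, maxiter + 1):
        # Draw one minibatch of `batch_size` distinct observations.
        idx = rng.choice(data_size, size=batch_size, replace=False)
        g = grad(theta, idx)
        m = beta1 * m + (1 - beta1) * g        # running first moment
        v = beta2 * v + (1 - beta2) * g ** 2   # running second moment
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.standard_normal(200)


def grad(theta, idx):
    # Gradient of the mean squared error on the minibatch rows `idx`.
    resid = X[idx] @ theta - y[idx]
    return 2 * X[idx].T @ resid / len(idx)


theta = adam_sgd(grad, np.zeros(2), data_size=len(y), rng=rng)
print(theta)  # close to [1.5, -2.0]
```

Each iteration touches only batch_size rows, so the cost per step does not grow with N; maxiter then bounds the total work.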

predict(X, nsamples=200, likelihood_args=())
Predict target values from Bayesian generalized linear regression.
Parameters:  X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
 nsamples (int, optional) – Number of samples for sampling the expected target values from the predictive distribution.
 likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns: Ey – The expected value of y* for the query inputs, X* of shape (N*,).
Return type: ndarray
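The expected value returned by predict is a Monte Carlo average over posterior weight samples of the likelihood's mean, \(\mathbb{E}[y^* \mid \mathbf{w}, \phi(\mathbf{x}^*)]\). The sketch below assumes a Poisson likelihood with a log link (so the per-sample mean is \(\exp(\phi(\mathbf{x}^*)^\top \mathbf{w})\)) and a single diagonal Gaussian posterior, which gives a closed form to check against. The helper name and the link choice are assumptions, not revrand's API.

```python
import numpy as np


def mc_predict_mean(Phi_star, W, expected_y):
    """Monte Carlo estimate of E[y* | x*, X, y].

    Phi_star: (N*, D) basis features of the query inputs.
    W: (S, D) weight samples from the (approximate) posterior.
    expected_y: maps the linear predictor f = phi(x*) . w to E[y* | w, phi(x*)].
    """
    F = W @ Phi_star.T                  # (S, N*) latent function samples
    return expected_y(F).mean(axis=0)   # average over posterior samples


rng = np.random.default_rng(0)
m, psi = np.array([0.5, -0.2]), np.array([0.04, 0.09])
W = m + np.sqrt(psi) * rng.standard_normal((200000, 2))
Phi_star = np.array([[1.0, 0.0], [1.0, 2.0]])

# Assumed Poisson likelihood with a log link: E[y* | w] = exp(phi(x*) . w).
Ey = mc_predict_mean(Phi_star, W, np.exp)
# Lognormal mean, exact for a Gaussian posterior: exp(mu + sigma^2 / 2).
exact = np.exp(Phi_star @ m + 0.5 * (Phi_star ** 2) @ psi)
print(Ey, exact)
```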

predict_cdf(X, quantile, nsamples=200, likelihood_args=())
Predictive cumulative distribution function of a Bayesian GLM.
Parameters:  X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
 quantile (float) – The value at which to evaluate the predictive probability \(p(y^* \leq \text{quantile} \mid \mathbf{x}^*, \mathbf{X}, y)\).
 nsamples (int, optional) – Number of samples for sampling the predictive CDF.
 likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:  p (ndarray) – The probability of y* <= quantile for the query inputs, X* of shape (N*,).
 p_min (ndarray) – The minimum sampled values of the predicted probability (same shape as p)
 p_max (ndarray) – The maximum sampled values of the predicted probability (same shape as p)
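One plausible reading of these three outputs: for each posterior sample, evaluate the likelihood's CDF at quantile; p is the average over samples, and p_min and p_max are the smallest and largest per-sample values. The sketch below assumes a Gaussian likelihood with known noise variance and takes pre-computed latent samples \(f_s = \phi(\mathbf{x}^*)^\top \mathbf{w}_s\); the helper name is hypothetical.

```python
import math

import numpy as np


def mc_predict_cdf(quantile, F, noise_var):
    """Monte Carlo predictive CDF for an assumed Gaussian likelihood.

    F: (S, N*) latent function samples phi(x*) . w_s from the posterior.
    Returns p, p_min, p_max, each of shape (N*,).
    """
    z = (quantile - F) / math.sqrt(noise_var)
    # Standard normal CDF per sample: (1 + erf(z / sqrt(2))) / 2.
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return cdf.mean(axis=0), cdf.min(axis=0), cdf.max(axis=0)


rng = np.random.default_rng(0)
F = 1.0 + 0.3 * rng.standard_normal((1000, 3))   # hypothetical posterior samples
p, p_min, p_max = mc_predict_cdf(1.0, F, noise_var=0.5)
print(p)  # each entry close to 0.5, since F is centred on the quantile
```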

predict_interval(X, percentile, nsamples=200, likelihood_args=(), multiproc=True)
Predictive percentile interval (upper and lower quantiles).
Parameters:  X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
 percentile (float) – The percentile confidence interval (e.g. 95%) to return.
 nsamples (int, optional) – Number of samples for sampling the predictive percentiles.
 likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
 multiproc (bool, optional) – Use multiprocessing to parallelise this prediction computation.
Returns:  ql (ndarray) – The lower end point of the interval with shape (N*,)
 qu (ndarray) – The upper end point of the interval with shape (N*,)
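A central interval can be obtained by numerically inverting the Monte Carlo averaged predictive CDF: find ql and qu where it equals (1 - percentile)/2 and (1 + percentile)/2. The sketch below does this one query point at a time with bisection, assuming a Gaussian likelihood and percentile given as a fraction (0.95 for 95%). Bisection is a stand-in root finder, the helper names are hypothetical, and revrand's own procedure may differ; the loop here runs sequentially.

```python
import math

import numpy as np


def mixture_cdf(q, f, noise_var):
    """Average of Gaussian CDFs P(y <= q | f_s, noise_var) over samples f_s."""
    z = (q - f) / math.sqrt(2.0 * noise_var)
    return float(np.mean([0.5 * (1.0 + math.erf(zi)) for zi in z]))


def bisect(func, target, lo, hi, tol=1e-8):
    """Find q in [lo, hi] with func(q) = target, for an increasing func."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mc_predict_interval(F, percentile, noise_var):
    """Central interval holding `percentile` (e.g. 0.95) of predictive mass.

    F: (S, N*) latent function samples. Returns (ql, qu), each of shape (N*,).
    """
    alpha = (1.0 - percentile) / 2.0
    ql, qu = [], []
    for f in F.T:  # one query point at a time
        width = 10.0 * math.sqrt(noise_var + f.var()) + 1.0
        lo, hi = f.min() - width, f.max() + width  # bracket containing both roots

        def cdf(q, f=f):
            return mixture_cdf(q, f, noise_var)

        ql.append(bisect(cdf, alpha, lo, hi))
        qu.append(bisect(cdf, 1.0 - alpha, lo, hi))
    return np.array(ql), np.array(qu)


# A single posterior sample at 0 with unit noise gives a standard normal.
ql, qu = mc_predict_interval(np.zeros((1, 2)), percentile=0.95, noise_var=1.0)
print(ql, qu)  # approximately [-1.96 -1.96] [1.96 1.96]
```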

predict_logpdf(X, y, nsamples=200, likelihood_args=())
Predictive log probability density function of a Bayesian GLM.
Parameters:  X (ndarray) – (N*, d) array query input dataset (N* samples, d dimensions).
 y (float or ndarray) – The test observations of shape (N*,) at which to evaluate \(\log p(y^* \mid \mathbf{x}^*, \mathbf{X}, y)\).
 nsamples (int, optional) – Number of samples for sampling the log predictive distribution.
 likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:  logp (ndarray) – The log probability of y* given X* of shape (N*,).
 logp_min (ndarray) – The minimum sampled values of the predicted log probability (same shape as logp).
 logp_max (ndarray) – The maximum sampled values of the predicted log probability (same shape as logp).
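The log predictive density is the log of a Monte Carlo average of likelihood densities, \(\log \frac{1}{S}\sum_s p(y^* \mid \mathbf{w}_s)\). Averaging densities directly can underflow, so the sketch below works in log space with the log-sum-exp trick. It assumes a Gaussian likelihood and reads logp_min and logp_max as the extremes of the per-sample log-likelihoods; the helper name is hypothetical.

```python
import numpy as np


def mc_predict_logpdf(y, F, noise_var):
    """Monte Carlo log predictive density for an assumed Gaussian likelihood.

    log p(y* | x*, X, y) ~= log((1/S) sum_s N(y* | f_s, noise_var)),
    computed with log-sum-exp for numerical stability.
    F: (S, N*) latent samples; y: (N*,) observations.
    Returns logp, logp_min, logp_max, each of shape (N*,).
    """
    S = F.shape[0]
    ll = -0.5 * (np.log(2 * np.pi * noise_var) + (y - F) ** 2 / noise_var)  # (S, N*)
    a = ll.max(axis=0)  # shift by the max so exp() cannot overflow
    logp = a + np.log(np.exp(ll - a).sum(axis=0)) - np.log(S)
    return logp, ll.min(axis=0), ll.max(axis=0)


rng = np.random.default_rng(0)
F = 0.2 * rng.standard_normal((500, 3))  # hypothetical posterior samples
y = np.array([0.0, 1.0, -2.0])
logp, logp_min, logp_max = mc_predict_logpdf(y, F, noise_var=1.0)
print(logp)
```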

predict_moments(X, nsamples=200, likelihood_args=())
Predictive moments, in particular mean and variance, of a Bayesian GLM.
This function uses Monte Carlo sampling to evaluate the predictive mean and variance of a Bayesian GLM. The exact expressions evaluated are,
\[\begin{aligned}
\mathbb{E}[y^* \mid \mathbf{x}^*, \mathbf{X}, y] &= \int \mathbb{E}[y^* \mid \mathbf{w}, \phi(\mathbf{x}^*)]\, p(\mathbf{w} \mid \mathbf{y}, \boldsymbol\Phi)\, d\mathbf{w},\\
\mathbb{V}[y^* \mid \mathbf{x}^*, \mathbf{X}, y] &= \int \left(\mathbb{E}[y^* \mid \mathbf{w}, \phi(\mathbf{x}^*)] - \mathbb{E}[y^* \mid \mathbf{x}^*, \mathbf{X}, y]\right)^2 p(\mathbf{w} \mid \mathbf{y}, \boldsymbol\Phi)\, d\mathbf{w},
\end{aligned}\]
where \(\mathbb{E}[y^* \mid \mathbf{w}, \phi(\mathbf{x}^*)]\) is the expected value of \(y^*\) from the likelihood, and \(p(\mathbf{w} \mid \mathbf{y}, \boldsymbol\Phi)\) is the posterior distribution over weights (learned by fit). Here are a few concrete examples of how these values can be used:
 Gaussian likelihood: these are just the predicted mean and variance, see revrand.regression.predict.
 Bernoulli likelihood: the expected value is the probability \(p(y^* = 1)\), i.e. the probability of class one. The variance may not be so useful.
 Poisson likelihood: the expected value is conceptually similar to the Gaussian case, and is also a continuous value. The median (50% quantile) from predict_interval is a discrete value. Again, the variance in this instance may not be so useful.
Parameters:  X (ndarray) – (N*,d) array query input dataset (N* samples, d dimensions).
 nsamples (int, optional) – Number of samples for sampling the expected moments from the predictive distribution.
 likelihood_args (sequence, optional) – Sequence of arguments to pass to the likelihood function. These are non-learnable parameters. They can be scalars or arrays of length N*.
Returns:  Ey (ndarray) – The expected value of y* for the query inputs, X* of shape (N*,).
 Vy (ndarray) – The expected variance of y* (excluding likelihood noise terms) for the query inputs, X* of shape (N*,).
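The two integrals in predict_moments become sample averages once weights are drawn from the posterior: Ey is the mean of the per-sample likelihood means, and Vy is their spread, which is why Vy excludes the likelihood's own noise. The sketch below assumes a Bernoulli likelihood with a logistic link and takes pre-computed latent samples; the helper names and the link are assumptions, not revrand's API.

```python
import numpy as np


def mc_predict_moments(F, expected_y):
    """Monte Carlo predictive mean and variance of E[y* | w, phi(x*)].

    F: (S, N*) latent samples phi(x*) . w_s, with w_s drawn from the posterior.
    Vy is the spread of the likelihood mean over the posterior, so it
    excludes the likelihood's own noise.
    """
    E = expected_y(F)                  # (S, N*) per-sample likelihood means
    Ey = E.mean(axis=0)
    Vy = ((E - Ey) ** 2).mean(axis=0)
    return Ey, Vy


def sigmoid(f):
    # Assumed Bernoulli likelihood with a logistic link: p(y* = 1 | w).
    return 1.0 / (1.0 + np.exp(-f))


rng = np.random.default_rng(0)
F = np.array([0.0, 3.0]) + rng.standard_normal((5000, 2))  # hypothetical samples
Ey, Vy = mc_predict_moments(F, sigmoid)
print(Ey)  # probability of class one at each query point
```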