# Quickstart Guide

To install, simply run `setup.py`:

```
$ python setup.py install
```

or install with `pip`:

```
$ pip install git+https://github.com/nicta/revrand.git
```

Refer to Installation for advanced installation instructions.

Have a look at some of the demo notebooks.

## Bayesian Linear Regression Example

Here is a very quick example of how to use Bayesian linear regression (Standard Linear Model) with optimisation of the likelihood noise, regularizer and basis function hyperparameters.

Assuming we already have noisy training targets `y`, inputs `X`, and some query inputs `Xs` (as well as the true noiseless function `f`):

```
import matplotlib.pyplot as pl
import numpy as np
from scipy.stats import gamma
from revrand import StandardLinearModel, Parameter, Positive
from revrand.basis_functions import LinearBasis, RandomRBF
...
# Concatenate a linear basis and a Random radial basis (GP approx)
init_lenscale = Parameter(gamma(1), Positive()) # Random starts sampling
basis = LinearBasis(onescol=True) \
    + RandomRBF(nbases=300, Xdim=X.shape[1], lenscale=init_lenscale)
# Learn regression parameters and predict (by default, this will evaluate
# 100 random values for the length scale, variance and regularizer before
# starting optimisation from the best candidates)
slm = StandardLinearModel(basis)
slm.fit(X, y)
Eys, Vys = slm.predict_moments(Xs)
# Training/Truth
pl.plot(X, y, 'k.', label='Training')
pl.plot(Xs, f, 'k-', label='Truth')
# Plot Regressor
Sys = np.sqrt(Vys)
pl.plot(Xs, Eys, 'g-', label='Bayesian linear regression')
pl.fill_between(Xs, Eys - 2 * Sys, Eys + 2 * Sys, facecolor='none',
                edgecolor='g', linestyle='--', label=None)
pl.legend()
pl.grid(True)
pl.title('Regression demo')
pl.ylabel('y')
pl.xlabel('x')
pl.show()
```

This script will produce a plot of the training data, the true function, and the regressor's predictive mean and uncertainty.

The default behaviour of the algorithms in revrand is to randomly sample and evaluate these hyperparameters before starting optimization from the best random candidates. This is because the objective functions may be non-convex with respect to these parameters, and in this way revrand can achieve some robustness to bad initializations.
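The random-start strategy described above can be sketched generically: sample candidate hyperparameters from their prior, evaluate the objective at each, and begin gradient-based optimisation from the best candidate. The objective, distribution, and candidate count below are illustrative stand-ins, not revrand's internal code:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.RandomState(0)

# A toy non-convex objective over a single positive hyperparameter
def objective(h):
    return np.sin(3 * h) + 0.1 * (h - 2) ** 2

# 1. Sample random starts from the hyperparameter's prior distribution
candidates = gamma(1., scale=2.).rvs(size=100, random_state=rng)

# 2. Evaluate the objective at every candidate
scores = np.array([objective(h) for h in candidates])

# 3. Begin gradient-based optimisation from the best random candidate
best = candidates[np.argmin(scores)]
res = minimize(lambda x: objective(x[0]), x0=[best], bounds=[(1e-8, None)])
```

Starting from the best of many random draws makes a local optimiser far less sensitive to a single bad initialization of a non-convex objective.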

Note

All of the hyperparameters in revrand are initialised using `Parameter` objects. These objects contain information on the bounds of the values of these hyperparameters, as well as *how* to initialize them. For instance,

```
>>> from revrand import Parameter, Bound
>>> hyper = Parameter(1., Bound(0.5, 100))
```

tells the optimizer to bound the feasible range of the hyperparameter between 0.5 and 100, with an initial value of 1. Furthermore,

```
>>> from revrand import Parameter, Positive
>>> from scipy.stats import gamma
>>> hyper = Parameter(gamma(1., scale=10), Positive())
```

tells the optimizer it can use *random starts* with this hyperparameter, and to draw these random starts from a gamma distribution with a shape of 1 and a scale of 10. If random starts are not used, then the expected value (10) of this distribution is used as the initial value. Also, `Positive()` indicates the value of the parameter must be within `(0, inf)`.

See Bound and Parameter Types and Optimization for more information on how to use these random initializers.
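As a quick check of the claim above, the expected value of that gamma distribution (shape 1, scale 10) is indeed 10, which is what would be used as the initial value when random starts are off:

```python
from scipy.stats import gamma

# The same distribution passed to Parameter(gamma(1., scale=10), Positive())
hyper_dist = gamma(1., scale=10)

# Expected value of a gamma distribution is shape * scale = 1 * 10
init_value = hyper_dist.mean()
print(init_value)  # 10.0
```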

## Bayesian Generalized Linear Model Example

This example is very similar to that above, but now let’s assume our targets `y` are drawn from a Poisson likelihood, or observation, distribution which is a function of the inputs, `X`. The task here is to predict the mean of the Poisson distribution for query inputs `Xs`, as well as the uncertainty associated with the prediction. For this we need to use a generalized linear model (GLM):

```
import matplotlib.pyplot as pl
import numpy as np
from scipy.stats import gamma
from revrand import GeneralizedLinearModel, Parameter, Positive, likelihoods
from revrand.basis_functions import RandomRBF
...
# Random radial basis (GP approx)
init_lenscale = Parameter(gamma(1), Positive()) # Random starts sampling
basis = RandomRBF(nbases=100, Xdim=X.shape[1], lenscale=init_lenscale)
# Set up the likelihood of the GLM
llhood = likelihoods.Poisson(tranfcn='exp') # log link
# Learn regression parameters and predict
glm = GeneralizedLinearModel(llhood, basis)
glm.fit(X, y)
Eys = glm.predict(Xs)
y95n, y95x = glm.predict_interval(0.95, Xs)
# Training/Truth
pl.plot(X, y, 'k.', label='Training')
pl.plot(Xs, f, 'k-', label='Truth')
# Plot GLM SGD Regressor
pl.plot(Xs, Eys, 'b-', label='GLM mean.')
pl.fill_between(Xs, y95n, y95x, facecolor='none',
                edgecolor='b', linestyle='--', label=None)
pl.legend()
pl.grid(True)
pl.title('Regression demo')
pl.ylabel('y')
pl.xlabel('x')
pl.show()
```

This script will produce a plot of the training data, the true function, and the GLM's predicted mean with its 95% predictive interval.
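The `tranfcn='exp'` (log link) used in the Poisson likelihood maps the real-valued latent function through an exponential, which guarantees a strictly positive Poisson rate. A minimal illustration of this link (illustrative only, not revrand code):

```python
import numpy as np

rng = np.random.RandomState(42)

# Real-valued latent function values (can be negative)
latent = rng.randn(1000) * 3

# Log link: rate = exp(f) is positive for any real f
rate = np.exp(latent)

# Poisson observations drawn from the linked rate
y = rng.poisson(rate)
```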

## Large-scale Learning with Stochastic Gradients

By default the GLM uses stochastic gradients to learn all of its parameters and hyperparameters and does not require any matrix inversion, so it can be used to learn from large datasets with lots of features (`slm.fit` uses L-BFGS and requires a matrix inversion). We can also use the GLM to approximate and scale up regular Bayesian linear regression. For instance, if we modify the Bayesian linear regression example from before,

```
...
from revrand import likelihoods
...
# Set up the likelihood of the GLM
llhood = likelihoods.Gaussian(var=Parameter(gamma(1.), Positive()))
# Learn regression parameters and predict
glm = GeneralizedLinearModel(llhood, basis)
glm.fit(X, y)
Ey_g, Vf_g = glm.predict_moments(Xs)
...
# Plot GLM SGD Regressor
Vy_g = Vf_g + glm.like_hypers_
Sy_g = np.sqrt(Vy_g)
pl.plot(Xs, Ey_g, 'm-', label='GLM')
pl.fill_between(Xs, Ey_g - 2 * Sy_g, Ey_g + 2 * Sy_g, facecolor='none',
                edgecolor='m', linestyle='--', label=None)
...
```

This script will produce a plot like the previous Bayesian linear regression example, now with the GLM approximation overlaid.

We can see the approximation from the GLM is pretty good - this is because it uses a mixture of diagonal Gaussians as the posterior (thereby avoiding a full matrix inversion) to approximate the full Gaussian posterior covariance over the weights. This also has the advantage of allowing the model to learn multi-modal posterior distributions when non-Gaussian likelihoods are required.
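To see why a mixture of diagonal Gaussians is cheap, note that evaluating its density needs only element-wise operations on the D per-dimension variances, whereas a full-covariance Gaussian needs an O(D^3) factorisation. A rough sketch of such a mixture's log-density (a hypothetical helper, not revrand's implementation):

```python
import numpy as np
from scipy.special import logsumexp

def mog_diag_logpdf(w, means, variances, weights):
    """Log-density of a mixture of diagonal Gaussians at point w.

    means, variances: (K, D) arrays; weights: (K,) mixing proportions.
    Only element-wise operations on the D variances are needed --
    no matrix inversion or Cholesky factorisation.
    """
    diff2 = (w - means) ** 2                                    # (K, D)
    log_comps = -0.5 * np.sum(
        np.log(2 * np.pi * variances) + diff2 / variances, axis=1
    )                                                           # (K,)
    return logsumexp(log_comps + np.log(weights))

# Two-component mixture over three weights
means = np.array([[0., 0., 0.], [1., 1., 1.]])
variances = np.ones((2, 3))
weights = np.array([0.5, 0.5])
lp = mog_diag_logpdf(np.zeros(3), means, variances, weights)
```

With K components the cost is O(K * D) per evaluation, and multiple components are what allow the multi-modal posteriors mentioned above.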

## Feature Composition Framework

We have implemented an easy-to-use and extensible feature-building framework within revrand (Basis Functions) that mirrors many kernel composition frameworks, such as those found in Scikit Learn and GPflow. You have already seen the basics demonstrated in the above examples, i.e. concatenation of basis functions,

```
>>> X = np.random.randn(100, 5)
>>> N, d = X.shape
>>> base = LinearBasis(onescol=True) + RandomRBF(Xdim=d, nbases=100)
>>> Phi = base.transform(X)
>>> Phi.shape
(100, 206)
```

There are a few things at work in this example:

- Both `LinearBasis` and `RandomRBF` are applied to all of `X`, and the results are concatenated.
- `LinearBasis` has prepended a column of ones onto `X` so a subsequent algorithm can learn a “bias” term.
- `RandomRBF` is actually approximating a radial basis *kernel* function [3]! It also outputs `2 * nbases` basis functions.
- Hence the resulting basis function has a shape of `(N, d + 1 + 2 * nbases)`.
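The factor of `2 * nbases` arises because each random frequency contributes both a cosine and a sine feature in the random-feature construction of [3]. A minimal NumPy sketch of this kind of transform (illustrative, not revrand's implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
N, d, nbases = 100, 5, 100
X = rng.randn(N, d)

# Random frequencies approximating an RBF kernel with unit length scale
W = rng.randn(d, nbases)
XW = X.dot(W)

# Each frequency yields a cosine AND a sine feature: 2 * nbases columns
rbf_features = np.hstack((np.cos(XW), np.sin(XW))) / np.sqrt(nbases)

# Linear basis with a prepended ones column ("bias" term): d + 1 columns
linear_features = np.hstack((np.ones((N, 1)), X))

# Concatenation gives shape (N, d + 1 + 2 * nbases)
Phi = np.hstack((linear_features, rbf_features))
```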

The above basis concatenation, when used with the `StandardLinearModel`, is actually very similar to the following kernel composition when used with Scikit Learn’s `GaussianProcessRegressor`,

```
>>> from sklearn.gaussian_process.kernels import RBF, DotProduct, \
... WhiteKernel
>>> kern = 1**2 * DotProduct() + 1**2 * RBF() + WhiteKernel()
```

Here the Scikit Learn GP learns the kernel amplitude parameters, which are initialised at 1. Revrand also learns these parameters, and the same initialization in revrand would be,

```
>>> from revrand import Parameter, Positive
>>> reg1 = Parameter(1., Positive())
>>> reg2 = Parameter(1., Positive())
>>> base = LinearBasis(onescol=True, regularizer=reg1) \
... + RandomRBF(Xdim=d, nbases=100, regularizer=reg2)
```

This is because the *regularizer* on the weights (the variance of the weight
prior) is equivalent to the kernel amplitude parameters in a Gaussian process.
Hence, in revrand we allow a separate regularizer per basis object.

We can also use *partial application* of basis functions, e.g.

```
>>> base = LinearBasis(onescol=True, apply_ind=slice(0, 2)) \
... + RandomRBF(Xdim=d, nbases=100, apply_ind=slice(2, 5))
>>> Phi = base.transform(X)
>>> Phi.shape
(100, 203)
```

Now the basis functions are applied to separate dimensions of the input, `X`. That is, `LinearBasis` takes dimensions 0 and 1, and `RandomRBF` takes the rest, and again the results are concatenated.

Note

For the moment we only support basis concatenation which is analogous to kernel addition. Performing kernel multiplication with bases is slightly more complex and we have not yet implemented it.

Finally, if we use these basis functions with any of the algorithms in revrand, *the parameters of the basis functions are learned*.

A note on implementation

We decided against using the kernel product notation for basis functions in revrand (`1**2 * Kernel()`), as they are not equivalent when a call to `transform` is made. That is, applying

```
>>> Phi = base.transform(X)
```

does not involve the regularizer in any way, which is why we set it in the
constructor of the basis. It would only be involved when constructing an
*equivalent* kernel,

```
>>> K = Phi.dot(L).dot(Phi.T)
```

where `L` is a diagonal regularizer matrix corresponding to our bases’ `regularizer` initializers. See [1] and our report for more information on this.
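As a quick numerical check of this equivalence (assuming a plain linear basis and a scalar per-basis regularizer for simplicity), the kernel built from the regularized features matches an amplitude-scaled dot-product kernel:

```python
import numpy as np

rng = np.random.RandomState(1)
N, d = 10, 3
X = rng.randn(N, d)

Phi = X                          # plain linear basis, no ones column
reg = 2.5                        # scalar regularizer (weight-prior variance)
L = reg * np.eye(d)              # diagonal regularizer matrix

K_basis = Phi.dot(L).dot(Phi.T)  # kernel implied by the regularized basis
K_dot = reg * X.dot(X.T)         # amplitude-scaled dot-product kernel
```

The two matrices agree exactly, which is why the regularizer plays the role of the kernel amplitude in the Scikit Learn comparison above.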

## References

[1] Yang, Z., Smola, A. J., Song, L., & Wilson, A. G. “A la Carte – Learning Fast Kernels”. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 1098-1106, 2015.

[2] Le, Q., Sarlos, T., & Smola, A. “Fastfood – approximating kernel expansions in loglinear time”. Proceedings of the International Conference on Machine Learning, 2013.

[3] Rahimi, A., & Recht, B. “Random features for large-scale kernel machines”. Advances in Neural Information Processing Systems, 2007.

[4] Gershman, S., Hoffman, M., & Blei, D. “Nonparametric variational inference”. Proceedings of the International Conference on Machine Learning, 2012.

[5] Kingma, D. P., & Welling, M. “Auto-encoding variational Bayes”. Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.