Optimization

This module contains several stochastic gradient optimizers and decorators for enhancing the functionality of these optimizers as well as the scipy.optimize.minimize optimizers.

Optimizer Decorators

Optimizer Decorators.

revrand.optimize.decorators.flatten_args(shapes)

Decorator to flatten structured arguments to a function.

Examples

>>> @flatten_args([(5,), ()])
... def f(w, lambda_):
...     return .5 * lambda_ * w.T.dot(w)
>>> np.isclose(f(np.array([2., .5, .6, -.2, .9, .2])), .546)
True
>>> w = np.array([2., .5, .6, -.2, .9])
>>> lambda_ = .2
>>> np.isclose(.5 * lambda_ * w.T.dot(w), .546)
True

Some other curious applications

>>> from operator import mul
>>> flatten_args_dec = flatten_args([(), (3,)])
>>> func = flatten_args_dec(mul)
>>> func(np.array([3.1, .6, 1.71, -1.2]))
array([ 1.86 ,  5.301, -3.72 ])
>>> 3.1 * np.array([.6, 1.71, -1.2])
array([ 1.86 ,  5.301, -3.72 ])
>>> flatten_args_dec = flatten_args([(9,), (15,)])
>>> func = flatten_args_dec(np.meshgrid)
>>> x, y = func(np.arange(-5, 7, .5)) # 7 - (-5) / 0.5 = 24 = 9 + 15
>>> x.shape
(15, 9)
>>> x[0, :]
array([-5. , -4.5, -4. , -3.5, -3. , -2.5, -2. , -1.5, -1. ])
>>> y.shape
(15, 9)
>>> y[:, 0]
array([-0.5,  0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,
        5. ,  5.5,  6. ,  6.5])
revrand.optimize.decorators.flatten_func_grad(func)

Decorator to flatten structured gradients and return objective.

Examples

>>> def cost(w, lambda_):
...     sq_norm = w.T.dot(w)
...     return .5 * lambda_ * sq_norm, [lambda_ * w, .5 * sq_norm]
>>> val, grad = cost(np.array([.5, .1, -.2]), .25)
>>> np.isclose(val, 0.0375)
True
>>> len(grad)
2
>>> grad_w, grad_lambda = grad
>>> np.shape(grad_w)
(3,)
>>> np.shape(grad_lambda)
()
>>> grad_w
array([ 0.125,  0.025, -0.05 ])
>>> np.isclose(grad_lambda, 0.15)
True
>>> cost_new = flatten_func_grad(cost)
>>> val_new, grad_new = cost_new(np.array([.5, .1, -.2]), .25)
>>> val == val_new
True
>>> grad_new
array([ 0.125,  0.025, -0.05 ,  0.15 ])
revrand.optimize.decorators.flatten_grad(func)

Decorator to flatten structured gradients.

Examples

>>> def cost(w, lambda_):
...     sq_norm = w.T.dot(w)
...     return lambda_ * w, .5 * sq_norm
>>> grad = cost(np.array([.5, .1, -.2]), .25)
>>> len(grad)
2
>>> grad_w, grad_lambda = grad
>>> np.shape(grad_w)
(3,)
>>> np.shape(grad_lambda)
()
>>> grad_w
array([ 0.125,  0.025, -0.05 ])
>>> np.isclose(grad_lambda, 0.15)
True
>>> cost_new = flatten_grad(cost)
>>> grad_new = cost_new(np.array([.5, .1, -.2]), .25)
>>> grad_new
array([ 0.125,  0.025, -0.05 ,  0.15 ])
revrand.optimize.decorators.logtrick_minimizer(minimizer)

Log-Trick decorator for optimizers.

This decorator implements the “log trick” for optimizing positive bounded variables. It will apply this trick for any variables that correspond to a Positive() bound.

Examples

>>> from scipy.optimize import minimize as sp_min
>>> from ..btypes import Bound, Positive

Here is an example where we may want to enforce a particular parameter or parameters to be strictly greater than zero,

>>> def cost(w, lambda_):
...     sq_norm = w.T.dot(w)
...     return .5 * lambda_ * sq_norm, lambda_ * w

Now let’s enforce that the w are positive,

>>> bounds = [Positive(), Positive(), Positive()]
>>> new_min = logtrick_minimizer(sp_min)

Initial values

>>> w_0 = np.array([.5, .1, .2])
>>> lambda_0 = .25
>>> res = new_min(cost, w_0, args=(lambda_0,), bounds=bounds,
...               method='L-BFGS-B', jac=True)
>>> res.x >= 0
array([ True,  True,  True], dtype=bool)

Note

This decorator only works on unstructured optimizers. However, it can be use with structured_minimizer, so long as it is the inner wrapper.

revrand.optimize.decorators.logtrick_sgd(sgd)

Log-Trick decorator for stochastic gradients.

This decorator implements the “log trick” for optimizing positive bounded variables using SGD. It will apply this trick for any variables that correspond to a Positive() bound.

Examples

>>> from ..optimize import sgd
>>> from ..btypes import Bound, Positive

Here is an example where we may want to enforce a particular parameter or parameters to be strictly greater than zero,

>>> def cost(w, data, lambda_):
...     N = len(data)
...     y, X = data[:, 0], data[:, 1:]
...     y_est = X.dot(w)
...     ww = w.T.dot(w)
...     obj = (y - y_est).sum() / N + lambda_ * ww
...     gradw = - 2 * X.T.dot(y - y_est) / N + 2 * lambda_ * w
...     return obj, gradw

Now let’s enforce that the w are positive,

>>> bounds = [Positive(), Positive()]
>>> new_sgd = logtrick_sgd(sgd)

Data

>>> y = np.linspace(1, 10, 100) + np.random.randn(100) + 1
>>> X = np.array([np.ones(100), np.linspace(1, 100, 100)]).T
>>> data = np.hstack((y[:, np.newaxis], X))

Initial values

>>> w_0 = np.array([1., 1.])
>>> lambda_0 = .25
>>> res = new_sgd(cost, w_0, data, args=(lambda_0,), bounds=bounds,
...               batch_size=10, eval_obj=True)
>>> res.x >= 0
array([ True,  True], dtype=bool)

Note

This decorator only works on unstructured optimizers. However, it can be use with structured_minimizer, so long as it is the inner wrapper.

revrand.optimize.decorators.structured_minimizer(minimizer)

Allow an optimizer to accept nested sequences of Parameters to optimize.

This decorator can intepret the Parameter objects in btypes.py, and can accept nested sequences of any structure of these objects to optimise!

It can also optionally evaluate random starts (i.e. random starting candidates) if the parameter objects have been initialised with distributions. For this, two additional parameters are exposed in the minimizer interface.

Parameters:
  • fun (callable) – objective function that takes in arbitrary ndarrays, floats or nested sequences of these.
  • parameters ((nested) sequences of Parameter objects) – Initial guess of the parameters of the objective function
  • nstarts (int, optional) – The number random starting candidates for optimisation to evaluate. This will only happen for nstarts > 0 and if at least one Parameter object is random.
  • random_state (None, int or RandomState, optional) – random seed

Examples

>>> from scipy.optimize import minimize as sp_min
>>> from ..btypes import Parameter, Bound

Define a cost function that returns a pair. The first element is the cost value and the second element is the gradient represented by a tuple. Even if the cost is a function of a single variable, the gradient must be a tuple containing one element.

>>> def cost(w, lambda_):
...     sq_norm = w.T.dot(w)
...     return .5 * lambda_ * sq_norm, (lambda_ * w, .5 * sq_norm)

Augment the Scipy optimizer to take structured inputs

>>> new_min = structured_minimizer(sp_min)

Constant Initial values

>>> w_0 = Parameter(np.array([.5, .1, .2]), Bound())
>>> lambda_0 = Parameter(.25, Bound())
>>> res = new_min(cost, (w_0, lambda_0), method='L-BFGS-B', jac=True)
>>> res_w, res_lambda = res.x

Random Initial values

>>> from scipy.stats import norm, gamma
>>> w_0 = Parameter(norm(), Bound(), shape=(3,))
>>> lambda_0 = Parameter(gamma(a=1), Bound())
>>> res = new_min(cost, (w_0, lambda_0), method='L-BFGS-B', jac=True,
...               nstarts=100, random_state=None)
>>> res_w, res_lambda = res.x
revrand.optimize.decorators.structured_sgd(sgd)

Allow an SGD to accept nested sequences of Parameters to optimize.

This decorator can intepret the Parameter objects in btypes.py, and can accept nested sequences of any structure of these objects to optimise!

It can also optionally evaluate random starts (i.e. random starting candidates) if the parameter objects have been initialised with distributions. For this, an additional parameter is exposed in the minimizer interface.

Parameters:
  • fun (callable) – objective function that takes in arbitrary ndarrays, floats or nested sequences of these.
  • parameters ((nested) sequences of Parameter objects) – Initial guess of the parameters of the objective function
  • nstarts (int, optional) – The number random starting candidates for optimisation to evaluate. This will only happen for nstarts > 0 and if at least one Parameter object is random.

Examples

>>> from ..optimize import sgd
>>> from ..btypes import Parameter, Bound

Define a cost function that returns a pair. The first element is the cost value and the second element is the gradient represented by a sequence. Even if the cost is a function of a single variable, the gradient must be a sequence containing one element.

>>> def cost(w, lambda_, data):
...     N = len(data)
...     y, X = data[:, 0], data[:, 1:]
...     y_est = X.dot(w)
...     ww = w.T.dot(w)
...     obj = (y - y_est).sum() / N + lambda_ * ww
...     gradw = - 2 * X.T.dot(y - y_est) / N + 2 * lambda_ * w
...     gradl = ww
...     return obj, [gradw, gradl]

Augment the SGD optimizer to take structured inputs

>>> new_sgd = structured_sgd(sgd)

Data

>>> y = np.linspace(1, 10, 100) + np.random.randn(100) + 1
>>> X = np.array([np.ones(100), np.linspace(1, 100, 100)]).T
>>> data = np.hstack((y[:, np.newaxis], X))

Constant Initial values

>>> w_0 = Parameter(np.array([1., 1.]), Bound())
>>> lambda_0 = Parameter(.25, Bound())
>>> res = new_sgd(cost, [w_0, lambda_0], data, batch_size=10,
...               eval_obj=True)
>>> res_w, res_lambda = res.x

Random Initial values

>>> from scipy.stats import norm, gamma
>>> w_0 = Parameter(norm(), Bound(), shape=(2,))
>>> lambda_0 = Parameter(gamma(1.), Bound())
>>> res = new_sgd(cost, [w_0, lambda_0], data, batch_size=10,
...               eval_obj=True, nstarts=100)
>>> res_w, res_lambda = res.x

Stochastic Gradients

Stochastic Gradient Descent.

class revrand.optimize.sgd.AdaDelta(rho=0.1, epsilon=1e-05)

AdaDelta Algorithm.

Parameters:
  • rho (float, optional) – smoothing/decay rate parameter, must be [0, 1].
  • epsilon (float, optional) – “jitter” term to ensure continued learning (should be small).
reset()

Reset the state of this updater for a new optimisation problem.

class revrand.optimize.sgd.AdaGrad(eta=1, epsilon=1e-06)

AdaGrad Algorithm.

Parameters:
  • eta (float, optional) – smoothing/decay rate parameter, must be [0, 1].
  • epsilon (float, optional) – small constant term to prevent divide-by-zeros
reset()

Reset the state of this updater for a new optimisation problem.

class revrand.optimize.sgd.Adam(alpha=0.01, beta1=0.9, beta2=0.99, epsilon=1e-08)

Adam Algorithm.

Parameters:
  • alpha (float, optional) – stepsize to give the update.
  • beta1 (float, optional) – smoothing/decay rate parameter for the gradient, must be [0, 1].
  • beta2 (float, optional) – smoothing/decay rate parameter for the squared gradient, must be [0, 1].
  • epsilon (float, optional) – “jitter” term to ensure continued learning (should be small).
reset()

Reset the state of this updater for a new optimisation problem.

class revrand.optimize.sgd.Momentum(rho=0.5, eta=0.01)

Momentum Algorithm.

Parameters:
  • rho (float, optional) – smoothing/decay rate parameter, must be [0, 1].
  • eta (float, optional) – weight to give to the momentum term
reset()

Reset the state of this updater for a new optimisation problem.

class revrand.optimize.sgd.SGDUpdater(eta=0.1)

Base class for SGD learning rate algorithms.

Parameters:eta (float, optional) – The learning rate applied to every gradient step.
reset()

Reset the state of this updater for a new optimisation problem.

revrand.optimize.sgd.gen_batch(data, batch_size, maxiter=inf, random_state=None)

Create random batches for Stochastic gradients.

Batch index generator for SGD that will yeild random batches for a a defined number of iterations, which can be infinite. This generator makes consecutive passes through the data, drawing without replacement on each pass.

Parameters:
  • data (ndarray or sequence of ndarrays) – The data, can be a matrix X, (X,y) tuples etc
  • batch_size (int) – number of data points in each batch.
  • maxiter (int, optional) – The number of iterations
  • random_state (int or RandomState, optional) – random seed
Yields:

ndarray or sequence – with each array length batch_size, i.e. a subset of data.

revrand.optimize.sgd.normalize_bound(bound)

Replace None with + or - inf in bound tuples.

Examples

>>> normalize_bound((2.6, 7.2))
(2.6, 7.2)
>>> normalize_bound((None, 7.2))
(-inf, 7.2)
>>> normalize_bound((2.6, None))
(2.6, inf)
>>> normalize_bound((None, None))
(-inf, inf)

This operation is idempotent:

>>> normalize_bound((-float("inf"), float("inf")))
(-inf, inf)
revrand.optimize.sgd.sgd(fun, x0, data, args=(), bounds=None, batch_size=10, maxiter=5000, updater=None, eval_obj=False, random_state=None)

Stochastic Gradient Descent.

Parameters:
  • fun (callable) – the function to minimize, this must have the signature [obj,] grad = fun(x, data, ...)`, where the eval_obj argument tells sgd if an objective function value is going to be returned by fun.
  • x0 (ndarray) – a sequence/1D array of initial values for the parameters to learn.
  • data (ndarray) – a numpy array or sequence of data to input into fun. This will be split along the first axis (axis=0), and then input into fun.
  • args (sequence, optional) – an optional sequence of arguments to give to fun.
  • bounds (sequence, optional) – Bounds for variables, (min, max) pairs for each element in x, defining the bounds on that parameter. Use None for one of min or max when there is no bound in that direction.
  • batch_size (int, optional) – The number of observations in an SGD batch.
  • maxiter (int, optional) – Number of mini-batch iterations before optimization terminates.
  • updater (SGDUpdater, optional) – The type of gradient update to use, by default this is Adam.
  • eval_obj (bool, optional) – This indicates whether or not fun also evaluates and returns the objective function value. If this is true, fun must return (obj, grad) and then a list of objective function values is also returned.
  • random_state (int or RandomState, optional) – random seed
Returns:

res

x : narray

the final result

norms : list

the list of gradient norms

message : str

the convergence condition (‘maxiter reached’ or error)

objs : list

the list of objective function evaluations if eval_obj is True.

fun : float

the final objective function evaluation if eval_obj is True.

Return type:

OptimizeResult