Optimization¶
This module contains several stochastic gradient optimizers and decorators for enhancing the functionality of these optimizers as well as the scipy.optimize.minimize optimizers.
Optimizer Decorators¶
Optimizer Decorators.
-
revrand.optimize.decorators.
flatten_args
(shapes)¶ Decorator to flatten structured arguments to a function.
Examples
>>> @flatten_args([(5,), ()]) ... def f(w, lambda_): ... return .5 * lambda_ * w.T.dot(w) >>> np.isclose(f(np.array([2., .5, .6, -.2, .9, .2])), .546) True >>> w = np.array([2., .5, .6, -.2, .9]) >>> lambda_ = .2 >>> np.isclose(.5 * lambda_ * w.T.dot(w), .546) True
Some other curious applications
>>> from operator import mul >>> flatten_args_dec = flatten_args([(), (3,)]) >>> func = flatten_args_dec(mul) >>> func(np.array([3.1, .6, 1.71, -1.2])) array([ 1.86 , 5.301, -3.72 ]) >>> 3.1 * np.array([.6, 1.71, -1.2]) array([ 1.86 , 5.301, -3.72 ]) >>> flatten_args_dec = flatten_args([(9,), (15,)]) >>> func = flatten_args_dec(np.meshgrid) >>> x, y = func(np.arange(-5, 7, .5)) # 7 - (-5) / 0.5 = 24 = 9 + 15 >>> x.shape (15, 9) >>> x[0, :] array([-5. , -4.5, -4. , -3.5, -3. , -2.5, -2. , -1.5, -1. ]) >>> y.shape (15, 9) >>> y[:, 0] array([-0.5, 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])
-
revrand.optimize.decorators.
flatten_func_grad
(func)¶ Decorator to flatten structured gradients and return objective.
Examples
>>> def cost(w, lambda_): ... sq_norm = w.T.dot(w) ... return .5 * lambda_ * sq_norm, [lambda_ * w, .5 * sq_norm] >>> val, grad = cost(np.array([.5, .1, -.2]), .25)
>>> np.isclose(val, 0.0375) True
>>> len(grad) 2 >>> grad_w, grad_lambda = grad >>> np.shape(grad_w) (3,) >>> np.shape(grad_lambda) () >>> grad_w array([ 0.125, 0.025, -0.05 ]) >>> np.isclose(grad_lambda, 0.15) True
>>> cost_new = flatten_func_grad(cost) >>> val_new, grad_new = cost_new(np.array([.5, .1, -.2]), .25) >>> val == val_new True >>> grad_new array([ 0.125, 0.025, -0.05 , 0.15 ])
-
revrand.optimize.decorators.
flatten_grad
(func)¶ Decorator to flatten structured gradients.
Examples
>>> def cost(w, lambda_): ... sq_norm = w.T.dot(w) ... return lambda_ * w, .5 * sq_norm >>> grad = cost(np.array([.5, .1, -.2]), .25)
>>> len(grad) 2 >>> grad_w, grad_lambda = grad >>> np.shape(grad_w) (3,) >>> np.shape(grad_lambda) () >>> grad_w array([ 0.125, 0.025, -0.05 ]) >>> np.isclose(grad_lambda, 0.15) True
>>> cost_new = flatten_grad(cost) >>> grad_new = cost_new(np.array([.5, .1, -.2]), .25) >>> grad_new array([ 0.125, 0.025, -0.05 , 0.15 ])
-
revrand.optimize.decorators.
logtrick_minimizer
(minimizer)¶ Log-Trick decorator for optimizers.
This decorator implements the “log trick” for optimizing positive bounded variables. It will apply this trick for any variables that correspond to a Positive() bound.
Examples
>>> from scipy.optimize import minimize as sp_min >>> from ..btypes import Bound, Positive
Here is an example where we may want to enforce a particular parameter or parameters to be strictly greater than zero,
>>> def cost(w, lambda_): ... sq_norm = w.T.dot(w) ... return .5 * lambda_ * sq_norm, lambda_ * w
Now let’s enforce that the w are positive,
>>> bounds = [Positive(), Positive(), Positive()] >>> new_min = logtrick_minimizer(sp_min)
Initial values
>>> w_0 = np.array([.5, .1, .2]) >>> lambda_0 = .25
>>> res = new_min(cost, w_0, args=(lambda_0,), bounds=bounds, ... method='L-BFGS-B', jac=True) >>> res.x >= 0 array([ True, True, True], dtype=bool)
Note
This decorator only works on unstructured optimizers. However, it can be use with structured_minimizer, so long as it is the inner wrapper.
-
revrand.optimize.decorators.
logtrick_sgd
(sgd)¶ Log-Trick decorator for stochastic gradients.
This decorator implements the “log trick” for optimizing positive bounded variables using SGD. It will apply this trick for any variables that correspond to a Positive() bound.
Examples
>>> from ..optimize import sgd >>> from ..btypes import Bound, Positive
Here is an example where we may want to enforce a particular parameter or parameters to be strictly greater than zero,
>>> def cost(w, data, lambda_): ... N = len(data) ... y, X = data[:, 0], data[:, 1:] ... y_est = X.dot(w) ... ww = w.T.dot(w) ... obj = (y - y_est).sum() / N + lambda_ * ww ... gradw = - 2 * X.T.dot(y - y_est) / N + 2 * lambda_ * w ... return obj, gradw
Now let’s enforce that the w are positive,
>>> bounds = [Positive(), Positive()] >>> new_sgd = logtrick_sgd(sgd)
Data
>>> y = np.linspace(1, 10, 100) + np.random.randn(100) + 1 >>> X = np.array([np.ones(100), np.linspace(1, 100, 100)]).T >>> data = np.hstack((y[:, np.newaxis], X))
Initial values
>>> w_0 = np.array([1., 1.]) >>> lambda_0 = .25
>>> res = new_sgd(cost, w_0, data, args=(lambda_0,), bounds=bounds, ... batch_size=10, eval_obj=True) >>> res.x >= 0 array([ True, True], dtype=bool)
Note
This decorator only works on unstructured optimizers. However, it can be use with structured_minimizer, so long as it is the inner wrapper.
-
revrand.optimize.decorators.
structured_minimizer
(minimizer)¶ Allow an optimizer to accept nested sequences of Parameters to optimize.
This decorator can intepret the
Parameter
objects in btypes.py, and can accept nested sequences of any structure of these objects to optimise!It can also optionally evaluate random starts (i.e. random starting candidates) if the parameter objects have been initialised with distributions. For this, two additional parameters are exposed in the
minimizer
interface.Parameters: - fun (callable) – objective function that takes in arbitrary ndarrays, floats or nested sequences of these.
- parameters ((nested) sequences of Parameter objects) – Initial guess of the parameters of the objective function
- nstarts (int, optional) – The number random starting candidates for optimisation to evaluate.
This will only happen for
nstarts > 0
and if at least oneParameter
object is random. - random_state (None, int or RandomState, optional) – random seed
Examples
>>> from scipy.optimize import minimize as sp_min >>> from ..btypes import Parameter, Bound
Define a cost function that returns a pair. The first element is the cost value and the second element is the gradient represented by a tuple. Even if the cost is a function of a single variable, the gradient must be a tuple containing one element.
>>> def cost(w, lambda_): ... sq_norm = w.T.dot(w) ... return .5 * lambda_ * sq_norm, (lambda_ * w, .5 * sq_norm)
Augment the Scipy optimizer to take structured inputs
>>> new_min = structured_minimizer(sp_min)
Constant Initial values
>>> w_0 = Parameter(np.array([.5, .1, .2]), Bound()) >>> lambda_0 = Parameter(.25, Bound())
>>> res = new_min(cost, (w_0, lambda_0), method='L-BFGS-B', jac=True) >>> res_w, res_lambda = res.x
Random Initial values
>>> from scipy.stats import norm, gamma >>> w_0 = Parameter(norm(), Bound(), shape=(3,)) >>> lambda_0 = Parameter(gamma(a=1), Bound())
>>> res = new_min(cost, (w_0, lambda_0), method='L-BFGS-B', jac=True, ... nstarts=100, random_state=None) >>> res_w, res_lambda = res.x
-
revrand.optimize.decorators.
structured_sgd
(sgd)¶ Allow an SGD to accept nested sequences of Parameters to optimize.
This decorator can intepret the
Parameter
objects in btypes.py, and can accept nested sequences of any structure of these objects to optimise!It can also optionally evaluate random starts (i.e. random starting candidates) if the parameter objects have been initialised with distributions. For this, an additional parameter is exposed in the
minimizer
interface.Parameters: - fun (callable) – objective function that takes in arbitrary ndarrays, floats or nested sequences of these.
- parameters ((nested) sequences of Parameter objects) – Initial guess of the parameters of the objective function
- nstarts (int, optional) – The number random starting candidates for optimisation to evaluate.
This will only happen for
nstarts > 0
and if at least oneParameter
object is random.
Examples
>>> from ..optimize import sgd >>> from ..btypes import Parameter, Bound
Define a cost function that returns a pair. The first element is the cost value and the second element is the gradient represented by a sequence. Even if the cost is a function of a single variable, the gradient must be a sequence containing one element.
>>> def cost(w, lambda_, data): ... N = len(data) ... y, X = data[:, 0], data[:, 1:] ... y_est = X.dot(w) ... ww = w.T.dot(w) ... obj = (y - y_est).sum() / N + lambda_ * ww ... gradw = - 2 * X.T.dot(y - y_est) / N + 2 * lambda_ * w ... gradl = ww ... return obj, [gradw, gradl]
Augment the SGD optimizer to take structured inputs
>>> new_sgd = structured_sgd(sgd)
Data
>>> y = np.linspace(1, 10, 100) + np.random.randn(100) + 1 >>> X = np.array([np.ones(100), np.linspace(1, 100, 100)]).T >>> data = np.hstack((y[:, np.newaxis], X))
Constant Initial values
>>> w_0 = Parameter(np.array([1., 1.]), Bound()) >>> lambda_0 = Parameter(.25, Bound())
>>> res = new_sgd(cost, [w_0, lambda_0], data, batch_size=10, ... eval_obj=True) >>> res_w, res_lambda = res.x
Random Initial values
>>> from scipy.stats import norm, gamma >>> w_0 = Parameter(norm(), Bound(), shape=(2,)) >>> lambda_0 = Parameter(gamma(1.), Bound())
>>> res = new_sgd(cost, [w_0, lambda_0], data, batch_size=10, ... eval_obj=True, nstarts=100) >>> res_w, res_lambda = res.x
Stochastic Gradients¶
Stochastic Gradient Descent.
-
class
revrand.optimize.sgd.
AdaDelta
(rho=0.1, epsilon=1e-05)¶ AdaDelta Algorithm.
Parameters: - rho (float, optional) – smoothing/decay rate parameter, must be [0, 1].
- epsilon (float, optional) – “jitter” term to ensure continued learning (should be small).
-
reset
()¶ Reset the state of this updater for a new optimisation problem.
-
class
revrand.optimize.sgd.
AdaGrad
(eta=1, epsilon=1e-06)¶ AdaGrad Algorithm.
Parameters: - eta (float, optional) – smoothing/decay rate parameter, must be [0, 1].
- epsilon (float, optional) – small constant term to prevent divide-by-zeros
-
reset
()¶ Reset the state of this updater for a new optimisation problem.
-
class
revrand.optimize.sgd.
Adam
(alpha=0.01, beta1=0.9, beta2=0.99, epsilon=1e-08)¶ Adam Algorithm.
Parameters: - alpha (float, optional) – stepsize to give the update.
- beta1 (float, optional) – smoothing/decay rate parameter for the gradient, must be [0, 1].
- beta2 (float, optional) – smoothing/decay rate parameter for the squared gradient, must be [0, 1].
- epsilon (float, optional) – “jitter” term to ensure continued learning (should be small).
-
reset
()¶ Reset the state of this updater for a new optimisation problem.
-
class
revrand.optimize.sgd.
Momentum
(rho=0.5, eta=0.01)¶ Momentum Algorithm.
Parameters: - rho (float, optional) – smoothing/decay rate parameter, must be [0, 1].
- eta (float, optional) – weight to give to the momentum term
-
reset
()¶ Reset the state of this updater for a new optimisation problem.
-
class
revrand.optimize.sgd.
SGDUpdater
(eta=0.1)¶ Base class for SGD learning rate algorithms.
Parameters: eta (float, optional) – The learning rate applied to every gradient step. -
reset
()¶ Reset the state of this updater for a new optimisation problem.
-
-
revrand.optimize.sgd.
gen_batch
(data, batch_size, maxiter=inf, random_state=None)¶ Create random batches for Stochastic gradients.
Batch index generator for SGD that will yeild random batches for a a defined number of iterations, which can be infinite. This generator makes consecutive passes through the data, drawing without replacement on each pass.
Parameters: - data (ndarray or sequence of ndarrays) – The data, can be a matrix X, (X,y) tuples etc
- batch_size (int) – number of data points in each batch.
- maxiter (int, optional) – The number of iterations
- random_state (int or RandomState, optional) – random seed
Yields: ndarray or sequence – with each array length
batch_size
, i.e. a subset of data.
-
revrand.optimize.sgd.
normalize_bound
(bound)¶ Replace
None
with + or - inf in bound tuples.Examples
>>> normalize_bound((2.6, 7.2)) (2.6, 7.2)
>>> normalize_bound((None, 7.2)) (-inf, 7.2)
>>> normalize_bound((2.6, None)) (2.6, inf)
>>> normalize_bound((None, None)) (-inf, inf)
This operation is idempotent:
>>> normalize_bound((-float("inf"), float("inf"))) (-inf, inf)
-
revrand.optimize.sgd.
sgd
(fun, x0, data, args=(), bounds=None, batch_size=10, maxiter=5000, updater=None, eval_obj=False, random_state=None)¶ Stochastic Gradient Descent.
Parameters: - fun (callable) – the function to minimize, this must have the signature
[obj,]
grad = fun(x, data, ...)`, where theeval_obj
argument tellssgd
if an objective function value is going to be returned byfun
. - x0 (ndarray) – a sequence/1D array of initial values for the parameters to learn.
- data (ndarray) – a numpy array or sequence of data to input into
fun
. This will be split along the first axis (axis=0), and then input intofun
. - args (sequence, optional) – an optional sequence of arguments to give to fun.
- bounds (sequence, optional) – Bounds for variables, (min, max) pairs for each element in x, defining the bounds on that parameter. Use None for one of min or max when there is no bound in that direction.
- batch_size (int, optional) – The number of observations in an SGD batch.
- maxiter (int, optional) – Number of mini-batch iterations before optimization terminates.
- updater (SGDUpdater, optional) – The type of gradient update to use, by default this is Adam.
- eval_obj (bool, optional) – This indicates whether or not
fun
also evaluates and returns the objective function value. If this is true,fun
must return(obj, grad)
and then a list of objective function values is also returned. - random_state (int or RandomState, optional) – random seed
Returns: res –
- x : narray
the final result
- norms : list
the list of gradient norms
- message : str
the convergence condition (‘maxiter reached’ or error)
- objs : list
the list of objective function evaluations if
eval_obj
is True.- fun : float
the final objective function evaluation if
eval_obj
is True.
Return type: OptimizeResult
- fun (callable) – the function to minimize, this must have the signature