Utilities

General Utilities

Assortment of handy functions.

class revrand.utils.base.Bunch(**kwargs)

Container object for datasets.

Dictionary-like object that exposes its keys as attributes.

Examples

>>> b = Bunch(foo=42, bar=10)
>>> b == {'foo': 42, 'bar': 10}
True
>>> b.foo
42
>>> b.bar
10
>>> b['foo']
42
>>> b.baz = 61
>>> b.baz
61
>>> b['baz']
61
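
A minimal sketch of how a dictionary can expose its keys as attributes, as the examples above demonstrate. The class name BunchSketch is hypothetical and this is not necessarily how revrand implements Bunch; it only illustrates the idea.

```python
class BunchSketch(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment is stored as a dictionary item.
        self[name] = value


b = BunchSketch(foo=42, bar=10)
b.baz = 61
print(b.foo, b["baz"])
```

Because attribute writes are redirected into the dictionary, the object still compares equal to a plain dict holding the same items.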
revrand.utils.base.atleast_list(a)

Promote an object to a list if not a list or generator.

Parameters: a (object) – any object that should be promoted to at least a one-element list
Returns: a untouched if it was a generator or list, otherwise [a].
Return type: list or generator

Examples

>>> a = 1.
>>> atleast_list(a)
[1.0]
>>> a = [1.]
>>> atleast_list(a)
[1.0]
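
The promotion rule described above can be sketched in a few lines. The name atleast_list_sketch is hypothetical, and the type check shown is an assumption based on the description rather than the library's own code.

```python
from types import GeneratorType


def atleast_list_sketch(a):
    """Return a unchanged if it is a list or generator, else wrap it in a list."""
    return a if isinstance(a, (list, GeneratorType)) else [a]


scalar_result = atleast_list_sketch(1.0)      # wrapped in a new list
existing = [1.0]
list_result = atleast_list_sketch(existing)   # returned as-is
```

Under this rule a tuple is not passed through: it would be wrapped as a single element.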
revrand.utils.base.atleast_tuple(a)

Promote an object to a tuple if not a tuple or generator.

Parameters: a (object) – any object that should be promoted to at least a one-element tuple
Returns: a untouched if it was a generator or tuple, otherwise (a,).
Return type: tuple or generator

Examples

>>> a = 1.
>>> atleast_tuple(a)
(1.0,)
>>> a = (1.,)
>>> atleast_tuple(a)
(1.0,)
revrand.utils.base.couple(f, g)

Compose a function that returns two values.

Given a pair of functions that take the same arguments, return a single function that returns a pair consisting of the return values of each function.

Notes

Equivalent to:

lambda f, g: lambda *args, **kwargs: (f(*args, **kwargs),
                                      g(*args, **kwargs))

Examples

>>> f = lambda x: 2*x**3
>>> df = lambda x: 6*x**2
>>> f_new = couple(f, df)
>>> f_new(5)
(250, 150)
revrand.utils.base.decouple(fn)

Inverse operation of couple.

Create two functions, each returning a single value, from one function that returns a pair of values.

Examples

>>> h = lambda x: (2*x**3, 6*x**2)
>>> f, g = decouple(h)
>>> f(5)
250
>>> g(5)
150
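
A short sketch of how couple and decouple relate to each other. The names couple_sketch and decouple_sketch are hypothetical; this version simply calls the wrapped function and picks one element of the result, which may differ from the library's implementation.

```python
def couple_sketch(f, g):
    """Return one function that yields the pair (f(...), g(...))."""
    def coupled(*args, **kwargs):
        return f(*args, **kwargs), g(*args, **kwargs)
    return coupled


def decouple_sketch(fn):
    """Return two functions, each yielding one element of fn's pair."""
    def first(*args, **kwargs):
        return fn(*args, **kwargs)[0]

    def second(*args, **kwargs):
        return fn(*args, **kwargs)[1]

    return first, second


f = lambda x: 2 * x**3
df = lambda x: 6 * x**2
f_back, df_back = decouple_sketch(couple_sketch(f, df))
```

Note that in this sketch each decoupled function evaluates the whole pair and discards half of it, so decoupling an expensive function doubles the work when both values are needed.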
revrand.utils.base.flatten(arys, returns_shapes=True, hstack=<function hstack>, ravel=<function ravel>, shape=<function shape>)

Flatten a potentially recursive list of multidimensional objects.

Note

Not to be confused with np.ndarray.flatten(). A more befitting name might be chain or stack, or something else entirely, since this function does more than either concatenate or np.ndarray.flatten alone: it is the composition of the former with the latter.

Parameters:
  • arys (list of objects) – One or more input arrays of possibly heterogeneous shapes and sizes.
  • returns_shapes (bool, optional) – Default is True. If True, the tuple (flattened, shapes) is returned, otherwise only flattened is returned.
  • hstack (callable, optional) – a function that implements horizontal stacking
  • ravel (callable, optional) – a function that flattens the object
  • shape (callable, optional) – a function that returns the shape of the object
Returns:

flattened, [shapes] – The flat (1d) object resulting from concatenating the flattened multidimensional objects. When returns_shapes is True, the shapes of the input arrays are also returned, as a list of tuples, in the second element.

Return type:

{1d object, list of tuples}

See also

revrand.utils.unflatten()
its inverse

Examples

>>> a = 9
>>> b = np.array([4, 7, 4, 5, 2])
>>> c = np.array([[7, 3, 1],
...               [2, 6, 6]])
>>> d = np.array([[[6, 5, 5],
...                [1, 6, 9]],
...               [[3, 9, 1],
...                [9, 4, 1]]])
>>> flatten([a, b, c, d]) 
(array([9, 4, 7, 4, 5, 2, 7, 3, 1, 2, 6, 6, 6, 5, 5, 1, 6, 9, 3, 9,
        1, 9, 4, 1]), [(), (5,), (2, 3), (2, 2, 3)])

Note that scalars and 0-dimensional arrays are treated differently from 1-dimensional singleton arrays.

>>> flatten([3.14, np.array(2.71), np.array([1.61])])
(array([ 3.14,  2.71,  1.61]), [(), (), (1,)])
>>> flatten([a, b, c, d], returns_shapes=False)
array([9, 4, 7, 4, 5, 2, 7, 3, 1, 2, 6, 6, 6, 5, 5, 1, 6, 9, 3, 9,
       1, 9, 4, 1])
>>> w, x, y, z = unflatten(*flatten([a, b, c, d]))
>>> w == a
True
>>> np.array_equal(x, b)
True
>>> np.array_equal(y, c)
True
>>> np.array_equal(z, d)
True
>>> flatten([3.14, [np.array(2.71), np.array([1.61])]])
(array([ 3.14,  2.71,  1.61]), [(), [(), (1,)]])
revrand.utils.base.issequence(obj)

Test whether an object is a generator, list or tuple.

Parameters: obj (object) – object to test
Returns: True if obj is a tuple, list or generator only.
Return type: bool

Examples

>>> issequence([1, 2])
True
>>> issequence((1,))
True
>>> issequence((i for i in range(8)))
True
>>> issequence(np.array([1, 2, 3]))
False
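
The check described above can be sketched with a single isinstance call. The name issequence_sketch is hypothetical and the exact type test is an assumption drawn from the description.

```python
from types import GeneratorType


def issequence_sketch(obj):
    """True only for lists, tuples and generator objects."""
    return isinstance(obj, (list, tuple, GeneratorType))


checks = {
    "list": issequence_sketch([1, 2]),
    "tuple": issequence_sketch((1,)),
    "generator": issequence_sketch(i for i in range(8)),
    "string": issequence_sketch("abc"),
    "range": issequence_sketch(range(3)),
}
```

With this definition other iterables, such as strings and range objects, are deliberately not treated as sequences.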
revrand.utils.base.map_indices(fn, iterable, indices)

Map a function across indices of an iterable.

Notes

Roughly equivalent to, though more efficient than:

lambda fn, iterable, *indices: (fn(arg) if i in indices else arg
                                for i, arg in enumerate(iterable))

Examples

>>> a = [4, 6, 7, 1, 6, 8, 2]
>>> from functools import partial
>>> from operator import mul
>>> list(map_indices(partial(mul, 3), a, [0, 3, 5]))
[12, 6, 7, 3, 6, 24, 2]
>>> b = [9., np.array([5., 6., 2.]),
...      np.array([[5., 6., 2.], [2., 3., 9.]])]
>>> list(map_indices(np.log, b, [0, 2])) 
[2.1972245773362196,
 array([ 5.,  6.,  2.]),
 array([[ 1.60943791,  1.79175947,  0.69314718],
        [ 0.69314718,  1.09861229,  2.19722458]])]

Todo

Floating point precision

>>> list(map_indices(np.exp, list(map_indices(np.log, b, [0, 2])), [0, 2]))
[9.,
 array([5., 6., 2.]),
 array([[ 5.,  6.,  2.],
        [ 2.,  3.,  9.]])]
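
A runnable sketch of the behaviour described in the Notes for map_indices. The name map_indices_sketch is hypothetical; converting the indices to a set for constant-time membership tests is an assumption about one way to make the lookup cheaper, not a statement about the library's code.

```python
from functools import partial
from operator import mul


def map_indices_sketch(fn, iterable, indices):
    """Lazily apply fn only to the items whose position is in indices."""
    index_set = set(indices)  # O(1) membership test per item
    return (fn(arg) if i in index_set else arg
            for i, arg in enumerate(iterable))


a = [4, 6, 7, 1, 6, 8, 2]
tripled = list(map_indices_sketch(partial(mul, 3), a, [0, 3, 5]))
```

The result is a generator, so nothing is computed until it is consumed, for example by list.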
revrand.utils.base.map_recursive(fn, iterable, output_type=None)

Apply a function to the elements of a potentially nested list of lists.

Parameters:
  • fn (callable) – The function to apply to each element (and sub elements) in iterable
  • iterable (iterable) – An iterable, sequence, sequence of sequences etc. fn will be applied to each element in each list.
  • output_type (callable, optional) – if None, a map with sub-maps in the same structure as iterable will be returned, otherwise the callable will be applied to each sequence (i.e. list will return lists of lists etc).
Returns:

if output_type is None, a map with sub-maps in the same structure as iterable will be returned, otherwise the callable will be applied to each sequence (i.e. list will return lists of lists etc).

Return type:

map or iterable type

Examples

>>> seq = [1, 2, [3, 4, [5, 6]], 7]
>>> map_recursive(lambda x: x > 4, seq, output_type=list)
[False, False, [False, False, [True, True]], True]
>>> map_recursive(lambda x: 2 * x, seq, output_type=tuple)
(2, 4, (6, 8, (10, 12)), 14)
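
The recursion can be sketched as follows. The name map_recursive_sketch is hypothetical, and for simplicity this version defaults output_type to list and only descends into lists and tuples, which is an assumption and differs from the documented default of None.

```python
def map_recursive_sketch(fn, iterable, output_type=list):
    """Apply fn to every leaf of a nested list/tuple, keeping the nesting."""
    return output_type(
        map_recursive_sketch(fn, item, output_type)
        if isinstance(item, (list, tuple)) else fn(item)
        for item in iterable
    )


seq = [1, 2, [3, 4, [5, 6]], 7]
flags = map_recursive_sketch(lambda x: x > 4, seq)
doubled = map_recursive_sketch(lambda x: 2 * x, seq, output_type=tuple)
```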
revrand.utils.base.nwise(iterable, n)

Sliding window iterator.

Iterator that acts like a sliding window of size n, advancing over the iterable one item at a time. If iterable has m elements, this function returns an iterator over m-n+1 tuples.

Parameters:
  • iterable (iterable) – An iterable object.
  • n (int) – Window size.
Returns:

Iterator of size n tuples

Return type:

iterator of tuples.

Notes

First, n iterators are created:

iters = tee(iterable, n)

Next, iterator i is advanced i times:

for i, it in enumerate(iters):
    for _ in range(i):
        next(it, None)

Finally, the iterators are zipped back up again:

return zip(*iters)

Examples

>>> a = [2, 5, 7, 4, 2, 8, 6]
>>> list(nwise(a, n=3))
[(2, 5, 7), (5, 7, 4), (7, 4, 2), (4, 2, 8), (2, 8, 6)]
>>> from functools import partial
>>> pairwise = partial(nwise, n=2)
>>> list(pairwise(a))
[(2, 5), (5, 7), (7, 4), (4, 2), (2, 8), (8, 6)]
>>> list(nwise(a, n=1))
[(2,), (5,), (7,), (4,), (2,), (8,), (6,)]
>>> list(nwise(a, n=7))
[(2, 5, 7, 4, 2, 8, 6)]

Todo

These should probably raise ValueError...

>>> list(nwise(a, 8))
[]
>>> list(nwise(a, 9))
[]

A sliding window of size n over a list of m elements gives m-n+1 windows:

>>> len(a) - len(list(nwise(a, 2))) == 1
True
>>> len(a) - len(list(nwise(a, 3))) == 2
True
>>> len(a) - len(list(nwise(a, 7))) == 6
True
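
The three steps listed in the Notes can be assembled into one self-contained function. The name nwise_sketch is hypothetical; the body is just the fragments above put together.

```python
from itertools import tee


def nwise_sketch(iterable, n):
    """Sliding window of size n that advances one item at a time."""
    iters = tee(iterable, n)            # n independent iterators
    for i, it in enumerate(iters):
        for _ in range(i):              # advance iterator i by i items
            next(it, None)
    return zip(*iters)                  # stops when the last iterator ends


a = [2, 5, 7, 4, 2, 8, 6]
windows = list(nwise_sketch(a, 3))
too_wide = list(nwise_sketch(a, 8))
```

Because zip stops at the shortest iterator, a window wider than the input silently produces no tuples rather than raising an error, which matches the behaviour shown in the Todo examples.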
revrand.utils.base.scalar_reshape(a, newshape, order='C')

Reshape, but also return scalars or empty lists.

Identical to numpy.reshape except in the case where newshape is the empty tuple, in which case we return a scalar instead of a 0-dimensional array.

Examples

>>> a = np.arange(6)
>>> np.array_equal(np.reshape(a, (3, 2)), scalar_reshape(a, (3, 2)))
True
>>> scalar_reshape(np.array([3.14]), newshape=())
3.14
>>> scalar_reshape(np.array([2.71]), newshape=(1,))
array([ 2.71])
>>> scalar_reshape(np.array([]), newshape=(0,))
[]
revrand.utils.base.sumprod(seq)

Product of tuple, or sum of products of lists of tuples.

Parameters: seq (tuple or list) – a tuple of numbers, or a (possibly nested) list of such tuples
Returns: the product of input tuples, or the sum of products of lists of tuples, recursively.
Return type: int

Examples

>>> tup = (1, 2, 3)
>>> sumprod(tup)
6
>>> lis = [(1, 2, 3), (2, 2)]
>>> sumprod(lis)
10
>>> lis = [(1, 2, 3), [(2, 1), (3,)]]
>>> sumprod(lis)
11
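
The recursive rule (multiply within tuples, add across lists) can be sketched as below. The name sumprod_sketch is hypothetical; math.prod requires Python 3.8 or later.

```python
from math import prod


def sumprod_sketch(seq):
    """Product of a tuple, or the recursive sum over a list of tuples/lists."""
    if isinstance(seq, tuple):
        return prod(seq)                 # prod(()) is 1, like a scalar shape
    return sum(sumprod_sketch(item) for item in seq)


nested_total = sumprod_sketch([(1, 2, 3), [(2, 1), (3,)]])
```

Read as array shapes, the result is the total number of elements that a list of arrays with those shapes would hold.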
revrand.utils.base.unflatten(ary, shapes, reshape=<function scalar_reshape>)

Inverse operation of flatten.

Given a flat (1d) array, and a list of shapes (represented as tuples), return a list of ndarrays with the specified shapes.

Parameters:
  • ary (a 1d array) – A flat (1d) array.
  • shapes (list of tuples) – A list of ndarray shapes (tuple of array dimensions)
Returns:

A list of ndarrays with the specified shapes.

Return type:

list of ndarrays

See also

revrand.utils.flatten()
its inverse

Notes

Equivalent to:

lambda ary, shapes, order='C': \
    map(partial(custom_reshape, order=order),
        np.hsplit(ary, np.cumsum(map(partial(np.prod, dtype=int),
                                     shapes))), shapes)

Examples

>>> a = np.array([7, 4, 5, 8, 9, 1, 4, 2, 5, 3, 4, 3])
>>> list(unflatten(a, [(1,), (1,), (4,), (2, 3)]))
[array([7]), array([4]), array([5, 8, 9, 1]), array([[4, 2, 5],
    [3, 4, 3]])]
>>> list(unflatten(a, [(), (1,), (4,), (2, 3)]))
[7, array([4]), array([5, 8, 9, 1]), array([[4, 2, 5], [3, 4, 3]])]
>>> list(unflatten(a, [(), (1,), (3,), (2, 3)]))
[7, array([4]), array([5, 8, 9]), array([[1, 4, 2], [5, 3, 4]])]
>>> list(unflatten(a, [(), (1,), (5,), (2, 3)]))
Traceback (most recent call last):
    ...
ValueError: total size of new array must be unchanged
>>> flatten(list(unflatten(a, [(), (1,), (4,), (2, 3)])))
(array([7, 4, 5, 8, 9, 1, 4, 2, 5, 3, 4, 3]),
    [(), (1,), (4,), (2, 3)])
>>> list(unflatten(a, [[(1,), (1,)], (4,), (2, 3)]))
[[array([7]), array([4])], array([5, 8, 9, 1]), array([[4, 2, 5],
    [3, 4, 3]])]
>>> flatten(list(unflatten(a, [(), (1,), [(4,), (2, 3)]])))
(array([7, 4, 5, 8, 9, 1, 4, 2, 5, 3, 4, 3]),
    [(), (1,), [(4,), (2, 3)]])

Dataset Utilities

Dataset loading utilities

Portions of this module derived from http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets

revrand.utils.datasets.fetch_gpml_sarcos_data(transpose_data=True, data_home=None)

Fetch the SARCOS dataset from the internet and parse appropriately into python arrays

>>> gpml_sarcos = fetch_gpml_sarcos_data()
>>> gpml_sarcos.train.data.shape
(44484, 21)
>>> gpml_sarcos.train.targets.shape
(44484,)
>>> gpml_sarcos.train.targets.round(2) 
array([ 50.29,  44.1 ,  37.35, ...,  22.7 ,  17.13,   6.52])
>>> gpml_sarcos.test.data.shape
(4449, 21)
>>> gpml_sarcos.test.targets.shape
(4449,)
revrand.utils.datasets.fetch_gpml_usps_resampled_data(transpose_data=True, data_home=None)

Fetch the USPS handwritten digits dataset from the internet and parse appropriately into python arrays

>>> usps_resampled = fetch_gpml_usps_resampled_data()
>>> usps_resampled.train.targets.shape
(4649,)
>>> usps_resampled.train.targets 
array([6, 0, 1, ..., 9, 2, 7])
>>> usps_resampled.train.data.shape
(4649, 256)
>>> np.all(-1 <= usps_resampled.train.data)
True
>>> np.all(usps_resampled.train.data < 1)
True
>>> usps_resampled.test.targets.shape
(4649,)
>>> usps_resampled.test.data.shape
(4649, 256)
>>> usps_resampled = fetch_gpml_usps_resampled_data(transpose_data=False)
>>> usps_resampled.train.data.shape
(256, 4649)
revrand.utils.datasets.gen_gausprocess_se(ntrain, ntest, noise=1.0, lenscale=1.0, scale=1.0, xmin=-10, xmax=10)

Generate a random (noisy) draw from a Gaussian Process with an RBF kernel.

revrand.utils.datasets.get_data_home(data_home=None)

Return the path of the revrand data dir.

This folder is used by some large dataset loaders to avoid downloading the data several times.

By default the data dir is set to a folder named ‘revrand_data’ in the user home folder.

Alternatively, it can be set by the ‘REVRAND_DATA’ environment variable or programmatically by giving an explicit folder path. The ‘~’ symbol is expanded to the user home folder.

If the folder does not already exist, it is automatically created.
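
The resolution order described above can be sketched as follows. The function name get_data_home_sketch and the env_var parameter are hypothetical, and letting an explicit argument take precedence over the environment variable is an assumption; the text above does not state the precedence.

```python
import os
import tempfile


def get_data_home_sketch(data_home=None, env_var="REVRAND_DATA"):
    """Resolve, create if needed, and return the data directory path."""
    if data_home is None:
        # Fall back to the environment variable, then to ~/revrand_data.
        data_home = os.environ.get(env_var, os.path.join("~", "revrand_data"))
    data_home = os.path.expanduser(data_home)   # expand a leading '~'
    os.makedirs(data_home, exist_ok=True)       # create the folder if missing
    return data_home


# Demonstrated inside a temporary directory to avoid touching the real home.
with tempfile.TemporaryDirectory() as tmp:
    path = get_data_home_sketch(os.path.join(tmp, "revrand_data"))
    created = os.path.isdir(path)
```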

revrand.utils.datasets.make_polynomial(degree=3, n_samples=100, bias=0.0, noise=0.0, return_coefs=False, random_state=None)

Generate a noisy polynomial for a regression problem

Examples

>>> X, y, coefs = make_polynomial(degree=3, n_samples=200, noise=.5,
...                               return_coefs=True, random_state=1)
revrand.utils.datasets.make_regression(func, n_samples=100, n_features=1, bias=0.0, noise=0.0, random_state=None)

Make dataset for a regression problem.

Examples

>>> f = lambda x: 0.5*x + np.sin(2*x)
>>> X, y = make_regression(f, bias=.5, noise=1., random_state=1)
>>> X.shape
(100, 1)
>>> y.shape
(100,)
>>> X[:5].round(2)
array([[ 1.62],
       [-0.61],
       [-0.53],
       [-1.07],
       [ 0.87]])
>>> y[:5].round(2)
array([ 0.76,  0.48, -0.23, -0.28,  0.83])

Decorator Utilities

Reusable decorators

class revrand.utils.decorators.Memoize(func)

Examples

>>> @Memoize
... def fib(n):
...     if n < 2:
...         return n
...     return fib(n-2) + fib(n-1)
>>> fib(10)
55
>>> isinstance(fib, dict)
True
>>> fib == {
...     (0,):  0,
...     (1,):  1,
...     (2,):  1,
...     (3,):  2,
...     (4,):  3,
...     (5,):  5,
...     (6,):  8,
...     (7,):  13,
...     (8,):  21,
...     (9,):  34,
...     (10,): 55,
... }
True

Order is not necessarily maintained.

>>> sorted(fib.keys())
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,)]
>>> sorted(fib.values())
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
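
One way to obtain a decorator that is itself a dictionary, as the examples above show, is to subclass dict and compute missing entries on demand. The class name MemoizeSketch is hypothetical and this is only an assumed design, not necessarily the library's.

```python
class MemoizeSketch(dict):
    """Cache results of func, keyed by the tuple of positional arguments."""

    def __init__(self, func):
        super().__init__()
        self.func = func

    def __call__(self, *args):
        return self[args]                 # falls through to __missing__ on a miss

    def __missing__(self, key):
        result = self[key] = self.func(*key)
        return result


@MemoizeSketch
def fib(n):
    if n < 2:
        return n
    return fib(n - 2) + fib(n - 1)


value = fib(10)
```

Since the cache keys are argument tuples, this sketch only works for hashable positional arguments.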
class revrand.utils.decorators.OrderedMemoize(func)

Examples

>>> @OrderedMemoize
... def fib(n):
...     if n < 2:
...         return n
...     return fib(n-2) + fib(n-1)
>>> fib(10)
55

The arguments and values are cached in the order they were called.

>>> list(fib.keys())
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,)]
>>> list(fib.values())
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
>>> fib 
OrderedMemoize([((0,), 0),
                ((1,), 1),
                ((2,), 1),
                ((3,), 2),
                ((4,), 3),
                ((5,), 5),
                ((6,), 8),
                ((7,), 13),
                ((8,), 21),
                ((9,), 34),
                ((10,), 55)])
revrand.utils.decorators.flatten_args(fn)

Examples

>>> @flatten_args
... def f(x):
...     return 2*x
>>> x, y, z = f(np.array([1., 2.]), 3., np.array([[1., 2.],[.5, .9]]))
>>> x
array([ 2.,  4.])
>>> y
6.0
>>> z
array([[ 2. ,  4. ],
       [ 1. ,  1.8]])
revrand.utils.decorators.unvectorize_args(fn)

Examples

The Rosenbrock function is commonly used as a performance test problem for optimization algorithms. It and its derivatives are included in scipy.optimize, and it is implemented in the form expected by the family of optimization methods in scipy.optimize:

def rosen(x):
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

This representation makes it unwieldy to perform operations such as plotting since it is less straightforward to evaluate the function on a meshgrid. This decorator helps reconcile the differences between these representations.

>>> from scipy.optimize import rosen
>>> rosen(np.array([0.5, 1.5]))
156.5
>>> unvectorize_args(rosen)(0.5, 1.5)
156.5

The rosen function is implemented in such a way that it generalizes to the Rosenbrock function of any number of variables. This decorator can support any function defined in a similar manner.

The function is well-defined for any number of arguments:

>>> rosen(np.array([0.5, 1.5, 1., 0., 0.2]))
418.0
>>> unvectorize_args(rosen)(0.5, 1.5, 1., 0., 0.2)
... # can accept any variable number of arguments!
418.0

This makes the function easier to use in other operations, such as evaluation on a grid:

>>> rosen_ = unvectorize_args(rosen)
>>> y, x = np.mgrid[0:2.1:0.05, -1:1.2:0.05]
>>> z = rosen_(x, y)
>>> z.round(2)
array([[ 104.  ,   85.25,   69.22, ...,  121.55,  146.42,  174.92],
       [  94.25,   76.48,   61.37, ...,  110.78,  134.57,  161.95],
       [  85.  ,   68.2 ,   54.02, ...,  100.5 ,  123.22,  149.47],
       ...,
       [  94.25,  113.53,  133.57, ...,   71.83,   54.77,   39.4 ],
       [ 104.  ,  124.25,  145.22, ...,   80.55,   62.42,   45.92],
       [ 114.25,  135.48,  157.37, ...,   89.78,   70.57,   52.95]])

Now this can be directly plotted with mpl_toolkits.mplot3d.Axes3D and ax.plot_surface.
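
The core idea of the decorator can be sketched without NumPy. The names unvectorize_args_sketch and rosen_pure are hypothetical; this version packs the positional arguments into a tuple, whereas the examples above suggest the real decorator builds an array so that grid inputs also work.

```python
from functools import wraps


def unvectorize_args_sketch(fn):
    """Let a function of one sequence be called with separate arguments."""
    @wraps(fn)
    def wrapper(*args):
        return fn(args)       # pack the arguments into a single sequence
    return wrapper


def rosen_pure(x):
    """Pure-Python Rosenbrock function over a sequence of numbers."""
    return sum(100.0 * (b - a**2) ** 2 + (1 - a) ** 2
               for a, b in zip(x[:-1], x[1:]))


rosen_args = unvectorize_args_sketch(rosen_pure)
two_var = rosen_args(0.5, 1.5)
five_var = rosen_args(0.5, 1.5, 1.0, 0.0, 0.2)
```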

revrand.utils.decorators.vectorize_args(fn)

When defining functions of several variables, it is usually more readable to write out each variable as a separate argument. This is also convenient for evaluating functions on a numpy.meshgrid.

However, the family of optimizers in scipy.optimize expects that all functions, including those of several variables, receive a single argument, which is a numpy.ndarray in the case of functions of several variables.

Readability counts. We need not compromise readability to conform to some interface when higher-order functions/decorators can abstract away the details for us. This is what this decorator does.

Examples

Optimizers such as those in scipy.optimize expect a function defined like this.

>>> def fun1(v):
...     # elliptic paraboloid
...     return 2*v[0]**2 + 2*v[1]**2 - 4
>>> a = np.array([2, 3])
>>> fun1(a)
22

This representation, by contrast, is not only more readable but also more natural.

>>> def fun2(x, y):
...     # elliptic paraboloid
...     return 2*x**2 + 2*y**2 - 4
>>> fun2(2, 3)
22

It is also important for evaluating functions on a numpy.meshgrid:

>>> y, x = np.mgrid[-5:5:0.2, -5:5:0.2]
>>> fun2(x, y)
array([[ 96.  ,  92.08,  88.32, ...,  84.72,  88.32,  92.08],
       [ 92.08,  88.16,  84.4 , ...,  80.8 ,  84.4 ,  88.16],
       [ 88.32,  84.4 ,  80.64, ...,  77.04,  80.64,  84.4 ],
       ...,
       [ 84.72,  80.8 ,  77.04, ...,  73.44,  77.04,  80.8 ],
       [ 88.32,  84.4 ,  80.64, ...,  77.04,  80.64,  84.4 ],
       [ 92.08,  88.16,  84.4 , ...,  80.8 ,  84.4 ,  88.16]])

We can easily reconcile the differences between these representations without having to compromise readability.

>>> fun1(a) == vectorize_args(fun2)(a)
True
>>> @vectorize_args
... def fun3(x, y):
...     # elliptic paraboloid
...     return 2*x**2 + 2*y**2 - 4
>>> fun1(a) == fun3(a)
True
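
The adaptation this decorator performs can be sketched as argument unpacking. The name vectorize_args_sketch is hypothetical, and unpacking with * is an assumption about the simplest way to achieve the behaviour shown above.

```python
from functools import wraps


def vectorize_args_sketch(fn):
    """Let a function of several arguments be called with one sequence."""
    @wraps(fn)
    def wrapper(v):
        return fn(*v)         # unpack the vector into separate arguments
    return wrapper


def fun2(x, y):
    # elliptic paraboloid
    return 2 * x**2 + 2 * y**2 - 4


fun3 = vectorize_args_sketch(fun2)
vector_result = fun3([2, 3])
```

The wrapped function keeps the readable two-argument definition while presenting the single-argument interface that an optimizer would call.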

Random Generators

revrand.utils.rand.endless_permutations(N, random_state=None)

Generate an endless sequence of random integers from permutations of the set [0, ..., N).

If we call this N times, we will sweep through the entire set without replacement; on the (N+1)th call a new permutation will be created, and so on.

Parameters:
  • N (int) – the length of the set
  • random_state (int or RandomState, optional) – random seed
Yields:

int – a random int from the set [0, ..., N)
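
A standard-library sketch of the generator described above. The name endless_permutations_sketch and the seed parameter are hypothetical; the documented function takes a random_state argument and presumably draws its permutations with NumPy.

```python
import random
from itertools import islice


def endless_permutations_sketch(n, seed=None):
    """Yield integers from successive random permutations of range(n)."""
    rng = random.Random(seed)
    while True:
        perm = list(range(n))
        rng.shuffle(perm)     # a fresh permutation for every sweep
        yield from perm


# Two full sweeps over the set {0, ..., 4}.
draws = list(islice(endless_permutations_sketch(5, seed=0), 10))
first_sweep, second_sweep = draws[:5], draws[5:]
```

Each block of N consecutive draws contains every index exactly once, which is what makes the generator suitable for sweeping through a dataset without replacement.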