Utilities
General Utilities
Assortment of handy functions.
-
class revrand.utils.base.Bunch(**kwargs)
Container object for datasets.
Dictionary-like object that exposes its keys as attributes.
Examples
>>> b = Bunch(foo=42, bar=10)
>>> b == {'foo': 42, 'bar': 10}
True
>>> b.foo
42
>>> b.bar
10
>>> b['foo']
42
>>> b.baz = 61
>>> b.baz
61
>>> b['baz']
61
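A minimal sketch of such a container, assuming a plain dict subclass that forwards attribute access to item access (the same pattern scikit-learn's Bunch uses). It illustrates the idea and is not necessarily revrand's implementation.

```python
class Bunch(dict):
    """Sketch: a dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails; fall back to the dict.
        try:
            return self[key]
        except KeyError:
            # Raise AttributeError so hasattr() and getattr() defaults behave.
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Store attributes as dict items so both views stay in sync.
        self[key] = value
```

Because attributes are stored as items, `b.baz = 61` is immediately visible as `b['baz']`, and equality with an ordinary dict still holds.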
-
revrand.utils.base.atleast_list(a)
Promote an object to a list if not a list or generator.
Parameters: a (object) – any object you want to be at least a list with one element.
Returns: a, untouched, if a was a generator or list; otherwise [a].
Return type: list or generator
Examples
>>> a = 1.
>>> atleast_list(a)
[1.0]
>>> a = [1.]
>>> atleast_list(a)
[1.0]
-
revrand.utils.base.atleast_tuple(a)
Promote an object to a tuple if not a tuple or generator.
Parameters: a (object) – any object you want to be at least a tuple with one element.
Returns: a, untouched, if a was a generator or tuple; otherwise (a,).
Return type: tuple or generator
Examples
>>> a = 1.
>>> atleast_tuple(a)
(1.0,)
>>> a = (1.,)
>>> atleast_tuple(a)
(1.0,)
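Both promotion helpers can be sketched with isinstance checks, assuming that "generator" means a types.GeneratorType object. This is an illustration of the documented behaviour, not revrand's source.

```python
import types


def atleast_list(a):
    # Sketch: lists and generators pass through untouched.
    if isinstance(a, (list, types.GeneratorType)):
        return a
    # Anything else (including a tuple) is wrapped in a one-element list.
    return [a]


def atleast_tuple(a):
    # Sketch: tuples and generators pass through untouched.
    if isinstance(a, (tuple, types.GeneratorType)):
        return a
    # Anything else (including a list) is wrapped in a one-element tuple.
    return (a,)
```

Returning generators untouched keeps them lazy instead of consuming them into a container.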
-
revrand.utils.base.couple(f, g)
Compose a function that returns two values.
Given a pair of functions that take the same arguments, return a single function that returns a pair consisting of the return values of each function.
Notes
Equivalent to:
lambda f, g: lambda *args, **kwargs: (f(*args, **kwargs), g(*args, **kwargs))
Examples
>>> f = lambda x: 2*x**3
>>> df = lambda x: 6*x**2
>>> f_new = couple(f, df)
>>> f_new(5)
(250, 150)
-
revrand.utils.base.decouple(fn)
Inverse operation of couple.
Given a function that returns a pair of values, create two functions, each taking the same arguments and returning one element of that pair.
Examples
>>> h = lambda x: (2*x**3, 6*x**2)
>>> f, g = decouple(h)
>>> f(5)
250
>>> g(5)
150
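A minimal sketch of couple and decouple as closures, following the lambda given in the Notes for couple. The decouple shown here is an assumption about how the inverse can be written, not revrand's code.

```python
def couple(f, g):
    # Return one function that evaluates both f and g on the same arguments.
    def coupled(*args, **kwargs):
        return f(*args, **kwargs), g(*args, **kwargs)
    return coupled


def decouple(fn):
    # Split a pair-returning function into two single-value functions.
    def first(*args, **kwargs):
        return fn(*args, **kwargs)[0]

    def second(*args, **kwargs):
        return fn(*args, **kwargs)[1]

    return first, second
```

Note that in this sketch each of the two decoupled functions calls fn separately, so the shared work is done twice; that is the trade-off for having independent single-value functions.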
-
revrand.utils.base.flatten(arys, returns_shapes=True, hstack=<function hstack>, ravel=<function ravel>, shape=<function shape>)
Flatten a potentially recursive list of multidimensional objects.
Note
Not to be confused with np.ndarray.flatten(). (A more befitting name might be chain or stack, or something else entirely, since this function is more than either concatenate or np.ndarray.flatten() by itself. Rather, it is the composition of the former with the latter.)
Parameters: - arys (list of objects) – One or more input arrays of possibly heterogeneous shapes and sizes.
- returns_shapes (bool, optional) – Default is True. If True, the tuple (flattened, shapes) is returned, otherwise only flattened is returned.
- hstack (callable, optional) – a function that implements horizontal stacking
- ravel (callable, optional) – a function that flattens the object
- shape (callable, optional) – a function that returns the shape of the object
Returns: flattened, [shapes] – the flat (1d) object resulting from the concatenation of the flattened multidimensional objects. When returns_shapes is True, the tuple (flattened, shapes) is returned, where shapes lists the shape of each input array (nested to mirror any nesting in arys).
Return type: {1d object, tuple}
See also
revrand.utils.unflatten() – its inverse
Examples
>>> a = 9
>>> b = np.array([4, 7, 4, 5, 2])
>>> c = np.array([[7, 3, 1],
...               [2, 6, 6]])
>>> d = np.array([[[6, 5, 5],
...                [1, 6, 9]],
...               [[3, 9, 1],
...                [9, 4, 1]]])
>>> flatten([a, b, c, d])
(array([9, 4, 7, 4, 5, 2, 7, 3, 1, 2, 6, 6, 6, 5, 5, 1, 6, 9, 3, 9, 1, 9, 4, 1]), [(), (5,), (2, 3), (2, 2, 3)])
Note that scalars and 0-dimensional arrays are treated differently from 1-dimensional singleton arrays.
>>> flatten([3.14, np.array(2.71), np.array([1.61])])
(array([ 3.14, 2.71, 1.61]), [(), (), (1,)])
>>> flatten([a, b, c, d], returns_shapes=False)
array([9, 4, 7, 4, 5, 2, 7, 3, 1, 2, 6, 6, 6, 5, 5, 1, 6, 9, 3, 9, 1, 9, 4, 1])
>>> w, x, y, z = unflatten(*flatten([a, b, c, d]))
>>> w == a
True
>>> np.array_equal(x, b)
True
>>> np.array_equal(y, c)
True
>>> np.array_equal(z, d)
True
>>> flatten([3.14, [np.array(2.71), np.array([1.61])]])
(array([ 3.14, 2.71, 1.61]), [(), [(), (1,)]])
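A minimal sketch of the flatten/unflatten round trip for the non-nested case only. It assumes plain NumPy (np.shape, np.ravel, np.hstack, np.split) in place of the pluggable hstack/ravel/shape/reshape hooks, and it omits the recursion that handles nested lists; it mirrors the documented scalar behaviour for shape ().

```python
import numpy as np


def flatten(arys, returns_shapes=True):
    # Sketch (non-nested only): record each input's shape; scalars give ().
    shapes = [np.shape(a) for a in arys]
    # Ravel every input to 1-d and concatenate them end to end.
    flat = np.hstack([np.ravel(a) for a in arys])
    return (flat, shapes) if returns_shapes else flat


def unflatten(ary, shapes):
    # Number of elements each shape occupies; np.prod(()) == 1 for scalars.
    sizes = [int(np.prod(s)) for s in shapes]
    # Split at the running totals, dropping the final total (the array end).
    pieces = np.split(ary, np.cumsum(sizes)[:-1])
    out = []
    for piece, s in zip(pieces, shapes):
        # Shape () yields a scalar rather than a 0-d array.
        out.append(piece[0] if s == () else piece.reshape(s))
    return out
```

Storing shapes alongside the flat vector is what makes the operation invertible: the flat array alone cannot distinguish a scalar from a length-1 array.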
-
revrand.utils.base.issequence(obj)
Test if an object is an iterable generator, list or tuple.
Parameters: obj (object) – object to test
Returns: True if obj is a tuple, list or generator only.
Return type: bool
Examples
>>> issequence([1, 2])
True
>>> issequence((1,))
True
>>> issequence((i for i in range(8)))
True
>>> issequence(np.array([1, 2, 3]))
False
-
revrand.utils.base.map_indices(fn, iterable, indices)
Map a function across indices of an iterable.
Notes
Roughly equivalent to, though more efficient than:
lambda fn, iterable, indices: (fn(arg) if i in indices else arg for i, arg in enumerate(iterable))
Examples
>>> a = [4, 6, 7, 1, 6, 8, 2]
>>> from functools import partial
>>> from operator import mul
>>> list(map_indices(partial(mul, 3), a, [0, 3, 5]))
[12, 6, 7, 3, 6, 24, 2]
>>> b = [9., np.array([5., 6., 2.]),
...      np.array([[5., 6., 2.], [2., 3., 9.]])]
>>> list(map_indices(np.log, b, [0, 2]))
[2.1972245773362196, array([ 5., 6., 2.]), array([[ 1.60943791, 1.79175947, 0.69314718], [ 0.69314718, 1.09861229, 2.19722458]])]
Todo
Floating point precision
>>> list(map_indices(np.exp, list(map_indices(np.log, b, [0, 2])), [0, 2]))
[9., array([5., 6., 2.]), array([[ 5., 6., 2.], [ 2., 3., 9.]])]
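A minimal generator sketch of map_indices. Converting indices to a set so each membership test is constant-time is an assumption about where the efficiency gain over the lambda in the Notes can come from; revrand's own implementation may differ.

```python
def map_indices(fn, iterable, indices):
    # Sketch: a set gives O(1) membership tests for each position.
    index_set = set(indices)
    for i, arg in enumerate(iterable):
        # Apply fn only at the selected positions; pass others through.
        yield fn(arg) if i in index_set else arg
```

Like the documented function, this returns a lazy iterator, so wrap it in list() to materialise the result.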
-
revrand.utils.base.map_recursive(fn, iterable, output_type=None)
Apply a function to a potentially nested list of lists.
Parameters: - fn (callable) – The function to apply to each element (and sub-element) in iterable.
- iterable (iterable) – An iterable, sequence, sequence of sequences, etc. fn will be applied to each element in each list.
- output_type (callable, optional) – If None, a map with sub-maps in the same structure as iterable will be returned; otherwise the callable will be applied to each sequence (i.e. list will return lists of lists, etc.).
Returns: if output_type is None, a map with sub-maps in the same structure as iterable; otherwise the callable is applied to each sequence (i.e. list will return lists of lists, etc.).
Return type: map or iterable type
Examples
>>> seq = [1, 2, [3, 4, [5, 6]], 7]
>>> map_recursive(lambda x: x > 4, seq, output_type=list)
[False, False, [False, False, [True, True]], True]
>>> map_recursive(lambda x: 2 * x, seq, output_type=tuple)
(2, 4, (6, 8, (10, 12)), 14)
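A minimal recursive sketch of map_recursive, assuming that only lists and tuples count as nested sequences (anything else is treated as a leaf and passed to fn). It is an illustration, not revrand's source.

```python
def map_recursive(fn, iterable, output_type=None):
    def apply(item):
        # Sketch: recurse into nested lists/tuples, apply fn to leaves.
        if isinstance(item, (list, tuple)):
            return map_recursive(fn, item, output_type)
        return fn(item)

    mapped = map(apply, iterable)
    # Without output_type, every level stays a lazy map object.
    return mapped if output_type is None else output_type(mapped)
```

Passing output_type down the recursion is what makes list produce lists of lists and tuple produce tuples of tuples.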
-
revrand.utils.base.nwise(iterable, n)
Sliding window iterator.
Iterator that acts like a sliding window of size n, advancing over the iterable one item at a time. If iterable has m elements (m >= n), this function returns an iterator over m-n+1 tuples.
Parameters: - iterable (iterable) – An iterable object.
- n (int) – Window size.
Returns: Iterator of size n tuples
Return type: iterator of tuples.
Notes
First n iterators are created:
iters = tee(iterable, n)
Next, iterator i is advanced i times:
for i, it in enumerate(iters):
    for _ in range(i):
        next(it, None)
Finally, the iterators are zipped back up again:
return zip(*iters)
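The three steps in the Notes assemble into the following sketch, built only on itertools.tee and zip; it follows the described recipe rather than quoting revrand's source.

```python
from itertools import tee


def nwise(iterable, n):
    # Sketch: create n independent iterators over the same iterable.
    iters = tee(iterable, n)
    # Advance iterator i by i positions so the iterators are staggered.
    for i, it in enumerate(iters):
        for _ in range(i):
            next(it, None)
    # zip stops at the shortest iterator, yielding m - n + 1 windows.
    return zip(*iters)
```

Because zip stops at the shortest input, a window larger than the iterable silently yields nothing, which is the behaviour the Todo below questions.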
Examples
>>> a = [2, 5, 7, 4, 2, 8, 6]
>>> list(nwise(a, n=3))
[(2, 5, 7), (5, 7, 4), (7, 4, 2), (4, 2, 8), (2, 8, 6)]
>>> from functools import partial
>>> pairwise = partial(nwise, n=2)
>>> list(pairwise(a))
[(2, 5), (5, 7), (7, 4), (4, 2), (2, 8), (8, 6)]
>>> list(nwise(a, n=1))
[(2,), (5,), (7,), (4,), (2,), (8,), (6,)]
>>> list(nwise(a, n=7))
[(2, 5, 7, 4, 2, 8, 6)]
Todo
These should probably raise ValueError...
>>> list(nwise(a, 8))
[]
>>> list(nwise(a, 9))
[]
A sliding window of size n over a list of m elements gives m-n+1 windows.
>>> len(a) - len(list(nwise(a, 2))) == 1
True
>>> len(a) - len(list(nwise(a, 3))) == 2
True
>>> len(a) - len(list(nwise(a, 7))) == 6
True
-
revrand.utils.base.scalar_reshape(a, newshape, order='C')
Reshape, but also return scalars or empty lists.
Identical to numpy.reshape except when newshape is the empty tuple, in which case a scalar is returned instead of a 0-dimensional array, and, as the last example shows, when an empty array is reshaped to (0,), which returns an empty list.
Examples
>>> a = np.arange(6)
>>> np.array_equal(np.reshape(a, (3, 2)), scalar_reshape(a, (3, 2)))
True
>>> scalar_reshape(np.array([3.14]), newshape=())
3.14
>>> scalar_reshape(np.array([2.71]), newshape=(1,))
array([ 2.71])
>>> scalar_reshape(np.array([]), newshape=(0,))
[]
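A minimal sketch matching the examples above. Using ndarray.item() for the scalar case is one way to do it, and the rule "any empty input returns []" is an assumption generalised from the single (0,) example, not a documented guarantee.

```python
import numpy as np


def scalar_reshape(a, newshape, order='C'):
    if newshape == ():
        # Empty target shape: return the single element as a Python scalar.
        return np.asarray(a).item()
    if np.size(a) == 0:
        # Assumption: empty inputs come back as an empty list.
        return []
    # Everything else behaves exactly like numpy.reshape.
    return np.reshape(a, newshape, order=order)
```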
-
revrand.utils.base.sumprod(seq)
Product of tuple, or sum of products of lists of tuples.
Parameters: seq (tuple or list) – a tuple, or a (possibly nested) list of tuples.
Returns: the product of input tuples, or the sum of products of lists of tuples, recursively.
Return type: int
Examples
>>> tup = (1, 2, 3)
>>> sumprod(tup)
6
>>> lis = [(1, 2, 3), (2, 2)]
>>> sumprod(lis)
10
>>> lis = [(1, 2, 3), [(2, 1), (3,)]]
>>> sumprod(lis)
11
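A minimal recursive sketch of sumprod, assuming math.prod (Python 3.8+) for the tuple case; it reproduces the documented examples but is not revrand's source.

```python
from math import prod


def sumprod(seq):
    # Sketch: a tuple contributes the product of its entries.
    if isinstance(seq, tuple):
        return prod(seq)
    # A list contributes the sum over its (possibly nested) members.
    return sum(sumprod(s) for s in seq)
```

Since prod(()) is 1, applying this to a list of array shapes counts the total number of elements: the shapes [(), (5,), (2, 3), (2, 2, 3)] from the flatten example give 1 + 5 + 6 + 12 = 24, the length of that flattened array.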
-
revrand.utils.base.unflatten(ary, shapes, reshape=<function scalar_reshape>)
Inverse operation of flatten.
Given a flat (1d) array, and a list of shapes (represented as tuples), return a list of ndarrays with the specified shapes.
Parameters: - ary (a 1d array) – A flat (1d) array.
- shapes (list of tuples) – A list of ndarray shapes (tuple of array dimensions)
Returns: A list of ndarrays with the specified shapes.
Return type: list of ndarrays
See also
revrand.utils.flatten() – its inverse
Notes
For a flat (non-nested) list of shapes, roughly equivalent to:
lambda ary, shapes, order='C': map(partial(scalar_reshape, order=order), np.hsplit(ary, np.cumsum([np.prod(s, dtype=int) for s in shapes])), shapes)
Examples
>>> a = np.array([7, 4, 5, 8, 9, 1, 4, 2, 5, 3, 4, 3])
>>> list(unflatten(a, [(1,), (1,), (4,), (2, 3)]))
[array([7]), array([4]), array([5, 8, 9, 1]), array([[4, 2, 5], [3, 4, 3]])]
>>> list(unflatten(a, [(), (1,), (4,), (2, 3)]))
[7, array([4]), array([5, 8, 9, 1]), array([[4, 2, 5], [3, 4, 3]])]
>>> list(unflatten(a, [(), (1,), (3,), (2, 3)]))
[7, array([4]), array([5, 8, 9]), array([[1, 4, 2], [5, 3, 4]])]
>>> list(unflatten(a, [(), (1,), (5,), (2, 3)]))
Traceback (most recent call last):
...
ValueError: total size of new array must be unchanged
>>> flatten(list(unflatten(a, [(), (1,), (4,), (2, 3)])))
(array([7, 4, 5, 8, 9, 1, 4, 2, 5, 3, 4, 3]), [(), (1,), (4,), (2, 3)])
>>> list(unflatten(a, [[(1,), (1,)], (4,), (2, 3)]))
[[array([7]), array([4])], array([5, 8, 9, 1]), array([[4, 2, 5], [3, 4, 3]])]
>>> flatten(list(unflatten(a, [(), (1,), [(4,), (2, 3)]])))
(array([7, 4, 5, 8, 9, 1, 4, 2, 5, 3, 4, 3]), [(), (1,), [(4,), (2, 3)]])
Dataset Utilities
Dataset loading utilities
Portions of this module derived from http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
-
revrand.utils.datasets.fetch_gpml_sarcos_data(transpose_data=True, data_home=None)
Fetch the SARCOS dataset from the internet and parse it into Python arrays.
>>> gpml_sarcos = fetch_gpml_sarcos_data()
>>> gpml_sarcos.train.data.shape
(44484, 21)
>>> gpml_sarcos.train.targets.shape
(44484,)
>>> gpml_sarcos.train.targets.round(2)
array([ 50.29, 44.1 , 37.35, ..., 22.7 , 17.13, 6.52])
>>> gpml_sarcos.test.data.shape
(4449, 21)
>>> gpml_sarcos.test.targets.shape
(4449,)
-
revrand.utils.datasets.fetch_gpml_usps_resampled_data(transpose_data=True, data_home=None)
Fetch the USPS handwritten digits dataset from the internet and parse it into Python arrays.
>>> usps_resampled = fetch_gpml_usps_resampled_data()
>>> usps_resampled.train.targets.shape
(4649,)
>>> usps_resampled.train.targets
array([6, 0, 1, ..., 9, 2, 7])
>>> usps_resampled.train.data.shape
(4649, 256)
>>> np.all(-1 <= usps_resampled.train.data)
True
>>> np.all(usps_resampled.train.data < 1)
True
>>> usps_resampled.test.targets.shape
(4649,)
>>> usps_resampled.test.data.shape
(4649, 256)
>>> usps_resampled = fetch_gpml_usps_resampled_data(transpose_data=False)
>>> usps_resampled.train.data.shape
(256, 4649)
-
revrand.utils.datasets.gen_gausprocess_se(ntrain, ntest, noise=1.0, lenscale=1.0, scale=1.0, xmin=-10, xmax=10)
Generate a random (noisy) draw from a Gaussian Process with an RBF (squared-exponential) kernel.
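The return value of gen_gausprocess_se is not documented here, so the sketch below uses a hypothetical helper, sample_gp_se, to show the underlying technique: draw inputs, build an RBF covariance matrix, sample a latent function through its Cholesky factor and add Gaussian noise. Sampling inputs uniformly in [xmin, xmax], treating scale as the amplitude standard deviation and noise as the noise standard deviation, and the 1e-6 jitter are all assumptions; the train/test split is omitted.

```python
import numpy as np


def sample_gp_se(n, noise=1.0, lenscale=1.0, scale=1.0, xmin=-10, xmax=10, seed=None):
    # Hypothetical helper, not revrand's gen_gausprocess_se.
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(xmin, xmax, size=n))
    # Squared-exponential (RBF) covariance between every pair of inputs.
    sq_dist = (x[:, None] - x[None, :]) ** 2
    K = scale ** 2 * np.exp(-0.5 * sq_dist / lenscale ** 2)
    # Jitter on the diagonal keeps the Cholesky factorisation stable.
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n))
    # L @ z with z ~ N(0, I) has covariance L L^T, approximately K.
    f = L @ rng.standard_normal(n)
    # Observations are the latent draw plus i.i.d. Gaussian noise.
    y = f + noise * rng.standard_normal(n)
    return x, f, y
```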
-
revrand.utils.datasets.get_data_home(data_home=None)
Return the path of the revrand data dir.
This folder is used by some large dataset loaders to avoid downloading the data several times.
By default the data dir is set to a folder named ‘revrand_data’ in the user home folder.
Alternatively, it can be set by the ‘REVRAND_DATA’ environment variable or programmatically by giving an explicit folder path. The ‘~’ symbol is expanded to the user home folder.
If the folder does not already exist, it is automatically created.
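The lookup order described above can be sketched with the standard library alone. This follows the documented behaviour (explicit path, then REVRAND_DATA, then ~/revrand_data, with ~ expanded and the folder created on demand) and is not revrand's source.

```python
import os


def get_data_home(data_home=None):
    if data_home is None:
        # Sketch: fall back to the environment variable, then ~/revrand_data.
        data_home = os.environ.get('REVRAND_DATA',
                                   os.path.join('~', 'revrand_data'))
    # Expand a leading '~' to the user's home folder.
    data_home = os.path.expanduser(data_home)
    # Create the folder if needed; exist_ok avoids an error if it exists.
    os.makedirs(data_home, exist_ok=True)
    return data_home
```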
-
revrand.utils.datasets.make_polynomial(degree=3, n_samples=100, bias=0.0, noise=0.0, return_coefs=False, random_state=None)
Generate a noisy polynomial for a regression problem.
Examples
>>> X, y, coefs = make_polynomial(degree=3, n_samples=200, noise=.5,
...                               return_coefs=True, random_state=1)
-
revrand.utils.datasets.make_regression(func, n_samples=100, n_features=1, bias=0.0, noise=0.0, random_state=None)
Make dataset for a regression problem.
Examples
>>> f = lambda x: 0.5*x + np.sin(2*x)
>>> X, y = make_regression(f, bias=.5, noise=1., random_state=1)
>>> X.shape
(100, 1)
>>> y.shape
(100,)
>>> X[:5].round(2)
array([[ 1.62],
       [-0.61],
       [-0.53],
       [-1.07],
       [ 0.87]])
>>> y[:5].round(2)
array([ 0.76, 0.48, -0.23, -0.28, 0.83])
Decorator Utilities
Reusable decorators
-
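class revrand.utils.decorators.Memoize(func)
Memoizing decorator: return values are cached in the decorated object itself, which is a dict keyed by the tuple of arguments.
Examples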
>>> @Memoize
... def fib(n):
...     if n < 2:
...         return n
...     return fib(n-2) + fib(n-1)
>>> fib(10)
55
>>> isinstance(fib, dict)
True
>>> fib == {
...     (0,): 0,
...     (1,): 1,
...     (2,): 1,
...     (3,): 2,
...     (4,): 3,
...     (5,): 5,
...     (6,): 8,
...     (7,): 13,
...     (8,): 21,
...     (9,): 34,
...     (10,): 55,
... }
True
Order is not necessarily maintained.
>>> sorted(fib.keys())
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,)]
>>> sorted(fib.values())
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
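The examples imply a decorator that is itself a dict keyed by argument tuples. A minimal sketch of that design uses dict.__missing__, which Python calls on a failed lookup; this supports positional arguments only and is an illustration rather than revrand's code.

```python
class Memoize(dict):
    """Sketch: a memoizing decorator that is itself a dict of results."""

    def __init__(self, func):
        super().__init__()
        self.func = func

    def __call__(self, *args):
        # Look the argument tuple up; a miss triggers __missing__.
        return self[args]

    def __missing__(self, key):
        # Compute the result once, store it under its arguments, return it.
        result = self[key] = self.func(*key)
        return result
```

Usage mirrors the doctest: decorating fib with this class and calling fib(10) returns 55, after which fib holds one entry per argument tuple from (0,) to (10,).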
-
class revrand.utils.decorators.OrderedMemoize(func)
Like Memoize, but the cached arguments and values are kept in the order in which they were computed.
Examples
>>> @OrderedMemoize
... def fib(n):
...     if n < 2:
...         return n
...     return fib(n-2) + fib(n-1)
>>> fib(10)
55
The arguments and values are cached in the order they were called.
>>> list(fib.keys())
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,)]
>>> list(fib.values())
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
>>> fib
OrderedMemoize([((0,), 0), ((1,), 1), ((2,), 1), ((3,), 2), ((4,), 3), ((5,), 5), ((6,), 8), ((7,), 13), ((8,), 21), ((9,), 34), ((10,), 55)])
-
revrand.utils.decorators.flatten_args(fn)
Examples
>>> @flatten_args
... def f(x):
...     return 2*x
>>> x, y, z = f(np.array([1., 2.]), 3., np.array([[1., 2.],[.5, .9]]))
>>> x
array([ 2., 4.])
>>> y
6.0
>>> z
array([[ 2. , 4. ],
       [ 1. , 1.8]])
-
revrand.utils.decorators.unvectorize_args(fn)
Examples
The Rosenbrock function is commonly used as a performance test problem for optimization algorithms. It and its derivatives are included in scipy.optimize, implemented in the form expected by the optimization methods in that module:
def rosen(x):
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
This representation makes it unwieldy to perform operations such as plotting since it is less straightforward to evaluate the function on a meshgrid. This decorator helps reconcile the differences between these representations.
>>> from scipy.optimize import rosen
>>> rosen(np.array([0.5, 1.5]))
156.5
>>> unvectorize_args(rosen)(0.5, 1.5)
156.5
The rosen function is implemented in such a way that it generalizes to the Rosenbrock function of any number of variables. This decorator can support any function defined in a similar manner.
The function is well-defined for any number of arguments:
>>> rosen(np.array([0.5, 1.5, 1., 0., 0.2]))
418.0
>>> unvectorize_args(rosen)(0.5, 1.5, 1., 0., 0.2)  # accepts any number of arguments
418.0
This makes it easier to work with in other operations, such as evaluating on a grid:
>>> rosen_ = unvectorize_args(rosen)
>>> y, x = np.mgrid[0:2.1:0.05, -1:1.2:0.05]
>>> z = rosen_(x, y)
>>> z.round(2)
array([[ 104. , 85.25, 69.22, ..., 121.55, 146.42, 174.92],
       [ 94.25, 76.48, 61.37, ..., 110.78, 134.57, 161.95],
       [ 85. , 68.2 , 54.02, ..., 100.5 , 123.22, 149.47],
       ...,
       [ 94.25, 113.53, 133.57, ..., 71.83, 54.77, 39.4 ],
       [ 104. , 124.25, 145.22, ..., 80.55, 62.42, 45.92],
       [ 114.25, 135.48, 157.37, ..., 89.78, 70.57, 52.95]])
Now this can be directly plotted with mpl_toolkits.mplot3d.Axes3D and ax.plot_surface.
-
revrand.utils.decorators.vectorize_args(fn)
When defining functions of several variables, it is usually more readable to write out each variable as a separate argument. This is also convenient for evaluating functions on a numpy.meshgrid.
However, the family of optimizers in scipy.optimize expects that all functions, including those of several variables, receive a single argument, which is a numpy.ndarray in the case of functions of several variables.
Readability counts. We need not compromise readability to conform to some interface when higher-order functions/decorators can abstract away the details for us. This is what this decorator does.
Examples
Optimizers such as those in scipy.optimize expect a function defined like this:
>>> def fun1(v):
...     # elliptic paraboloid
...     return 2*v[0]**2 + 2*v[1]**2 - 4
>>> a = np.array([2, 3])
>>> fun1(a)
22
Whereas this representation is not only more readable but more natural:
>>> def fun2(x, y):
...     # elliptic paraboloid
...     return 2*x**2 + 2*y**2 - 4
>>> fun2(2, 3)
22
This form is also convenient for evaluating functions on a numpy.meshgrid:
>>> y, x = np.mgrid[-5:5:0.2, -5:5:0.2]
>>> fun2(x, y)
array([[ 96. , 92.08, 88.32, ..., 84.72, 88.32, 92.08],
       [ 92.08, 88.16, 84.4 , ..., 80.8 , 84.4 , 88.16],
       [ 88.32, 84.4 , 80.64, ..., 77.04, 80.64, 84.4 ],
       ...,
       [ 84.72, 80.8 , 77.04, ..., 73.44, 77.04, 80.8 ],
       [ 88.32, 84.4 , 80.64, ..., 77.04, 80.64, 84.4 ],
       [ 92.08, 88.16, 84.4 , ..., 80.8 , 84.4 , 88.16]])
We can easily reconcile the differences between these representations without having to compromise readability.
>>> fun1(a) == vectorize_args(fun2)(a)
True
>>> @vectorize_args
... def fun3(x, y):
...     # elliptic paraboloid
...     return 2*x**2 + 2*y**2 - 4
>>> fun1(a) == fun3(a)
True
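A minimal sketch of the two decorators as argument packers/unpackers, assuming NumPy and functools.wraps. It reproduces the behaviour shown in the examples (including meshgrid evaluation of a rosen-style function) but is not revrand's source.

```python
from functools import wraps

import numpy as np


def vectorize_args(fn):
    # Sketch: turn f(x, y, ...) into g(v) that unpacks array v into arguments.
    @wraps(fn)
    def new_fn(vec):
        return fn(*vec)
    return new_fn


def unvectorize_args(fn):
    # Sketch: turn g(v) into f(x, y, ...) that stacks its arguments into
    # one array along a new leading axis, so grids of shape (H, W) become
    # a single (n_args, H, W) array.
    @wraps(fn)
    def new_fn(*args):
        return fn(np.asarray(args))
    return new_fn
```

Stacking along a new leading axis is what lets rosen's slicing (x[1:], x[:-1]) work unchanged on meshgrid inputs: the slices select whole grids, and the built-in sum adds them element-wise.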
Random Generators
-
revrand.utils.rand.endless_permutations(N, random_state=None)
Generate an endless sequence of random integers from permutations of the set [0, ..., N).
If we draw N values, we sweep through the entire set without replacement; on the (N+1)th draw a new permutation is created, and so on.
Parameters: - N (int) – the length of the set
- random_state (int or RandomState, optional) – random seed
Yields: int – a random int from the set [0, ..., N)
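A minimal generator sketch of this behaviour, assuming NumPy's legacy RandomState so that random_state may be an int seed, an existing RandomState, or None as the parameter list describes; it is an illustration, not revrand's source.

```python
import numpy as np


def endless_permutations(N, random_state=None):
    # Sketch: reuse a RandomState if given, otherwise seed a new one.
    rng = (random_state if isinstance(random_state, np.random.RandomState)
           else np.random.RandomState(random_state))
    while True:
        # Each pass yields every integer in [0, N) exactly once.
        for i in rng.permutation(N):
            yield int(i)
```

Every block of N consecutive draws is therefore a full sweep without replacement, which is handy for shuffled minibatch indexing over many epochs.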