An association of a key to one or more values.
Stores the output path of a Sink as a checkpoint
Base trait for "computation nodes" with no generic type information for easier rewriting
Stores the compression parameters for sinks
A list that is distributed across multiple machines.
It supports a few Traversable-like methods:
- parallelDo: a 'map' operation transforming elements of the list in parallel
- ++: to concatenate 2 DLists
- groupByKey: to group a list of (key, value) elements by key, so as to get (key, values)
- combine: a parallel 'reduce' operation
- materialise: transforms a distributed list into a non-distributed list
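The operations listed above can be pictured with ordinary in-memory collections. The following is only a sketch: a plain Scala `List` stands in for a distributed list, and `DListOpsSketch` with all of its members is a hypothetical name chosen for illustration, not part of the library.

```scala
object DListOpsSketch {
  // Assumption: an in-memory List stands in for a distributed list.
  val words: List[String] = List("a", "bb", "a", "ccc")
  val more: List[String]  = List("bb", "a")

  // '++' analogue: concatenate two lists.
  val all: List[String] = words ++ more

  // 'parallelDo' analogue: transform each element independently of the others.
  val pairs: List[(String, Int)] = all.map(w => (w, 1))

  // 'groupByKey' analogue: (key, value) pairs become (key, values).
  val grouped: Map[String, List[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // 'combine' analogue: reduce the values of each key with an associative function.
  val counts: Map[String, Int] =
    grouped.map { case (k, vs) => (k, vs.reduce(_ + _)) }

  // 'materialise' analogue: an ordinary, non-distributed list of results.
  val result: List[(String, Int)] = counts.toList.sortBy(_._1)
}
```

In a real distributed setting each step would be deferred and executed by the framework; here every step is evaluated eagerly, which is enough to show how the data changes shape.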
A wrapper around an object that is part of the graph of a distributed computation
An output store from a MapReduce job
Specifies an object to which sinks can be added and whose sinks can be compressed
DataSource for a computation graph.
It reads key-value pairs (K, V) from the file system and uses an input converter to create an input of type A
Interface for specifying parallel operations over DLists in the absence of an environment
Interface for writing outputs from a DoFn
Interface for specifying parallel operation over DLists.
The semantics of the DoFn lifecycle are as follows:
For a given chunk of DList elements:
1. 'setup' will be called;
2. 'process' will be called for each element in the chunk;
3. 'cleanup' will be called.
These 3 steps encapsulate the entire life-cycle of a DoFn. A DoFn object will not be referenced after these steps.
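The three lifecycle steps can be sketched with a small driver that processes one chunk. Everything here is an assumption made for illustration: `SketchDoFn`, `SketchEmitter`, `runChunk`, and the parameter lists of `setup`, `process` and `cleanup` are hypothetical and may differ from the library's own signatures.

```scala
import scala.collection.mutable.ListBuffer

object DoFnLifecycleSketch {
  // Hypothetical stand-in for the interface that writes outputs from a DoFn.
  trait SketchEmitter[B] { def emit(value: B): Unit }

  // Hypothetical stand-in for a DoFn with the three lifecycle steps.
  trait SketchDoFn[A, B] {
    def setup(): Unit
    def process(input: A, emitter: SketchEmitter[B]): Unit
    def cleanup(emitter: SketchEmitter[B]): Unit
  }

  // Runs one chunk through the lifecycle: setup, process per element, cleanup.
  def runChunk[A, B](chunk: Seq[A], fn: SketchDoFn[A, B]): List[B] = {
    val out = ListBuffer.empty[B]
    val emitter = new SketchEmitter[B] {
      def emit(value: B): Unit = { out += value; () }
    }
    fn.setup()
    chunk.foreach(a => fn.process(a, emitter))
    fn.cleanup(emitter)
    out.toList
  }

  // Example: emit each element doubled, then the number of elements seen.
  def doublerWithCount(): SketchDoFn[Int, Int] = new SketchDoFn[Int, Int] {
    private var seen = 0
    def setup(): Unit = { seen = 0 }
    def process(input: Int, emitter: SketchEmitter[Int]): Unit = {
      seen += 1
      emitter.emit(input * 2)
    }
    def cleanup(emitter: SketchEmitter[Int]): Unit = emitter.emit(seen)
  }
}
```

Because the function object is not referenced after `cleanup`, per-chunk state such as the `seen` counter can safely live inside it.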
An object holder which can hold a distributed value
Define the expiry policy for checkpoint files
You can define
A distributed list of associations.
Specify the way in which key-values are "shuffled".
Used by groupByKey in DList.
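One way to picture what "shuffling" involves is a specification that decides which partition a key goes to and in what order keys appear within a partition. The sketch below models that idea on in-memory data; `SketchGrouping`, its two methods, and `shuffle` are hypothetical names and are not claimed to match the library's Grouping trait.

```scala
object GroupingSketch {
  // Hypothetical model of a shuffle specification.
  trait SketchGrouping[K] {
    // Which of 'numPartitions' partitions receives this key.
    def partition(key: K, numPartitions: Int): Int
    // Order in which keys are presented within a partition.
    def compare(a: K, b: K): Int
  }

  // Assumed example: strings are partitioned and ordered case-insensitively.
  val caseInsensitive: SketchGrouping[String] = new SketchGrouping[String] {
    def partition(key: String, numPartitions: Int): Int =
      (key.toLowerCase.hashCode & Int.MaxValue) % numPartitions
    def compare(a: String, b: String): Int = a.compareToIgnoreCase(b)
  }

  // Shuffle: assign each pair to a partition, then sort by key inside each partition.
  def shuffle[K, V](kvs: List[(K, V)], n: Int, g: SketchGrouping[K]): Map[Int, List[(K, V)]] =
    kvs
      .groupBy { case (k, _) => g.partition(k, n) }
      .map { case (p, xs) => (p, xs.sortWith((x, y) => g.compare(x._1, y._1) < 0)) }
}
```

Keys that the specification treats as equal, such as "A" and "a" here, always land in the same partition and end up next to each other after sorting.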
Implicit definitions of Grouping instances for common types.
Convert an InputFormat's key-value types to the type produced by a source
Fusion of both traits when bi-directional conversion is possible
A non-empty iterable contains at least one element.
Consequences include:
- reduceLeft will always produce a value.
- head will always produce a value.
- tail will always produce a value.
Some operations on a non-empty iterable result in a non-empty iterable.
Construction of an Iterable1 is typically performed with the +:: method, defined on Iterable1.RichIterator.
For example:
    import Iterable1._
    // A regular iterable.
    val x: Iterable[Int] = ...
    // Constructs a non-empty iterable with 74 at the head.
    val y: Iterable1[Int] = 74 +:: x
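To see why operations such as head and reduceLeft can always produce a value, it helps to look at a minimal non-empty collection built from scratch. This is a sketch under stated assumptions: `NonEmpty` and `prepend` are hypothetical names for illustration and do not reproduce the Iterable1 implementation.

```scala
object NonEmptySketch {
  // Hypothetical non-empty collection: the head is stored separately, so it always exists.
  final case class NonEmpty[A](head: A, tail: List[A]) {
    // Total: 'head' seeds the reduction, so no empty-collection error is possible.
    def reduceLeft(f: (A, A) => A): A = tail.foldLeft(head)(f)

    // Mapping a non-empty collection yields a non-empty collection.
    def map[B](f: A => B): NonEmpty[B] = NonEmpty(f(head), tail.map(f))

    def toList: List[A] = head :: tail
  }

  // Prepending an element to a possibly empty list always gives a non-empty result.
  def prepend[A](a: A, rest: List[A]): NonEmpty[A] = NonEmpty(a, rest)
}
```

For instance, `NonEmptySketch.prepend(74, Nil).reduceLeft(_ + _)` returns 74 where the same call on an empty standard `List` would throw.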
A non-empty iterator contains at least one element.
Consequences include:
- reduceLeft will always produce a value.
- first will always produce a value.
- next will always produce a value on its first invocation.
- hasNext will always return true on its first invocation.
- scanLeft1 will always produce a value.
Some operations on a non-empty iterator result in a non-empty iterator.
Construction of an Iterator1 is typically performed with the +:: method, defined on Iterator1.RichIterator.
For example:
    import Iterator1._
    // A regular iterator.
    val x: Iterator[Int] = ...
    // Constructs a non-empty iterator with 74 as its first element.
    val y: Iterator1[Int] = 74 +:: x
NOTE: Most Iterator functions perform SIDE-EFFECTS and so EQUATIONAL REASONING DOES NOT APPLY.
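The note about side effects can be illustrated with the standard library's `Iterator` alone. In this sketch, `IteratorSideEffectSketch` and its two methods are hypothetical names; only `Iterator`, `next`, `hasNext`, and `List.head` are standard Scala.

```scala
object IteratorSideEffectSketch {
  // The same expression 'it.next()' yields different values on each call,
  // so it cannot be replaced by its value without changing the program.
  def iteratorDemo(): (Int, Int, Boolean) = {
    val it = Iterator(1, 2, 3)
    val a = it.next() // consumes the first element
    val b = it.next() // same expression, different result
    (a, b, it.hasNext)
  }

  // With an immutable List, the same expression always has the same value.
  def listDemo(): (Int, Int) = {
    val xs = List(1, 2, 3)
    (xs.head, xs.head)
  }
}
```

This is also why the non-emptiness guarantees for next and hasNext are stated only for their first invocation.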
Convert the type consumed by a DataSink into an OutputFormat's key-value types.
This class wraps the Hadoop (mutable) configuration with additional configuration information such as the jars which should be added to the classpath.
This is a Sink which can also be used as a Source
Trait that is sub-classed by objects to provide sets of unique identifiers.
Typeclass for sending types across the Hadoop wire
Implicit definitions of WireFormat instances for common types.
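The typeclass pattern described here, with implicit instances for common types, can be sketched against `java.io.DataOutput` and `java.io.DataInput`. The names `SketchWireFormat`, `toWire`, `fromWire`, and `roundTrip` are assumptions made for this illustration and are not claimed to match the library's definitions.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInput, DataInputStream, DataOutput, DataOutputStream}

object WireFormatSketch {
  // Hypothetical typeclass: how to write a value of type A to bytes and read it back.
  trait SketchWireFormat[A] {
    def toWire(value: A, out: DataOutput): Unit
    def fromWire(in: DataInput): A
  }

  // Instances live in the companion object so that implicit resolution finds them.
  object SketchWireFormat {
    implicit val intFormat: SketchWireFormat[Int] = new SketchWireFormat[Int] {
      def toWire(value: Int, out: DataOutput): Unit = out.writeInt(value)
      def fromWire(in: DataInput): Int = in.readInt()
    }

    implicit val stringFormat: SketchWireFormat[String] = new SketchWireFormat[String] {
      def toWire(value: String, out: DataOutput): Unit = out.writeUTF(value)
      def fromWire(in: DataInput): String = in.readUTF()
    }

    // An instance for pairs is derived from the instances of the two components.
    implicit def pairFormat[A, B](implicit fa: SketchWireFormat[A], fb: SketchWireFormat[B]): SketchWireFormat[(A, B)] =
      new SketchWireFormat[(A, B)] {
        def toWire(value: (A, B), out: DataOutput): Unit = {
          fa.toWire(value._1, out)
          fb.toWire(value._2, out)
        }
        def fromWire(in: DataInput): (A, B) = {
          val a = fa.fromWire(in)
          val b = fb.fromWire(in)
          (a, b)
        }
      }
  }

  // Serialises a value to a byte array and reads it back with the same instance.
  def roundTrip[A](value: A)(implicit wf: SketchWireFormat[A]): A = {
    val bytes = new ByteArrayOutputStream()
    wf.toWire(value, new DataOutputStream(bytes))
    wf.fromWire(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
  }
}
```

Deriving the pair instance from its components is what lets a small set of base instances cover many composite types without hand-written serialisation code.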
Definition of the Equal instance for CompNodes
Base trait for "computation nodes" with no generic type information for easier rewriting
Each computation node has a unique id, and equality of two nodes is based on this id.
CompNodes are Attributable so that they can be used in attribute grammars