com.nicta.scoobi.core

DList

Related Doc: package core

trait DList[A] extends DataSinks with Persistent[Seq[A]]

A list that is distributed across multiple machines.

It supports a few Traversable-like methods:

- parallelDo: a 'map' operation transforming elements of the list in parallel - ++: to concatenate 2 DLists - groupByKey: to group a list of (key, value) elements by key, so as to get (key, values) - combine: a parallel 'reduce' operation - materialise: transforms a distributed list into a non-distributed list

Linear Supertypes
Ordering
  1. Alphabetic
  2. By inheritance
Inherited
  1. DList
  2. Persistent
  3. DataSinks
  4. AnyRef
  5. Any
  1. Hide All
  2. Show all
Learn more about member selection
Visibility
  1. Public
  2. All

Type Members

  1. abstract type C <: CompNode

    Definition Classes
    DListPersistent
  2. type T = DList[A]

    Definition Classes
    DListDataSinks

Abstract Value Members

  1. abstract def ++(ins: DList[A]*): DList[A]

    Concatenate one or more distributed lists to this distributed list.

  2. abstract def addSink(sink: Sink): T

    Definition Classes
    DataSinks
  3. abstract def combine[K, V](f: Reduction[V])(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wv: WireFormat[V]): DList[(K, V)]

    Apply an associative function to reduce the collection of values to a single value in a key-value-collection distributed list

  4. abstract def combineDo[K, V, U](dofn: DoFn[Iterable[V], U])(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U]): DList[(K, U)]

    Low-level combine to be able to emit arbitrary reduced values when the combiner is called

  5. abstract def compressWith(codec: CompressionCodec, compressionType: CompressionType = CompressionType.BLOCK): T

    Definition Classes
    DataSinks
  6. abstract def groupByKey[K, V](implicit ev: <:<[A, (K, V)], wk: WireFormat[K], gpk: Grouping[K], wv: WireFormat[V]): DList[(K, Iterable[V])]

    Group the values of a distributed list with key-value elements by key.

  7. abstract def materialise: DObject[Iterable[A]]

    Turn a distributed list into a normal, non-distributed collection that can be accessed by the client

  8. abstract def parallelDo[B](dofn: DoFn[A, B])(implicit arg0: WireFormat[B]): DList[B]

  9. abstract def parallelDo[B, E](env: DObject[E], dofn: EnvDoFn[A, B, E])(implicit arg0: WireFormat[B], arg1: WireFormat[E]): DList[B]

    Apply a specified function to "chunks" of elements from the distributed list to produce zero or more output elements.

    Apply a specified function to "chunks" of elements from the distributed list to produce zero or more output elements. The resulting output elements from the many "chunks" form a new distributed list

  10. abstract def updateSinks(f: (Seq[Sink]) ⇒ Seq[Sink]): T

    Definition Classes
    DataSinks

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def by[K](kf: (A) ⇒ K)(implicit arg0: WireFormat[K]): DList[(K, A)]

    Create a new distributed list that is keyed based on a specified function.

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def collect[B](pf: PartialFunction[A, B])(implicit arg0: WireFormat[B]): DList[B]

    Build a new DList by applying a partial function to all elements of this DList on which the function is defined

  8. def combineDo[K, V, U](fn: (Iterable[V], InputOutputContext) ⇒ U)(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U], p: ImplicitParameter6): DList[(K, U)]

  9. def combineDo[K, V, U](fn: (Iterable[V], Configuration) ⇒ U)(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U], p: ImplicitParameter5): DList[(K, U)]

  10. def combineDo[K, V, U](fn: (Iterable[V], ScoobiJobContext) ⇒ U)(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U], p: ImplicitParameter4): DList[(K, U)]

  11. def combineDo[K, V, U](fn: (Iterable[V], Heartbeat) ⇒ U)(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U], p: ImplicitParameter3): DList[(K, U)]

  12. def combineDo[K, V, U](fn: (Iterable[V], Counters) ⇒ U)(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U], p: ImplicitParameter2): DList[(K, U)]

  13. def combineDo[K, V, U](fn: (Iterable[V], Emitter[U]) ⇒ Unit)(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wfv: WireFormat[V], wu: WireFormat[U], p: ImplicitParameter1): DList[(K, U)]

  14. def compress: T

    Definition Classes
    DataSinks
  15. def count(p: (A) ⇒ Boolean): DObject[Long]

    Count the number of elements in the list which satisfy a predicate.

  16. def diff(that: DList[A])(implicit cmp: Grouping[A]): DList[A]

    Computes the multiset difference between this DList and that This has the same semantics as Scala's List.diff This makes code makes the assumption that all A's that group together are exactly equivalent and interchangeable with each other, and all A's that do group together will be brought together in the resultant DList.

    Computes the multiset difference between this DList and that This has the same semantics as Scala's List.diff This makes code makes the assumption that all A's that group together are exactly equivalent and interchangeable with each other, and all A's that do group together will be brought together in the resultant DList. If this is a problem, shuffle can be used (after)

  17. def distinct: DList[A]

    Build a new distributed list from this list without any duplicate elements.

  18. def distinctDiff(that: DList[A])(implicit cmp: Grouping[A]): DList[A]

    Computes the set difference between this DList and another This has the same semantics as Scala's Set.diff and is equivalent to calling .distinct.diff(that) but is considerably more efficient, as it can be done in a single mapReduce.

  19. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  21. def filter(p: (A) ⇒ Boolean): DList[A]

    Keep elements from the distributed list that pass a specified predicate function

  22. def filterNot(p: (A) ⇒ Boolean): DList[A]

    Keep elements from the distributed list that do not pass a specified predicate function

  23. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  24. def flatten[B](implicit ev: <:<[A, Iterable[B]], wtB: WireFormat[B]): DList[B]

    Converts a distributed list of iterable values into to a distributed list in which all the values are concatenated.

  25. def fold(implicit m: Monoid[A]): DObject[A]

    Sum up the elements of this distributed list.

  26. def foldMap[B](f: (A) ⇒ B)(implicit arg0: Monoid[B], arg1: WireFormat[B]): DObject[B]

    Map each element to a scalaz.Monoid and fold.

  27. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  28. def groupBy[K](f: (A) ⇒ K)(implicit arg0: WireFormat[K], arg1: Grouping[K]): DList[(K, Iterable[A])]

    Group the values of a distributed list according to some discriminator function.

  29. def groupByKeyWith[K, V](grouping: Grouping[K])(implicit ev: <:<[A, (K, V)], wfk: WireFormat[K], wfv: WireFormat[V]): DList[(K, Iterable[V])]

    Group the values of a distributed list with key-value elements by key.

    Group the values of a distributed list with key-value elements by key. And explicitly take the grouping that should be used. This is best used when you're doing things like secondary sorts, or groupings with strange logic (like making sure None's / nulls are sprayed across all reducers

  30. def groupWith[K](f: (A) ⇒ K)(gpk: Grouping[K])(implicit arg0: WireFormat[K]): DList[(K, Iterable[A])]

    Group the value of a distributed list according to some discriminator function and some grouping function.

  31. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  32. def head: DObject[A]

    returns

    the head of the DList as a DObject. This is an unsafe operation

  33. def headOption: DObject[Option[A]]

    returns

    the head of the DList as a DObject containing an Option

  34. def isEqual(to: DList[A])(implicit cmp: Grouping[A]): DObject[Boolean]

    Returns if the other DList has the same elements.

    Returns if the other DList has the same elements. A DList is unordered so order isn't considered. The Grouping required isn't very special and almost any will work (including grouping designed for secondary sorting) but for completeness, it is required to send two equal As to the same partition, and sortCompare provide total ordering

  35. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  36. def keys[K, V](implicit ev: <:<[A, (K, V)], mwk: WireFormat[K], mwv: WireFormat[V]): DList[K]

    Create a distributed list containing just the keys of a key-value distributed list.

  37. def length: DObject[Long]

    The length of the distributed list.

  38. def map[B](f: (A) ⇒ B)(implicit arg0: WireFormat[B]): DList[B]

    For each element of the distributed list produce a new element by applying a specified function.

    For each element of the distributed list produce a new element by applying a specified function. The resulting collection of elements form a new distributed list

  39. def mapFlatten[B](f: (A) ⇒ Iterable[B])(implicit arg0: WireFormat[B]): DList[B]

    For each element of the distributed list produce zero or more elements by applying a specified function.

    For each element of the distributed list produce zero or more elements by applying a specified function. The resulting collection of elements form a new distributed list

  40. def mapKeys[K, V, B](f: (K) ⇒ B)(implicit ev: <:<[A, (K, V)], wfv: WireFormat[V], wfb: WireFormat[B]): DList[(B, V)]

    map the keys of a DList[(K, V)]

  41. def mapValues[K, V, B](f: (V) ⇒ B)(implicit ev: <:<[A, (K, V)], wfk: WireFormat[K], wfb: WireFormat[B]): DList[(K, B)]

    map the values of a DList[(K, V)]

  42. def max(implicit cmp: Ordering[A]): DObject[A]

    Find the largest element in the distributed list.

  43. def maxBy[B](f: (A) ⇒ B)(cmp: Ordering[B]): DObject[A]

    Find the largest element in the distributed list.

  44. def min(implicit cmp: Ordering[A]): DObject[A]

    Find the smallest element in the distributed list.

  45. def minBy[B](f: (A) ⇒ B)(cmp: Ordering[B]): DObject[A]

    Find the smallest element in the distributed list.

  46. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  47. final def notify(): Unit

    Definition Classes
    AnyRef
  48. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  49. def parallelDo[B](fn: (A, InputOutputContext) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter4): DList[B]

  50. def parallelDo[B](fn: (A, Configuration) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter3): DList[B]

  51. def parallelDo[B](fn: (A, ScoobiJobContext) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter2): DList[B]

  52. def parallelDo[B](fn: (A, Heartbeat) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter1): DList[B]

  53. def parallelDo[B](fn: (A, Counters) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter): DList[B]

  54. def parallelDo[B](fn: (A, Emitter[B]) ⇒ Unit)(implicit arg0: WireFormat[B]): DList[B]

  55. def partition(p: (A) ⇒ Boolean): (DList[A], DList[A])

    Partitions this distributed list into a pair of distributed lists according to some predicate.

    Partitions this distributed list into a pair of distributed lists according to some predicate. The first distributed list consists of elements that satisfy the predicate and the second of all elements that don't.

  56. def product(implicit num: Numeric[A]): DObject[A]

    Multiply up the elements of this distribute list.

  57. def reduce(op: Reduction[A]): DObject[A]

    Reduce the elements of this distributed list using the specified associative binary operator.

    Reduce the elements of this distributed list using the specified associative binary operator. The order in which the elements are reduced is unspecified and may be non-deterministic

  58. def reduceOption(op: Reduction[A]): DObject[Option[A]]

    Reduce the elements of this distributed list using the specified associative binary operator and a default value if the list is empty.

    Reduce the elements of this distributed list using the specified associative binary operator and a default value if the list is empty. The order in which the elements are reduced is unspecified and may be non-deterministic

  59. def reduceValues[K, V](op: Reduction[V])(implicit ev: <:<[A, (K, Iterable[V])], mwk: WireFormat[K], mwv: WireFormat[V]): DList[(K, V)]

    reduce the values which are the result of a groupByKey

  60. def reduceValuesOption[K, V](op: Reduction[V])(implicit ev: <:<[A, (K, Iterable[V])], mwk: WireFormat[K], mwv: WireFormat[V]): DList[(K, Option[V])]

    reduce the values which are the result of a groupByKey

  61. def shuffle: DList[A]

    Randomly shuffle a DList.

  62. def size: DObject[Long]

    The size of the distributed list.

  63. def stratify(n: Int)(f: (A) ⇒ Int): Seq[DList[A]]

    Divide a DList into multiple partitions.

  64. def stratifyWeighted[N](weights: Seq[N])(implicit arg0: Numeric[N]): Seq[DList[A]]

    Randomly divide a DList into multiple partitions where the stratum proportions are defined by weights.

  65. def sum(implicit num: Numeric[A]): DObject[A]

    Sum up the elements of this distribute list.

  66. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  67. def toString(): String

    Definition Classes
    AnyRef → Any
  68. def values[K, V](implicit ev: <:<[A, (K, V)], mwk: WireFormat[K], mwv: WireFormat[V]): DList[V]

    Create a distributed list containing just the values of a key-value distributed list.

  69. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  70. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  71. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  72. implicit def wf: WireFormat[A]

  73. def withFilter(p: (A) ⇒ Boolean): DList[A]

    the withFilter method

  74. def zipWithIndex: DList[(A, Long)]

    Add an index (Long) to the DList where the index is between 0 and .size-1 of the DList

Inherited from Persistent[Seq[A]]

Inherited from DataSinks

Inherited from AnyRef

Inherited from Any

Ungrouped