com.nicta.scoobi.impl.plan

mscr

package mscr

Visibility
  1. Public
  2. All

Type Members

  1. case class BypassOutputChannel(input: ParallelDo, graph: Graph = Graph(Root(Seq()))) extends MscrOutputChannel with Product with Serializable

    This output channel simply copy values coming from a ParallelDo input (a mapper in an Input channel) to this node sinks and bridgeStore

  2. trait Channel extends AnyRef

    Abstract trait for both input and output channels

  3. class FloatingInputChannel extends MscrInputChannel

    This input channel is a tree of Mappers which are not connected to Gbk nodes

  4. class GbkInputChannel extends MscrInputChannel

    This input channel is a tree of Mappers which are all connected to Gbk nodes

  5. case class GbkOutputChannel(groupByKey: GroupByKey, combiner: Option[Combine] = None, reducer: Option[ParallelDo] = None, graph: Graph = Graph(Root(Seq()))) extends MscrOutputChannel with Product with Serializable

    Output channel for a GroupByKey.

    Output channel for a GroupByKey.

    It can optionally have a reducer and / or a combiner applied to the grouped key/values.

    The possible combinations are

    • gbk
    • gbk -> combiner
    • gbk -> reducer
    • gbk -> combiner -> reducer

    There can not be gbk -> reducer -> combiner because in that case the second combiner is transformed as a parallelDo by the Optimiser

  6. case class Graph(start: CompNode) extends Layering with Product with Serializable

  7. trait InputChannel extends Channel

    An input channel groups mapping operations from a single DataSource, attached to a source node (a Load node, or a GroupByKey node from a previous Mscr for example).

    An input channel groups mapping operations from a single DataSource, attached to a source node (a Load node, or a GroupByKey node from a previous Mscr for example).

    There are however more data inputs for an InputChannel since the environments of ParallelDos are inputs as well

    An InputChannel emits (key, values) of different types classified by an Integer tag, either:

    • the GroupByKey node id that will consume the key/values
    • the ParallelDo node id that will consume the key/values

    The main functionality of an InputChannel is to map an input key/value to another key/value to be grouped or reduced using the functions of ParallelDos.

    There are 2 main types of InputChannels:

    • GbkInputChannel: this input channel outputs key/values to GroupByKeys (or to some other parallelDo nodes if the result needs to be reused, see #282)
    • FloatingInputChannel: this input channel simply does some mapping for "floating" paralleldos

    They both share some implementation in the MscrInputChannel trait.

    An input channel can have no mappers at all. In that case the values from the source node are directly emitted with no transformation.

    Two InputChannels are equal if they have the same id.

  8. case class InputChannels(channels: Seq[InputChannel]) extends Product with Serializable

  9. case class KeyTypes(types: Map[Int, (WireReaderWriter, KeyGrouping)] = Map()) extends Product with Serializable

    encapsulation of expected key types for each tag

  10. trait Layering extends ShowNode

    Simple layering algorithm using the Longest path method to assign nodes to layers.

    Simple layering algorithm using the Longest path method to assign nodes to layers.

    See here for a good overview: http://www.cs.brown.edu/~rt/gdhandbook/chapters/hierarchical.pdf

    In our case the layers have minimum height and possibly big width which is actually good if we run things in parallel

  11. case class Mscr extends Attributable with Product with Serializable

    This class represents an MSCR job with a Seq of input channels and a Seq of output channels

    This class represents an MSCR job with a Seq of input channels and a Seq of output channels

    Each mscr has a unique id, which is used for equality.

    A Mscr has both input and output channels where input channels are doing mapping operations while output channels are doing grouping and reducing operations.

    In order to optimise the processing a same value can be computed by several mapping functions and be directed to different grouping / reducing functions, each of them identified by a tag.

  12. trait MscrInputChannel extends InputChannel

    Common implementation of InputChannel for GbkInputChannel and FloatingInputChannel

  13. trait MscrOutputChannel extends OutputChannel

    Implementation of an OutputChannel for a Mscr

  14. trait MscrsDefinition extends Layering with Optimiser

    This trait processes the computation graph created out of DLists and creates map-reduce jobs from it.

    This trait processes the computation graph created out of DLists and creates map-reduce jobs from it.

    The algorithm consists in:

    - building layers of independent nodes in the graph - finding the input nodes for the first layer - reaching "output" nodes from the input nodes - building output channels with those nodes - building input channels connecting the output to the input nodes - aggregating input and output channels as Mscr representing a full map reduce job - iterating on any processing node that is not part of a Mscr

  15. trait OutputChannel extends Channel

    An OutputChannel is responsible for emitting key/values grouped by one Gbk or passed through from an InputChannel with no grouping

    An OutputChannel is responsible for emitting key/values grouped by one Gbk or passed through from an InputChannel with no grouping

    Two OutputChannels are equal if they have the same tag. This tag is the id of the last processing node of the channel

  16. case class OutputChannels(channels: Seq[OutputChannel]) extends Product with Serializable

  17. case class ValueTypes(types: Map[Int, (WireReaderWriter)] = Map()) extends Product with Serializable

    encapsulation of expected value types for each tag

Value Members

  1. object Channel

  2. object InputChannel

  3. object Mscr extends Serializable

    Utility functions to create Mscrs

  4. object OutputChannel

    Utility functions for Output channels

Ungrouped