com.nicta.scoobi.io.text

DownloadSink

class DownloadSink extends DataSink[NullWritable, NullWritable, NullWritable]

This is a dummy sink whose only purpose is to collect files downloaded in map tasks.

The map task must be a parallelDo like this:

    def download = (path: String, context: InputOutputContext) => {
      // get the output directory for the current map task
      val outputDir = FileOutputFormat.getWorkOutputPath(context.context)
      val outDir = outputDir.toString.replace("file:", "")
      logger.debug("output dir is " + outDir)

      // download the file
      // ...
    }

    // the second argument is a predicate on the file Path (see the constructor signature)
    val sink = new DownloadSink("target/test", (_: Path).getName.startsWith("source"))

    val fileNames: DList[String] = ???
    fileNames.parallelDo(download).addSink(sink).persist

The downloaded files are collected from the working directory of the map task and placed under "target/test" based on their path.
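The collection step described above can be pictured with a small, library-free sketch. This is not Scoobi's implementation: it only assumes that "based on their path" means each accepted file keeps its path relative to the working directory. It uses `java.nio.file.Path` as a stand-in for Hadoop's `Path`, and the names `DownloadCollector` and `collect` are hypothetical.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable.ListBuffer

// Hypothetical helper, for illustration only; not part of Scoobi.
object DownloadCollector {
  /** Move every regular file under `workDir` accepted by `isDownloadedFile`
    * into `target`, keeping its path relative to `workDir`.
    * Returns the destination paths, sorted by name. */
  def collect(workDir: Path, target: Path, isDownloadedFile: Path => Boolean): List[Path] = {
    // List the candidates first so that moving files does not disturb the walk.
    val found = ListBuffer.empty[Path]
    val stream = Files.walk(workDir)
    try {
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        if (Files.isRegularFile(p) && isDownloadedFile(p)) found += p
      }
    } finally stream.close()

    found.toList.map { source =>
      // Assumption: the relative path inside the working directory is preserved.
      val destination = target.resolve(workDir.relativize(source))
      Files.createDirectories(destination.getParent)
      Files.move(source, destination) // fails if the destination already exists
      destination
    }.sortBy(_.toString)
  }
}
```

With a predicate such as `p => p.getFileName.toString.startsWith("source")`, only the downloaded files are moved. Anything else the task wrote to its working directory stays where it is.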

Linear Supertypes
DataSink[NullWritable, NullWritable, NullWritable], Sink, AnyRef, Any

Instance Constructors

  1. new DownloadSink(target: String, isDownloadedFile: (Path) ⇒ Boolean, overwrite: Boolean = false, check: OutputCheck = Sink.defaultOutputCheck)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def compress: Sink

    returns

    a new sink with Gzip compression enabled

    Definition Classes
    Sink
  7. def compressWith(codec: CompressionCodec, compressionType: CompressionType): TextFileSink[NullWritable]

    returns

    a new sink with compression enabled

    Definition Classes
    DownloadSink → Sink
  8. def compression: Option[Compression]

    returns

    a compression object if this sink is compressed

    Definition Classes
    DownloadSink → DataSink
  9. def configureCompression(configuration: Configuration): DataSink[NullWritable, NullWritable, NullWritable]

    configure the compression for a given job

    Definition Classes
    DataSink → Sink
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. val id: Int

    unique id for this Sink

    Definition Classes
    DataSink → Sink
  16. def isCompressed: Boolean

    returns

    true if this Sink is compressed

    Definition Classes
    DataSink → Sink
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. def isSinkResult(tag: Int): (Path) ⇒ Boolean

    returns

    true if the file path has the name of an output channel with the proper tag and index or if it is a _SUCCESS file

    Definition Classes
    DownloadSink → Sink
  19. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. def outputCheck(implicit sc: ScoobiConfiguration): Unit

    Check the validity of the DataSink specification.

    Definition Classes
    DownloadSink → DataSink → Sink
  23. def outputConfigure(job: Job)(implicit sc: ScoobiConfiguration): Unit

    Configure the DataSink.

    Definition Classes
    DownloadSink → DataSink → Sink
  24. def outputConverter: OutputConverter[NullWritable, NullWritable, NullWritable]

    Maps the type consumed by this DataSink to the key-values of its OutputFormat.

    Definition Classes
    DownloadSink → DataSink → Sink
  25. def outputFormat(implicit sc: ScoobiConfiguration): Class[_ <: OutputFormat[NullWritable, NullWritable]]

    The OutputFormat specifying the type of output for this DataSink.

    Definition Classes
    DownloadSink → DataSink → Sink
  26. def outputKeyClass(implicit sc: ScoobiConfiguration): Class[NullWritable]

    The Class of the OutputFormat's key.

    Definition Classes
    DownloadSink → DataSink → Sink
  27. def outputPath(implicit sc: ScoobiConfiguration): Option[Path]

    returns

    the path for this Sink.

    Definition Classes
    DownloadSink → Sink
  28. def outputSetup(implicit sc: ScoobiConfiguration): Unit

    This method is called just before writing data to the sink

    Definition Classes
    DataSink → Sink
  29. def outputTeardown(implicit sc: ScoobiConfiguration): Unit

    This method is called just after writing data to the sink

    Definition Classes
    DataSink → Sink
  30. def outputValueClass(implicit sc: ScoobiConfiguration): Class[NullWritable]

    The Class of the OutputFormat's value.

    Definition Classes
    DownloadSink → DataSink → Sink
  31. lazy val stringId: String

    unique id for this Sink, as a string. Can be used to create a file path

    Definition Classes
    DataSink → Sink
  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def toString(): String

    Definition Classes
    AnyRef → Any
  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
