a new sink with Gzip compression enabled
a new sink with compression enabled
a compression object if this sink is compressed
configure the compression for a given job
unique id for this Sink
true if this Sink is compressed
true if the file path has the name of an output channel with the proper tag and index or if it is a _SUCCESS file
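The file check described above can be sketched as a predicate on file names. This is a minimal illustration, not the library's implementation: the naming scheme `ch<tag>out<index>-...` and the names `ResultFiles`, `isChannelFile` and `isSinkResult` are assumptions made for the example.

```scala
object ResultFiles {
  // Hypothetical naming scheme: "ch<tag>out<index>-<rest>", e.g. "ch1out2-r-00000".
  private val ChannelFile = """ch(\d{1,9})out(\d{1,9})-.*""".r

  /** Last segment of a slash-separated path. */
  private def fileName(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  /** True for the marker file written when a job completes successfully. */
  def isSuccessFile(path: String): Boolean =
    fileName(path) == "_SUCCESS"

  /** True if the file name carries exactly this tag and index. */
  def isChannelFile(tag: Int, index: Int)(path: String): Boolean =
    fileName(path) match {
      case ChannelFile(t, i) => t.toInt == tag && i.toInt == index
      case _                 => false
    }

  /** A file belongs to the sink if it is a channel file or a _SUCCESS file. */
  def isSinkResult(tag: Int, index: Int)(path: String): Boolean =
    isChannelFile(tag, index)(path) || isSuccessFile(path)
}
```

For instance, `ResultFiles.isSinkResult(1, 2)("out/ch1out2-r-00000")` holds under this assumed scheme, while a file named `part-r-00000` is rejected.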
Check the validity of the DataSink specification.
Configure the DataSink.
Maps the type consumed by this DataSink to the key-values of its OutputFormat.
The OutputFormat specifying the type of output for this DataSink.
The Class of the OutputFormat's key.
the path for this Sink.
This method is called just before writing data to the sink
This method is called just after writing data to the sink
The Class of the OutputFormat's value.
unique id for this Sink, as a string. Can be used to create a file path
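To show how the members described above could fit together, here is a simplified sketch with no Hadoop types. The names `SimpleSink`, `persist` and `WordLengthSink` are hypothetical, and the call order (check, setup, write, teardown) is an assumption of this example rather than a statement about how the library schedules these calls.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified, hypothetical stand-in for a DataSink consuming values of type A
// and writing key-values of type (K, V).
trait SimpleSink[K, V, A] {
  def id: Int
  def stringId: String = id.toString   // can be used to build a file path
  def toKeyValue(a: A): (K, V)         // role played by the output converter
  def check(): Unit                    // validate the sink specification
  def setup(): Unit                    // called just before writing
  def write(kv: (K, V)): Unit
  def teardown(): Unit                 // called just after writing
}

object SimpleSink {
  // Assumed call order for this sketch only.
  def persist[K, V, A](sink: SimpleSink[K, V, A], values: Seq[A]): Unit = {
    sink.check()
    sink.setup()
    values.foreach(a => sink.write(sink.toKeyValue(a)))
    sink.teardown()
  }
}

// Example sink: maps each word to (word, length) and records every call.
class WordLengthSink(val id: Int) extends SimpleSink[String, Int, String] {
  val events = ArrayBuffer.empty[String]
  def toKeyValue(a: String): (String, Int) = (a, a.length)
  def check(): Unit = events += "check"
  def setup(): Unit = events += "setup"
  def write(kv: (String, Int)): Unit = events += s"write ${kv._1} -> ${kv._2}"
  def teardown(): Unit = events += "teardown"
}
```

Calling `SimpleSink.persist(new WordLengthSink(7), Seq("ab", "cde"))` records one `check`, one `setup`, a `write` per value, and one `teardown`, in that order.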
This is a dummy sink just used to collect files downloaded in map tasks
The map task must be a parallelDo like this:
    def download = (path: String, context: InputOutputContext) => {
      // get the output directory for the current map task
      val outputDir = FileOutputFormat.getWorkOutputPath(context.context)
      val outDir = outputDir.toString.replace("file:", "")
      logger.debug("output dir is " + outDir)

      // download the file
      // ...
    }

    val sink = new DownloadSink("target/test", (_: String).startsWith("source"))

    val fileNames: DList[String] = ???
    fileNames.parallelDo(download).addSink(sink).persist
The downloaded files are collected from the working directory of the map task and placed in "target/test" when their path satisfies the predicate passed to the sink (here, paths starting with "source").
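The collection step can be sketched on a local file system as follows. This is only an illustration under stated assumptions: `CollectDownloads.collect` is a hypothetical helper, it applies the predicate to the file name rather than the full path, and it moves files with `java.nio.file.Files.move`; how the sink itself transfers files is not specified here.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

object CollectDownloads {
  /**
   * Move every regular file in workDir whose name satisfies isDownloaded
   * into targetDir, and return the files at their new location.
   */
  def collect(workDir: File, targetDir: File, isDownloaded: String => Boolean): Seq[File] = {
    Files.createDirectories(targetDir.toPath)
    // listFiles returns null when workDir is not a readable directory
    val candidates = Option(workDir.listFiles()).map(_.toSeq).getOrElse(Seq.empty[File])
    candidates.filter(f => f.isFile && isDownloaded(f.getName)).map { f =>
      val destination = new File(targetDir, f.getName)
      Files.move(f.toPath, destination.toPath, StandardCopyOption.REPLACE_EXISTING)
      destination
    }
  }
}
```

With a predicate such as `(_: String).startsWith("source")`, only files whose names start with "source" are moved; everything else stays in the working directory.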