ReceiverInputDStreams - Input Streams with Receivers

Receiver input streams (ReceiverInputDStreams) are specialized input streams that use receivers to receive data. The name stands for an InputDStream with a receiver.

Note
Receiver input streams run receivers as long-running tasks that occupy a core per stream.

The ReceiverInputDStream abstract class defines the following abstract method that custom implementations use to create receivers:

def getReceiver(): Receiver[T]

The receiver is then sent to and run on workers (when ReceiverTracker is started).
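As an illustration of the contract above, here is a minimal sketch of a custom ReceiverInputDStream. The names RepeatingReceiver and RepeatingInputDStream are hypothetical and exist only for this example. It assumes a Spark version in which ReceiverInputDStream takes the StreamingContext as its only constructor argument.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver: stores the same value repeatedly until it is stopped.
class RepeatingReceiver(value: String)
  extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  // onStart must not block, so the receiving loop runs on its own thread.
  override def onStart(): Unit = {
    val worker = new Thread("repeating-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(value)
          Thread.sleep(100)
        }
      }
    }
    worker.setDaemon(true)
    worker.start()
  }

  // Nothing to clean up: the worker thread exits once isStopped() is true.
  override def onStop(): Unit = ()
}

// Hypothetical input stream: its only job is to say which receiver to run.
class RepeatingInputDStream(_ssc: StreamingContext, value: String)
  extends ReceiverInputDStream[String](_ssc) {

  override def getReceiver(): Receiver[String] = new RepeatingReceiver(value)
}
```

The stream itself never calls getReceiver to run the receiver locally. It only describes which receiver should be started on a worker.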

Note

A minimal yet useful implementation of the ReceiverInputDStream class is the pluggable input stream org.apache.spark.streaming.dstream.PluggableInputDStream (see the sources on GitHub). It requires a Receiver to be supplied by the developer and simply returns it from getReceiver.

PluggableInputDStream is used by the StreamingContext.receiverStream() method.
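The following sketch shows how a developer-supplied receiver could be passed to StreamingContext.receiverStream. TickReceiver and ReceiverStreamDemo are hypothetical names. The local[2] master, the one-second batch interval and the five-second timeout are assumptions chosen for the example. At least two local cores are used because the receiver occupies one of them.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver that emits an increasing counter.
class TickReceiver extends Receiver[Long](StorageLevel.MEMORY_ONLY) {
  override def onStart(): Unit = {
    val worker = new Thread("tick-receiver") {
      override def run(): Unit = {
        var n = 0L
        while (!isStopped()) {
          store(n)
          n += 1
          Thread.sleep(200)
        }
      }
    }
    worker.setDaemon(true)
    worker.start()
  }

  override def onStop(): Unit = ()
}

object ReceiverStreamDemo {
  def main(args: Array[String]): Unit = {
    // One core runs the receiver, the other processes the batches.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("receiver-stream-demo")
    val ssc = new StreamingContext(conf, Seconds(1))

    // receiverStream returns a ReceiverInputDStream[Long] for the receiver.
    val ticks = ssc.receiverStream(new TickReceiver)
    ticks.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```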

ReceiverInputDStream uses ReceiverRateController when spark.streaming.backpressure.enabled is set to true.

Note

Both the start() and stop() methods are implemented in ReceiverInputDStream, but they do nothing. Management of ReceiverInputDStream is left to ReceiverTracker.

Read ReceiverTrackerEndpoint.startReceiver for more details.

The source code of ReceiverInputDStream is available on GitHub.

Generate RDDs for Batch Interval — compute Method

The compute(validTime: Time): Option[RDD[T]] method (declared abstract in DStream) uses the start time of DStreamGraph, i.e. the start time of StreamingContext, to check whether the validTime input parameter is valid.

If the time to generate RDDs (validTime) is earlier than the start time of StreamingContext, an empty BlockRDD is generated.

Otherwise, ReceiverTracker is requested for all the blocks that have been allocated to this stream for this batch (using ReceiverTracker.getBlocksOfBatch).

The number of records received for the batch for the input stream (as StreamInputInfo) is registered with InputInfoTracker.

If all BlockIds have a WriteAheadLogRecordHandle, a WriteAheadLogBackedBlockRDD is generated. Otherwise, a BlockRDD is generated.
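The decision steps above can be sketched in plain Scala without any Spark dependency. Every type here (BlockInfo, GeneratedRdd, BatchResult) is a simplified, hypothetical stand-in for Spark's internal classes, and times are plain milliseconds. This is a model of the control flow, not the actual implementation.

```scala
object ComputeSketch {
  // Hypothetical stand-in for the per-block information kept by ReceiverTracker.
  final case class BlockInfo(blockId: String, numRecords: Long, hasWalHandle: Boolean)

  // Hypothetical stand-ins for the kinds of RDD that compute can produce.
  sealed trait GeneratedRdd
  case object EmptyBlockRdd extends GeneratedRdd
  final case class PlainBlockRdd(blockIds: Seq[String]) extends GeneratedRdd
  final case class WalBackedBlockRdd(blockIds: Seq[String]) extends GeneratedRdd

  // reportedRecords models the record count registered with InputInfoTracker.
  // It is None when nothing is registered.
  final case class BatchResult(rdd: GeneratedRdd, reportedRecords: Option[Long])

  def compute(
      validTime: Long,
      graphStartTime: Long,
      blocksOfBatch: Long => Seq[BlockInfo]): BatchResult =
    if (validTime < graphStartTime) {
      // Batch time precedes the start of the context: empty RDD.
      BatchResult(EmptyBlockRdd, None)
    } else {
      // Blocks allocated to this stream for this batch.
      val blocks = blocksOfBatch(validTime)
      val numRecords = blocks.map(_.numRecords).sum
      val ids = blocks.map(_.blockId)
      // An empty batch is not treated specially in this sketch.
      val rdd =
        if (blocks.forall(_.hasWalHandle)) WalBackedBlockRdd(ids)
        else PlainBlockRdd(ids)
      BatchResult(rdd, Some(numRecords))
    }
}
```

For example, a batch whose blocks all carry a write-ahead log handle yields the WAL-backed variant, while a single block without a handle is enough to fall back to the plain one.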

Back Pressure

Caution
FIXME

Back pressure for input dstreams with receivers can be configured using the spark.streaming.backpressure.enabled setting.

Note
Back pressure is disabled by default.
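A minimal sketch of turning the setting on programmatically through SparkConf. The object name BackPressureConf and the application name are hypothetical. The same key could also be supplied on the command line or in a properties file.

```scala
import org.apache.spark.SparkConf

object BackPressureConf {
  // loadDefaults = false keeps system properties out of this example.
  val conf: SparkConf = new SparkConf(loadDefaults = false)
    .setAppName("backpressure-demo")
    .set("spark.streaming.backpressure.enabled", "true")

  // Reads the flag back, using the default (disabled) when the key is absent.
  val enabled: Boolean =
    conf.getBoolean("spark.streaming.backpressure.enabled", defaultValue = false)
}
```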