Performance Tuning

Many StreamMine3G features can be tuned via a set of parameters. With the right parameter values, a single node can process up to 350k events/s, and a larger StreamMine3G setup with several nodes can reach several million events per second. With poorly chosen parameters, however, users may experience very low throughput and high latency.

Network/Event Batching

StreamMine3G supports event batching: instead of performing a write and a read on the network interface for every single event, StreamMine3G enqueues events in an internal FIFO queue until a certain threshold is reached, and then writes them to the wire in one go (via a writev() call).

Batch Size

Event batching increases throughput because it lowers the number of system calls. For operators that emit events only sporadically, however, batching may increase latency, so the network batch size threshold can be chosen per operator. By default, the threshold is set to 1 byte, so every event is sent immediately, i.e., no event batching is used.

OPTIONKEY: SENDINGBATCHSIZE / sendingBatchSize
DEFAULT is: 1 (no batching enabled) / allowed values: 0-10240000 (10 MB)
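
The effect of the threshold can be illustrated with a minimal Python sketch. It is not StreamMine3G code: the class SendBatcher, the helper count_flushes, and the flush callback (standing in for one writev() call) are hypothetical names introduced only for this illustration, and the sketch assumes the threshold is compared against the number of buffered bytes.

```python
from typing import Callable, List


class SendBatcher:
    """Buffers serialized events and flushes them once a byte threshold is reached."""

    def __init__(self, threshold_bytes: int, flush: Callable[[List[bytes]], None]):
        self.threshold_bytes = threshold_bytes
        self.flush = flush  # hypothetical stand-in for a single writev() call
        self.pending: List[bytes] = []
        self.pending_bytes = 0

    def enqueue(self, event: bytes) -> None:
        # Append to the FIFO buffer and flush once enough bytes have accumulated.
        self.pending.append(event)
        self.pending_bytes += len(event)
        if self.pending_bytes >= self.threshold_bytes:
            batch, self.pending, self.pending_bytes = self.pending, [], 0
            self.flush(batch)


def count_flushes(threshold_bytes: int, events: List[bytes]) -> int:
    """Return how many flush calls a stream of events triggers."""
    flushes: List[List[bytes]] = []
    batcher = SendBatcher(threshold_bytes, flushes.append)
    for event in events:
        batcher.enqueue(event)
    return len(flushes)
```

In this sketch, ten 100-byte events trigger ten flushes with a threshold of 1 but only two with a threshold of 500. With a threshold of 2000 the same ten events trigger no flush at all and stay buffered, which is the latency cost for sporadically emitting operators described above.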

Batch Delegation

StreamMine3G writes data to the wire synchronously, so whenever a batch is full, the thread that last enqueued an event to that batch is blocked for a short time while it writes. All other threads that want to append more events to that batch are normally blocked during that period as well, which lowers overall throughput. To increase performance, StreamMine3G supports batch delegation: whenever a thread tries to append an event to a batch that is currently being written to the wire, the event is appended to a new internal list, and the thread currently in charge of writing is informed about it. This results in non-blocking, high-throughput, low-latency event processing.

OPTIONKEY: BUFFERBATCHTYPE / bufferBatchType
DEFAULT is: 0 (no delegation enabled) / allowed values: 0 and 1 (1 = delegation)
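
The delegation idea can be sketched in a few lines of Python. This is only an illustration of the general technique under simplifying assumptions, not the actual StreamMine3G implementation: the class DelegatingBatch and the write callback (standing in for the blocking network write) are hypothetical, and the sketch writes every event as soon as possible instead of waiting for a batch threshold.

```python
import threading
from typing import Callable, List


class DelegatingBatch:
    """Non-blocking append: while one thread writes, others hand their events over to it."""

    def __init__(self, write: Callable[[List[bytes]], None]):
        self._write = write            # hypothetical stand-in for the blocking network write
        self._lock = threading.Lock()  # guards the two fields below, never held during a write
        self._delegated: List[bytes] = []
        self._writer_active = False

    def append(self, event: bytes) -> bool:
        """Return True if the caller did the writing itself, False if it delegated."""
        with self._lock:
            self._delegated.append(event)
            if self._writer_active:
                return False           # the thread currently writing will pick this event up
            self._writer_active = True
        while True:
            with self._lock:
                if not self._delegated:
                    # Nothing left to write: give up the writer role and return.
                    self._writer_active = False
                    return True
                batch, self._delegated = self._delegated, []
            self._write(batch)         # slow part, performed without holding the lock
```

A thread that calls append while another thread is writing returns immediately, and its event is written by the active writer in the next round. Because the lock is held only for the short list operations and never during the write itself, appending threads are not blocked by the network.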

Furthermore, the queue size and the processing size can be adjusted if delegation is used. The queue size defines how many of those internal queues are created while the writer thread is blocked; the processing size defines after how many writes the writing is handed over to a new thread. The queue size is important for automatic flow control, i.e., back pressure when writing to the network interface is slower than event processing, while the processing size ensures a fair workload across all threads.

OPTIONKEY: BUFFERBATCHDELEGATEDQSIZE / bufferBatchDelegatedQueueSize
DEFAULT is: 10
OPTIONKEY: BUFFERBATCHDELEGATEDPSIZE / bufferBatchDelegatedProcessingSize
DEFAULT is: 10

Multi Threading

Each StreamMine3G node has a thread pool. Ideally, the thread pool has as many threads as there are cores on the (virtual) machine StreamMine3G is running on. You can set the thread pool size, i.e., how many threads are available to that StreamMine3G node, using the threadPool parameter in the node's configuration in ZooKeeper, e.g., /streammine3g/nodes/nodeXYZ.mycloud.com if your node has the nodeName nodeXYZ.mycloud.com.

Event Processing

The thread pool is shared among all slices deployed on that StreamMine3G node. However, the number of threads an operator uses to process events concurrently can be limited with the PROCESSINGTASKSPROCESSOR parameter.

Assume that our machine has 8 cores and we set the thread pool size to 8 threads. If we limit our operator to 4 of those 8 threads, at most 4 events are taken out of the incoming event queue and processed concurrently.

OPTIONKEY: PROCESSINGTASKSPROCESSOR / processingTasksProcessor
DEFAULT is: 8
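
The 8-thread pool with a 4-thread operator limit can be mimicked with a short Python sketch. It is an illustration only and does not use any StreamMine3G API: the function run_with_limit and its parameters are hypothetical, and the per-operator limit is modeled with a semaphore on top of a shared thread pool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(pool_size: int, operator_limit: int, events: int) -> int:
    """Process `events` on a shared pool and return the peak number processed concurrently."""
    slots = threading.Semaphore(operator_limit)  # per-operator cap on concurrent processing
    lock = threading.Lock()                      # protects the two counters below
    active = 0
    peak = 0

    def process(_event: int) -> None:
        nonlocal active, peak
        with slots:                              # at most `operator_limit` threads get past here
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)                     # simulated event processing
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(process, range(events)))
    return peak
```

Calling run_with_limit(8, 4, 32) never reports a peak above 4, even though 8 pool threads exist. In this sketch the surplus pool threads simply wait on the semaphore; a real engine would more likely leave them free for other slices instead of blocking them.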

Event Generation

Analogously, the number of threads that generate events using the generate method can be limited with the PROCESSINGTASKSGENERATOR parameter.

The generate method will then be called concurrently by at most x different threads, where x is the configured value.