Inspired by the solution drafted in this blog post [1], I would like to propose adding a new parameter, set when subscribers are configured, for enabling concurrency.
The concurrency parameter - default None meaning no concurrency - will tell Kafkaesk the maximum number of inflight messages that are processed at the same time.
Concurrency will be applied only between records that do not belong to the same partition, so at any time all inflight messages must belong to different partitions.
Maximum throughput is only achieved if enough partitions are assigned to the consumer. At some point, adding more consumers might reduce throughput, since each consumer would no longer have enough partitions to make use of its concurrency. Limiting the size of the cluster should address the issue.
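To illustrate the partition constraint, here is a rough sketch of the per-consumer bound. It assumes partitions are spread as evenly as possible across the group, which the actual assignor may not do. The function name and its arguments are hypothetical and not part of Kafkaesk.

```python
def effective_concurrency(partitions, consumers, concurrency):
    """Upper bound on inflight messages for each consumer in the group.

    Assumes an even spread of partitions (hypothetical helper).
    """
    base, extra = divmod(partitions, consumers)
    # `extra` consumers get one more partition than the rest.
    assigned = [base + 1] * extra + [base] * (consumers - extra)
    # A consumer can never have more inflight messages than partitions.
    return [min(concurrency, count) for count in assigned]
```

With 12 partitions and concurrency 4, two consumers can each use the full concurrency, while six consumers are each capped at two inflight messages.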
Rationale
Some consumer types might need to spend some time on IO operations. Adding more consumers to the consumer group would eventually mitigate the issue, but at some cost.
By allowing consumers to process partitions concurrently within the same consumer, the problem is addressed in a more cost-effective way.
Implementation details
- Subsequent calls to getmany must return messages from partitions that do not have messages inflight within the consumer.
- The consumer would group messages by partition, keeping their original order.
- A limited number of asyncio tasks would pull messages from those groups using a round-robin algorithm.
- Offset commits would be done per partition.
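The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Kafkaesk implementation: `batches` stands in for what a getmany-style call returns (partition id mapped to an ordered list of `(offset, value)` pairs), `handler` is a hypothetical subscriber coroutine, and the `committed` dict stands in for a real per-partition offset commit.

```python
import asyncio
from collections import deque


async def consume(batches, handler, concurrency=None):
    """Process batches with at most `concurrency` messages inflight."""
    # Round-robin queue of partitions that still have pending messages.
    ready = deque((p, deque(msgs)) for p, msgs in batches.items() if msgs)
    committed = {}  # partition -> last processed offset

    async def worker():
        while ready:
            # Removing the partition from the queue guarantees no other
            # worker touches it while one of its messages is inflight.
            partition, pending = ready.popleft()
            offset, value = pending.popleft()
            await handler(partition, offset, value)
            committed[partition] = offset  # stand-in for a per-partition commit
            if pending:
                ready.append((partition, pending))  # back of the line

    # None means no concurrency, i.e. a single worker.
    await asyncio.gather(*(worker() for _ in range(concurrency or 1)))
    return committed
```

Because a partition is only put back in the queue after its current message has been handled, ordering within a partition is preserved and the number of inflight messages never exceeds the number of assigned partitions.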
[1] https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging/