I've mentioned a couple of times that I don't think parallel block scanning will give a noticeable performance improvement, but I've never really explained why. I made some graphs trying to illustrate my point. Here's a bunch of blocks, from $1$ to $n$:
Assume that every block has $m$ transactions for simplicity.
For scanning silent payments, the main computational burden is calculating the candidate scriptpubkey. This is an operation that has to be done for every transaction, meaning that for $n$ blocks of $m$ transactions, we need to perform $n \times m$ operations.
However, since transactions are independent of one another, these operations can be done in parallel. Currently we parallelize this process on the block level, meaning we parallelize the transactions in a single block. This looks like this:
All transactions in the green square can be done in parallel.
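The block-level approach can be sketched as follows. This is a toy model, not the actual scanning code: `candidate_spk` is a hypothetical placeholder (a plain hash) standing in for the real candidate-scriptpubkey computation, and blocks are just lists of dummy transaction bytes.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def candidate_spk(tx: bytes) -> bytes:
    # Placeholder for the expensive per-transaction step
    # (computing the candidate scriptpubkey).
    return hashlib.sha256(tx).digest()

def scan_block(block_txs, pool):
    # Block-level parallelism: all transactions of ONE block
    # are dispatched to the pool at once.
    return list(pool.map(candidate_spk, block_txs))

# 3 toy blocks of 4 dummy transactions each.
blocks = [[bytes([b, t]) for t in range(4)] for b in range(3)]

with ThreadPoolExecutor(max_workers=3) as pool:  # k = 3 "cores"
    # Blocks are processed sequentially; only the inner loop is parallel.
    results = [scan_block(block, pool) for block in blocks]
```

The outer loop over blocks stays sequential; moving it into the pool as well is the "parallel block scanning" alternative discussed below.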
Another option is to scan blocks in parallel as well. This way, all transactions can be done in parallel:
However, in practice, I think it will look more like this:
This is because the target device is limited in how many actions it can perform in parallel. In other words: there is a maximum size to the green square. If a computer has $k$ cores available, it can process $k$ transactions in parallel and the square is of size $k$.
Here's another illustration of the two approaches, taking the limit on parallelization into account ($k=3$):
To summarize, it does not make a difference whether the parallelized transactions are all part of the same block $[1,1] \dots [1,k]$ or of different blocks $[1,1] \dots [k,1]$. Ultimately we need to process every block, meaning we need to process every transaction in the $n \times m$ grid.
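The argument above can be checked with a back-of-the-envelope model. Assuming every transaction costs one unit of time and at most $k$ run concurrently, the wall-clock time is just the number of parallel "rounds", $\lceil nm/k \rceil$, regardless of how the $k$ slots are distributed across blocks. The function name `scan_rounds` is my own for this sketch:

```python
import math

def scan_rounds(n_blocks: int, txs_per_block: int, k_cores: int) -> int:
    """Idealized number of parallel rounds to process the whole
    n x m grid of transactions, k at a time. The partitioning of
    work into blocks doesn't appear anywhere: only the total does."""
    total_txs = n_blocks * txs_per_block
    return math.ceil(total_txs / k_cores)

# 10 blocks of 300 txs on 8 cores: 3000 / 8 -> 375 rounds,
# whether the 8 concurrent txs come from one block or from eight.
print(scan_rounds(10, 300, 8))  # 375
```

Only if blocks were smaller than $k$ (so block-level parallelism leaves cores idle at block boundaries) would scanning blocks in parallel help, and real blocks contain far more transactions than a typical device has cores.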