Improve performance by reducing redundant dictionary fetches. #336
nikohansen merged 1 commit into CMA-ES:development from
Conversation
if iteration is not None:
    self[key]['iteration'] = iteration
entry['geno'] = geno
entry['iteration'] = iteration
This doesn't look correct to me, as it doesn't check for None first?
Shortly before, we already perform this check and define it as last_iteration + 0.5 if necessary, so to my understanding it is impossible for it to be None at this point. Maybe the right way to go would be to add an assert plus a comment to point this out?
But removing the if statement was really just a cleanup: I thought it makes things clearer if we know that we will always have the iteration variable; it doesn't have any significant performance benefit.
```python
if iteration is not None:
    if iteration > self.last_iteration:
        self.last_solution_index = 0
    self.last_iteration = iteration
else:
    iteration = self.last_iteration + 0.5  # a hack to get a somewhat reasonable value
```
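The assert-plus-comment variant proposed above could look roughly like this. This is only a sketch under assumed names; `SolutionDictSketch` and `_normalize_iteration` are hypothetical, not pycma's actual API:

```python
class SolutionDictSketch(dict):
    """Minimal sketch of the iteration bookkeeping discussed above."""

    def __init__(self):
        super().__init__()
        self.last_iteration = 0
        self.last_solution_index = 0

    def _normalize_iteration(self, iteration):
        # derive a usable iteration number, as in the snippet above
        if iteration is not None:
            if iteration > self.last_iteration:
                self.last_solution_index = 0
            self.last_iteration = iteration
        else:
            iteration = self.last_iteration + 0.5  # hack for a reasonable value
        # From here on, iteration can no longer be None, so insert may
        # rely on it; the assert documents (and checks) this invariant.
        assert iteration is not None
        return iteration
```

With this in place, the insert code can drop its own `None` check without silently changing behavior if the invariant is ever violated.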
# TODO: insert takes 30% of the overall CPU time, mostly in def key()
# with about 15% of the overall CPU time
def insert(self, key, geno=None, iteration=None, fitness=None,
The comment was removed, but why? How do we know insert is not expensive anymore?
So before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.
I think for my use case the two changes I made reduced the running time of my whole program by around 30%. Before the change, the insert function took a big part of the overall runtime, which is not the case anymore, and it doesn't seem to me that improving it further is worth the effort.
I tried to make a specific test, but the numbers in the end seemed to vary with the type of key etc., and I'm not sure how to reproduce the exact numbers from the comment.
Sorry that I can't give a better answer. If there is any test I should run, I'm happy to do so and write the numbers here.
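The redundant-fetch argument can be illustrated with a toy dictionary whose key function counts its own invocations. All names here are hypothetical stand-ins, not the actual `_CMASolutionDict_functional` code:

```python
class CountingDict(dict):
    """Toy stand-in: a dict routing all access through a countable key function."""
    key_calls = 0

    def key(self, x):
        # stands in for the potentially expensive custom key function
        CountingDict.key_calls += 1
        return x

    def __getitem__(self, x):
        return dict.__getitem__(self, self.key(x))

    def __setitem__(self, x, value):
        dict.__setitem__(self, self.key(x), value)

def insert_before(d, k, geno, iteration):
    # old style: every access runs the key function again (3 calls total)
    d[k] = {}
    d[k]['geno'] = geno
    d[k]['iteration'] = iteration

def insert_after(d, k, geno, iteration):
    # new style: build the entry locally, store it once (1 call total)
    entry = {}
    entry['geno'] = geno
    entry['iteration'] = iteration
    d[k] = entry
```

Counting the calls for one insert gives 3 versus 1, matching the "three times instead of only once" reasoning above.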
> So before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.
You are assuming that fetching is the expensive part, which, I guess, you only did in the first place because you saw the comment you deleted?
Maybe add a comment to the TODO instead of removing it, as you seem not to know for sure whether your changes actually addressed it.
No, not really. I plotted a call graph, which guided me to the key function and from there back to the insert function. I only realized afterwards that the comment was about this problem. I don't have the original files anymore, but I just ran a small benchmark with both versions again.
And here is the modified version:

Three points to consider about the plot:
I don't know how much overhead the profiling itself added.
I pruned everything smaller than 2% of overall runtime, so not all vertices and edges are shown, but the numbers themselves are unaffected.
The original version had a total runtime of 1.477 s, the modified version 1.313 s. For correctness we have to multiply all numbers in the graph of the modified run by a factor of 0.89, because all numbers are relative to the total runtime.
In the first version, `__getitem__` was called three times for each insert, each time resulting in a call to the key function. With the modification, the calls to `__getitem__` dropped to zero. There is still one call to the key function through `__setitem__` in both versions (even though the edge is not displayed in the original graph because the connection was below 2%).
The time the insert function takes was reduced to 1/3 of what it used to be ((3.47% / 9.23%) * 0.89).
The total speedup seems to be much smaller than I originally thought (only 10% in this test). I have to confess that I didn't do extensive tests on the whole program, I just ran it again, so maybe there was some background task in the first run, or something else making me believe that the change improved more than it actually did. Still, I would see this comment as addressed.
Does this all make sense, or am I overlooking something? I think you can also just edit my pull request if you have a specific comment in mind that you would like to place there.
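The normalization arithmetic behind the 1/3 figure can be checked in a few lines, using only the numbers quoted in this thread:

```python
original_total = 1.477    # seconds, total runtime of the original version
modified_total = 1.313    # seconds, total runtime of the modified version
scale = modified_total / original_total   # the ~0.89 factor mentioned above

insert_share_before = 0.0923   # insert: 9.23% of the original run
insert_share_after = 0.0347    # insert: 3.47% of the modified run

# Fraction of the original insert cost that remains after the change;
# the percentages are relative to different totals, hence the rescaling.
remaining = insert_share_after * scale / insert_share_before
```

This evaluates to roughly 0.33, i.e. the insert function now costs about a third of what it did before.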
Cool, how do you make these graphs?
> The time the insert function takes was reduced to 1/3 of what it used to be ((3.47% / 9.23%) * 0.89).
OK, true. On the other hand: in the first case, key in utils takes a cumulated 12.84%; in the second case, 8.47%. This matches the overall reduction in the number of calls to key. Doesn't that mean that reducing the number of calls to key was only moderately effective overall? I would say so, which means the part of the comment that says key is expensive remains valid?
> Still, I would see this comment as addressed.
I don't know whether it is addressed, in particular because we don't know the setup under which the original comment was made (I suspect with large population size and moderate to large dimension). Also, the comment suggests that (only) half of the time is spent in key, which suggests that the modification should speed up the function by less than a factor of two? In such a case, where I don't quite know, I won't see it as addressed until I have better confirmation.
Don't get me wrong, I think the code modification is solid and justified; there is nothing wrong with that.
I used cProfile with gprof2dot:

```shell
python -m cProfile -o output.prof benchmark.py
python -m gprof2dot -f pstats --root "benchmark:17:run_benchmark" -n 2 -e 2 output.prof > output.dot
```
Let me know if you have any questions.
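For a quick look without gprof2dot, the standard library's `pstats` can print the hot spots from the same kind of profile. The `work` function here is a hypothetical stand-in for the profiled benchmark, not code from this repository:

```python
import cProfile
import io
import pstats

def work():
    # hypothetical stand-in for the profiled benchmark: build a dict of
    # entries and aggregate over them
    d = {}
    for i in range(1000):
        d[str(i)] = {'iteration': i}
    return sum(v['iteration'] for v in d.values())

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# print the five most expensive calls, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

The same `Stats` object can also load the `output.prof` file produced by the `cProfile -o` command above.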
Everything you write sounds correct to me. I'm just not sure what we could really write instead. I think that 3.5% of overall runtime for the insert function is probably just fine. Leaving the comment as it is might send someone else searching for something that they just can't find. I don't really know what I can do, on top of what I did so far, to find out whether it is resolved, as I can't reproduce the original numbers (of course, I'm more a user of this library than a developer :D). But the question is a bit whether someone else in the future will be able to do that. If not, the comment might stay forever.
As I said, if you have anything in mind for that comment, I'm happy to edit it that way.
The key function is potentially something to work on, but I didn't see any easy fix that doesn't cause bigger structural changes.
Just leave the original comment and add a note below it that the insert function was changed to (try to) address this by removing redundant dictionary accesses, with the date when this was done (like Jan 2026).
Thanks for the pull request! Can you check the above comments?
Could you squash your commits into a single one? We don't need to see the fixes as separate commits!
if (iteration % 10) < 1:
    self.truncate(300, iteration - 3)
elif value is not None:
    iteration_tmp = value.get('iteration')
I was under the impression that dictionary access is generally very quick in Python; am I wrong?
You are correct. From a performance perspective the benefit should be negligible as long as the hash function is cheap (I think hash values for strings are cached).
I usually see this as a safe way of making sure that the map can't change in the meantime and that we don't spend unnecessary computation time if the hash function is expensive.
But it doesn't really fit into the scheme of what the commit is about, so should I remove this change?
Yes, thanks, I think it's better to keep the original version, mainly for readability and also for consistency.
Ok, I changed this back to the original version.
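For reference, a minimal sketch of the two variants discussed here (function names are hypothetical). Both return the same result; the cached version just avoids the second dictionary access:

```python
def newest_iteration_cached(value, default=None):
    # fetch once and reuse the local variable
    iteration_tmp = value.get('iteration')
    if iteration_tmp is not None:
        return iteration_tmp
    return default

def newest_iteration_direct(value, default=None):
    # fetch from the dictionary on each use; with cheap (cached) string
    # hashes this costs about the same in CPython
    if value.get('iteration') is not None:
        return value.get('iteration')
    return default
```

As discussed above, the difference is a matter of style and of robustness against concurrent mutation rather than measurable performance.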
iteration = value['iteration']
except:
    pass
iteration = value.get('iteration', iteration)
What is wrong with the original code here?
Throwing an exception is quite expensive, and there is no need for it here. Even if this is not a problem right now, it can unexpectedly become one if we run into this path more often. Also, we don't check which exception was thrown, so it can be anything. If the user presses Ctrl+C, or any other interrupt comes in at this line, the program would silently ignore it and in addition start to show undefined/unexpected behavior from this point on, because the iteration variable was not updated even though it should have been.
Finally, I think the second version is much more expressive, as it basically says: take 'iteration' from value, with the old iteration variable as default.
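The silent-swallowing failure mode can be demonstrated with a small sketch. `ExplodingDict` is a contrived stand-in that fails for a reason unrelated to a missing key:

```python
class ExplodingDict(dict):
    """Contrived: every lookup fails for a reason unrelated to a missing key."""

    def __getitem__(self, key):
        raise RuntimeError("unexpected failure during lookup")

iteration = 1
value = ExplodingDict()

# old pattern: the bare except silently swallows the RuntimeError,
# and iteration keeps its stale value
try:
    iteration = value['iteration']
except:
    pass

# new pattern: only a genuinely missing key falls back to the old value;
# any other error would propagate to the caller
iteration_present = {'iteration': 5}.get('iteration', iteration)
iteration_missing = {}.get('iteration', iteration)
```

The old pattern runs to completion despite the `RuntimeError`; with `dict.get`, a missing key is the only condition that is handled implicitly.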
True, the exception should be caught more tightly and not in this lazy manner. Otherwise I am not sure about your arguments: "Easier to Ask Forgiveness than Permission (EAFP) is the recommended Python practice over checking conditions in advance (LBYL, Look Before You Leap)." Doesn't really matter here.
Hmm, Look Before You Leap would be something like

```python
if 'iteration' in value:
    iteration = value['iteration']
```

and yes, I agree, there are good reasons to avoid this type of code (for example because concurrency can create unexpected behavior). But the get function is exactly there so that we don't have to surround every fetch from a dictionary with a try-except block. I also agree with preferring EAFP over LBYL, but this isn't really a choice between those two idioms, and preferring EAFP over LBYL doesn't mean that we should use EAFP for its own sake.
I think the main reason and use case for the get function is the ability to write val = d.get(k, default), not val = d.get(k, val). The latter has an execution path which assigns val = val, which just doesn't smell right. I guess the problem here is that there is no perfect coding solution for this, even though it's such a basic scenario.
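The three idioms discussed in this thread, side by side as small helper functions (names are illustrative). All three return the same result; the difference is style and what happens on unrelated errors:

```python
def via_eafp(value, fallback):
    # EAFP: try, and catch only the expected exception
    try:
        return value['iteration']
    except KeyError:
        return fallback

def via_lbyl(value, fallback):
    # LBYL: check for the key before accessing it
    if 'iteration' in value:
        return value['iteration']
    return fallback

def via_get(value, fallback):
    # dict.get: the built-in shorthand for exactly this pattern
    return value.get('iteration', fallback)
```

Unlike the bare `except:` in the original code, `via_eafp` catches only `KeyError`, so unrelated errors still propagate.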
else:
    if key in self.data_with_same_key:
        self.data_with_same_key[key] += [self.data[key]]
        self.data_with_same_key[key].append(self.data[key])
Isn't the only difference that we do not create a new list? I suspect writing [a] is incredibly cheap? You would need to change a lot of code in this package if you insisted on this being crucial.
Ok, changed this one back, too.
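For completeness, the two spellings are behaviorally equivalent for a single element; both mutate the list in place, `+= [a]` just builds a temporary one-element list first:

```python
xs = [1, 2]
ys = [1, 2]
xs_alias = xs        # alias to confirm the mutation happens in place

xs += [3]            # builds a temporary one-element list, then extends xs in place
ys.append(3)         # appends directly, no temporary list
```

So the choice between them is purely stylistic here, which supports keeping the package's existing convention.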
Thanks for the PR and the discussion!
Thank you, too!

Inserting elements into _CMASolutionDict_functional caused read amplification because the same entry was fetched multiple times from the dictionary. Since the custom fetch function typically requires hashing the key, this could result in significant overhead.