Improve performance by reducing redundant dictionary fetches. #336
nikohansen merged 1 commit into CMA-ES:development from
Conversation
if iteration is not None:
    self[key]['iteration'] = iteration
entry['geno'] = geno
entry['iteration'] = iteration
This doesn't look correct to me, as it doesn't check for None first?
Shortly before, we already perform this check and define it as last_iteration + 0.5 if necessary, so to my understanding it is impossible for it to be None at this point. Maybe the right way to go would be to add an assert plus a comment to point this out?
But removing the if statement was really just a cleanup: I thought it makes things clearer if we know that we will always have the iteration variable; it doesn't have any significant performance benefit.
```python
if iteration is not None:
    if iteration > self.last_iteration:
        self.last_solution_index = 0
    self.last_iteration = iteration
else:
    iteration = self.last_iteration + 0.5  # a hack to get a somewhat reasonable value
```
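The assert-plus-comment variant proposed above could look roughly like this. This is only a sketch under assumed names; `SolutionDictSketch` and `_normalize_iteration` are hypothetical, not pycma's actual API:

```python
class SolutionDictSketch(dict):
    """Minimal sketch of the iteration bookkeeping discussed above."""

    def __init__(self):
        super().__init__()
        self.last_iteration = 0
        self.last_solution_index = 0

    def _normalize_iteration(self, iteration):
        # derive a usable iteration number, as in the snippet above
        if iteration is not None:
            if iteration > self.last_iteration:
                self.last_solution_index = 0
            self.last_iteration = iteration
        else:
            iteration = self.last_iteration + 0.5  # hack for a reasonable value
        # From here on, iteration can no longer be None, so insert may
        # rely on it; the assert documents (and checks) this invariant.
        assert iteration is not None
        return iteration
```

With this in place, the insert code can drop its own `None` check without silently changing behavior if the invariant is ever violated.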
# TODO: insert takes 30% of the overall CPU time, mostly in def key()
# with about 15% of the overall CPU time
def insert(self, key, geno=None, iteration=None, fitness=None,
The comment was removed, but why? How do we know insert is not expensive anymore?
So before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.
I think for my use case the two changes I made reduced the running time of my whole program by around 30%. Before the change, the insert function took a big part of the overall runtime, which is not the case anymore, and it doesn't seem to me that improving it further is worth the effort.
I tried to make a specific test, but the numbers in the end seemed to vary with the type of key etc., and I'm not sure how to reproduce the exact numbers from the comment.
Sorry that I can't give a better answer. If there is any test I should run, I'm happy to do so and write the numbers here.
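The redundant-fetch argument can be illustrated with a toy dictionary whose key function counts its own invocations. All names here are hypothetical stand-ins, not the actual `_CMASolutionDict_functional` code:

```python
class CountingDict(dict):
    """Toy stand-in: a dict routing all access through a countable key function."""
    key_calls = 0

    def key(self, x):
        # stands in for the potentially expensive custom key function
        CountingDict.key_calls += 1
        return x

    def __getitem__(self, x):
        return dict.__getitem__(self, self.key(x))

    def __setitem__(self, x, value):
        dict.__setitem__(self, self.key(x), value)

def insert_before(d, k, geno, iteration):
    # old style: every access runs the key function again (3 calls total)
    d[k] = {}
    d[k]['geno'] = geno
    d[k]['iteration'] = iteration

def insert_after(d, k, geno, iteration):
    # new style: build the entry locally, store it once (1 call total)
    entry = {}
    entry['geno'] = geno
    entry['iteration'] = iteration
    d[k] = entry
```

Counting the calls for one insert gives 3 versus 1, matching the "three times instead of only once" reasoning above.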
> So before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.
You are assuming that fetching is the expensive part, which, I guess, you only did in the first place because you saw the comment you deleted?
Maybe add a comment to the TODO instead of removing it, as you seem not to know for sure whether your changes actually addressed it.
No, not really. I plotted a call graph, which guided me to the key function and from there back to the insert function. I only realized afterwards that the comment was about this problem. I don't have the original files anymore, but I just ran a small benchmark with both versions again.
And here is the modified version:

Three points to consider about the plot:
I don't know how much overhead the profiling itself added.
I pruned everything smaller than 2% of overall runtime, so not all vertices and edges are shown, but the numbers themselves are unaffected.
The original version had a total runtime of 1.477 s, the modified version 1.313 s. For correctness we have to multiply all numbers in the graph of the modified run by a factor of 0.89, because all numbers are relative to the total runtime.
In the first version, `__getitem__` was called three times for each insert, each time resulting in a call to the key function. With the modification, the calls to `__getitem__` dropped to zero. There is still one call to the key function through `__setitem__` in both versions (even though the edge is not displayed in the original graph because the connection was below 2%).
The time the insert function takes was reduced to 1/3 of what it used to be ((3.47% / 9.23%) * 0.89).
The total speedup seems to be much smaller than I originally thought (only 10% in this test). I have to confess that I didn't do extensive tests on the whole program, I just ran it again, so maybe there was some background task in the first run, or something else making me believe that the change improved more than it actually did. Still, I would see this comment as addressed.
Does this all make sense, or am I overlooking something? I think you can also just edit my pull request if you have a specific comment in mind that you would like to place there.
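The normalization arithmetic behind the 1/3 figure can be checked in a few lines, using only the numbers quoted in this thread:

```python
original_total = 1.477    # seconds, total runtime of the original version
modified_total = 1.313    # seconds, total runtime of the modified version
scale = modified_total / original_total   # the ~0.89 factor mentioned above

insert_share_before = 0.0923   # insert: 9.23% of the original run
insert_share_after = 0.0347    # insert: 3.47% of the modified run

# Fraction of the original insert cost that remains after the change;
# the percentages are relative to different totals, hence the rescaling.
remaining = insert_share_after * scale / insert_share_before
```

This evaluates to roughly 0.33, i.e. the insert function now costs about a third of what it did before.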
Cool, how do you make these graphs?
> The time the insert function takes was reduced to 1/3 of what it used to be ((3.47% / 9.23%) * 0.89).
OK, true. On the other hand: in the first case, key in utils takes a cumulated 12.84%; in the second case, 8.47%. This matches the overall reduction in the number of calls to key. Doesn't that mean that reducing the number of calls to key was only moderately effective overall? I would say so, which means the part of the comment that says key is expensive remains valid?
> Still, I would see this comment as addressed.
I don't know whether it is addressed, in particular because we don't know the setup under which the original comment was made (I suspect with large population size and moderate to large dimension). Also, the comment suggests that (only) half of the time is spent in key, which suggests that the modification should speed up the function by less than a factor of two? In such a case, where I don't quite know, I won't see it as addressed until I have better confirmation.
Don't get me wrong, I think the code modification is solid and justified; there is nothing wrong with that.
I used cProfile with gprof2dot:

```shell
python -m cProfile -o output.prof benchmark.py
python -m gprof2dot -f pstats --root "benchmark:17:run_benchmark" -n 2 -e 2 output.prof > output.dot
```
Let me know if you have any questions.
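For a quick look without gprof2dot, the standard library's `pstats` can print the hot spots from the same kind of profile. The `work` function here is a hypothetical stand-in for the profiled benchmark, not code from this repository:

```python
import cProfile
import io
import pstats

def work():
    # hypothetical stand-in for the profiled benchmark: build a dict of
    # entries and aggregate over them
    d = {}
    for i in range(1000):
        d[str(i)] = {'iteration': i}
    return sum(v['iteration'] for v in d.values())

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# print the five most expensive calls, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

The same `Stats` object can also load the `output.prof` file produced by the `cProfile -o` command above.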
Everything you write sounds correct to me. I'm just not sure what we could really write instead. I think that 3.5% of overall runtime for the insert function is probably just fine. Leaving the comment as it is might send someone else searching for something that they just can't find. I don't really know what I can do, on top of what I did so far, to find out whether it is resolved, as I can't reproduce the original numbers (of course, I'm more a user of this library than a developer :D). But the question is a bit whether someone else in the future will be able to do that. If not, the comment might stay forever.
As I said, if you have anything in mind for that comment, I'm happy to edit it that way.
The key function is potentially something to work on, but I didn't see any easy fix that doesn't cause bigger structural changes.
Just leave the original comment and add a note below it that the insert function was changed to (try to) address this by removing redundant dictionary accesses, with the date when this was done (like Jan 2026).
Thanks for the pull request! Can you check the above comments?
Could you squash your commits into a single one? We don't need to see the fixes as separate commits!
if (iteration % 10) < 1:
    self.truncate(300, iteration - 3)
elif value is not None:
    iteration_tmp = value.get('iteration')
I was under the impression that dictionary access is generally very quick in Python; am I wrong?
You are correct. From a performance perspective the benefit should be negligible as long as the hash function is cheap (I think hash values for strings are cached).
I usually see this as a safe way of making sure that the map can't change in the meantime and that we don't spend unnecessary computation time if the hash function is expensive.
But it doesn't really fit into the scheme of what the commit is about, so should I remove this change?
Yes, thanks, I think it's better to keep the original version, mainly for readability and also for consistency.
Ok, I changed this back to the original version.
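For reference, a minimal sketch of the two variants discussed here (function names are hypothetical). Both return the same result; the cached version just avoids the second dictionary access:

```python
def newest_iteration_cached(value, default=None):
    # fetch once and reuse the local variable
    iteration_tmp = value.get('iteration')
    if iteration_tmp is not None:
        return iteration_tmp
    return default

def newest_iteration_direct(value, default=None):
    # fetch from the dictionary on each use; with cheap (cached) string
    # hashes this costs about the same in CPython
    if value.get('iteration') is not None:
        return value.get('iteration')
    return default
```

As discussed above, the difference is a matter of style and of robustness against concurrent mutation rather than measurable performance.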
iteration = value['iteration']
except:
    pass
iteration = value.get('iteration', iteration)
What is wrong with the original code here?
Throwing an exception is quite expensive, and there is no need for it here. Even if this is not a problem right now, it can unexpectedly become one if we run into this path more often. Also, we don't check which exception was thrown, so it can be anything. If the user presses Ctrl+C, or any other interrupt comes in at this line, the program would silently ignore it and in addition start to show undefined/unexpected behavior from this point on, because the iteration variable was not updated even though it should have been.
Finally, I think the second version is much more expressive, as it basically says: take 'iteration' from value, with the old iteration variable as default.
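The silent-swallowing failure mode can be demonstrated with a small sketch. `ExplodingDict` is a contrived stand-in that fails for a reason unrelated to a missing key:

```python
class ExplodingDict(dict):
    """Contrived: every lookup fails for a reason unrelated to a missing key."""

    def __getitem__(self, key):
        raise RuntimeError("unexpected failure during lookup")

iteration = 1
value = ExplodingDict()

# old pattern: the bare except silently swallows the RuntimeError,
# and iteration keeps its stale value
try:
    iteration = value['iteration']
except:
    pass

# new pattern: only a genuinely missing key falls back to the old value;
# any other error would propagate to the caller
iteration_present = {'iteration': 5}.get('iteration', iteration)
iteration_missing = {}.get('iteration', iteration)
```

The old pattern runs to completion despite the `RuntimeError`; with `dict.get`, a missing key is the only condition that is handled implicitly.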
True, the exception should be caught more tightly and not in this lazy manner. Otherwise I am not sure about your arguments: "Easier to Ask Forgiveness than Permission (EAFP) is the recommended Python practice over checking conditions in advance (LBYL, Look Before You Leap)." Doesn't really matter here.
Hmm, Look Before You Leap would be something like

```python
if 'iteration' in value:
    iteration = value['iteration']
```

and yes, I agree, there are good reasons to avoid this type of code (for example because concurrency can create unexpected behavior). But the get function is exactly there so that we don't have to surround every fetch from a dictionary with a try-except block. I also agree with preferring EAFP over LBYL, but this isn't really a choice between those two idioms, and preferring EAFP over LBYL doesn't mean that we should use EAFP for its own sake.
I think the main reason and use case for the get function is the ability to write val = d.get(k, default), not val = d.get(k, val). The latter has an execution path which assigns val = val, which just doesn't smell right. I guess the problem here is that there is no perfect coding solution for this, even though it's such a basic scenario.
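The three idioms discussed in this thread, side by side as small helper functions (names are illustrative). All three return the same result; the difference is style and what happens on unrelated errors:

```python
def via_eafp(value, fallback):
    # EAFP: try, and catch only the expected exception
    try:
        return value['iteration']
    except KeyError:
        return fallback

def via_lbyl(value, fallback):
    # LBYL: check for the key before accessing it
    if 'iteration' in value:
        return value['iteration']
    return fallback

def via_get(value, fallback):
    # dict.get: the built-in shorthand for exactly this pattern
    return value.get('iteration', fallback)
```

Unlike the bare `except:` in the original code, `via_eafp` catches only `KeyError`, so unrelated errors still propagate.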
else:
    if key in self.data_with_same_key:
        self.data_with_same_key[key] += [self.data[key]]
        self.data_with_same_key[key].append(self.data[key])
Isn't the only difference that we do not create a new list? I suspect writing [a] is incredibly cheap? You would need to change a lot of code in this package if you insisted on this being crucial.
Ok, changed this one back, too.
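For completeness, the two spellings are behaviorally equivalent for a single element; both mutate the list in place, `+= [a]` just builds a temporary one-element list first:

```python
xs = [1, 2]
ys = [1, 2]
xs_alias = xs        # alias to confirm the mutation happens in place

xs += [3]            # builds a temporary one-element list, then extends xs in place
ys.append(3)         # appends directly, no temporary list
```

So the choice between them is purely stylistic here, which supports keeping the package's existing convention.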
Thanks for the PR and the discussion!
Thank you, too!

Inserting elements into _CMASolutionDict_functional caused read amplification because the same entry was fetched multiple times from the dictionary. Since the custom fetch function typically requires hashing the key, this could result in significant overhead.