I was curious about the performance benefit of Python's sys.intern for dictionary lookups. As I understand it, sys.intern returns a canonical object for each distinct string value, so equal interned strings are the same object and can be compared by identity rather than character by character; a quick check of that assumption:
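```
import sys

# Build the strings at runtime so compile-time constant folding and the
# automatic interning of identifier-like literals don't get in the way.
s1 = sys.intern("".join(["hello", " ", "world"]))
s2 = sys.intern("".join(["hello", " ", "world"]))
print(s1 is s2)  # True: both names refer to the same canonical object
```
As a toy example to measure whether this translates into faster dict lookups, I implemented the following code snippets: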
Without interning:
```
import random
from uuid import uuid4

keys = [str(uuid4()) for _ in range(1_000_000)]
values = [random.random() for _ in range(1_000_000)]
my_dict = dict(zip(keys, values))
keys_sample = random.choices(keys, k=50_000)

def get_values(d, ks):
    return [d[k] for k in ks]
```
Test results using ipython:
```
%timeit get_values(my_dict, keys_sample)
8.92 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
With interning:
```
import sys
import random
from uuid import uuid4

keys = [sys.intern(str(uuid4())) for _ in range(1_000_000)]
values = [random.random() for _ in range(1_000_000)]
my_dict = dict(zip(keys, values))
keys_sample = random.choices(keys, k=50_000)

def get_values(d, ks):
    return [d[k] for k in ks]
```
Test results:
```
%timeit get_values(my_dict, keys_sample)
8.83 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
So, no meaningful difference between the two cases. I tried increasing the dict size and the sample size, but the results stayed on par. Am I using sys.intern incorrectly, or is the test itself flawed? One possible flaw I can think of: keys_sample contains the very same string objects that are stored in the dict, so the lookup can presumably short-circuit on object identity in both versions and never reach the character-by-character comparison that interning is supposed to avoid. A variant that might expose the difference (untested sketch; the encode/decode round-trip is just a way to force fresh, equal string objects):
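```
# Fresh string objects with equal values: identity with the dict's keys no
# longer holds, so a non-interned probe may fall back to a full comparison.
fresh_sample = [k.encode().decode() for k in keys_sample]

# Re-interning the copies should map them back to the canonical objects,
# restoring the identity fast path (assuming the dict keys were interned).
interned_sample = [sys.intern(s) for s in fresh_sample]

%timeit get_values(my_dict, fresh_sample)
%timeit get_values(my_dict, interned_sample)
```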