I am building a data transformation system made up of transformation building blocks, each of which applies a change to a piece of XML that, in this phase, has been deserialised into a regular Python object. Given the huge number of objects that get processed, we have to take both memory and CPU effects into account. Given the cost of serialisation and deserialisation, it is paramount to perform all operations on an object while it is in its deserialised form.
Most of the transformations are standard: they apply to all datasets. For some datasets, some objects require custom attributes to be set. We would like to offer this to the user as user-defined functions, which are applied in the inner loop of processing. The user should be able to execute partial(f, ...) callables on the data that flows into these building blocks. It would be nice if the dynamic functions could, in theory, be executed in parallel, but I have given up on this for now. For extensions I have also considered subclassing, but I feel that does not really help once this is brought into the "5GL domain", where the application is presented to the user as visual building blocks.
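As a minimal sketch of what such partial-based user-defined functions could look like (the names `Record`, `set_attribute` and `run` are hypothetical and only serve as illustration, they are not part of the real system):

```python
from functools import partial


class Record:
    """Hypothetical stand-in for a deserialised XML object."""


def set_attribute(name, value, obj):
    """Hypothetical user-defined function: set one custom attribute."""
    setattr(obj, name, value)


# Each building block is configured once, outside the inner loop.
# partial() binds the configuration and leaves the object as the only argument.
steps = [
    partial(set_attribute, "region", "EU"),
    partial(set_attribute, "priority", 1),
]


def run(objects, steps):
    """Apply every configured step to every object (the inner loop)."""
    for obj in objects:
        for step in steps:
            step(obj)
    return objects


records = run([Record() for _ in range(3)], steps)
```

Each step is a plain callable taking one object, so the inner loop does not need to know how a step was configured.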
To compare the different approaches, I wrote a benchmark with four variants of the inner loop.
```python
import timeit


class X:
    a: int
    b: int
    c: int


def a(x: X):
    x.a = 1


def b(x: X):
    x.b = 1


def c(x: X):
    x.c = 1


def abc(x: X):
    x.a = 1
    x.b = 1
    x.c = 1


def forloop1():
    L = [X() for y in range(0, 1000000)]
    for l in L:
        a(l)
        b(l)
        c(l)


def forloop2():
    L = [X() for y in range(0, 1000000)]
    for l in L:
        abc(l)


def forloop3():
    L = [X() for y in range(0, 1000000)]
    D = [a, b, c]
    for l in L:
        for d in D:
            d(l)


def forloop4():
    L = [X() for y in range(0, 1000000)]
    D = [a, b, c]
    for l in L:
        any(map(lambda x: x(l), D))


print(timeit.timeit(lambda: forloop1(), number=100))
print(timeit.timeit(lambda: forloop2(), number=100))
print(timeit.timeit(lambda: forloop3(), number=100))
print(timeit.timeit(lambda: forloop4(), number=100))
```

The output of the above on an i7-8700 with Python 3.12.3 (multiple runs):
```
35.17415004700001
30.880014716999995
41.62957313499999
66.74359317000005
```

The output of the above on an i7-8700 with Python 3.10.12 (multiple runs):
```
31.624255173000165
23.567885784000055
32.262340324000206
54.54437262800002
```

I am puzzled that Python 3.12.3 is slower than 3.10.12 in all four variants. I am also surprised that forloop3 outperforms the any(map(lambda ...)) variant.
Considering the above, forloop2 calls a single function per object and is the fastest variant on both Python versions. Is there an elegant way (for example with ast) to inline all the code from a, b and c, so that it dynamically forms abc? Are there better alternatives?
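To make the idea concrete, here is a rough sketch of the kind of ast-based fusing I have in mind. It rests on several assumptions: the user-defined functions are available as source text (for functions defined in a file, `inspect.getsource` could supply it), each takes exactly one argument, and none of them uses `return`. The names `SOURCES`, `RenameArg` and `fuse` are hypothetical. It does not handle clashing local variable names or functions that rely on globals.

```python
import ast

# Hypothetical user-defined functions, supplied as source text.
SOURCES = [
    "def a(x):\n    x.a = 1\n",
    "def b(x):\n    x.b = 1\n",
    "def c(obj):\n    obj.c = 1\n",  # different parameter name on purpose
]


class RenameArg(ast.NodeTransformer):
    """Rename every use of one name to another inside a statement."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def fuse(sources, name="fused", arg="x"):
    """Concatenate the bodies of one-argument functions into one function."""
    body = []
    for src in sources:
        func = ast.parse(src).body[0]
        if not isinstance(func, ast.FunctionDef) or len(func.args.args) != 1:
            raise ValueError("expected a one-argument function definition")
        for stmt in func.body:
            # A return would end the fused function early, so reject it.
            if any(isinstance(n, ast.Return) for n in ast.walk(stmt)):
                raise ValueError("functions using return are not supported")
        renamer = RenameArg(func.args.args[0].arg, arg)
        body.extend(renamer.visit(stmt) for stmt in func.body)

    # Start from a parsed template so the FunctionDef node is built by the
    # parser itself, then swap in the collected statements.
    module = ast.parse(f"def {name}({arg}):\n    pass\n")
    module.body[0].body = body
    ast.fix_missing_locations(module)

    namespace = {}  # assumption: the fused code needs no globals or imports
    exec(compile(module, filename="<fused>", mode="exec"), namespace)
    return namespace[name]


class X:
    pass


fused = fuse(SOURCES)
obj = X()
fused(obj)  # sets obj.a, obj.b and obj.c with a single call
```

The fused function would be built once per pipeline configuration, so the parsing and compilation cost stays outside the inner loop. Whether this actually matches the speed of a hand-written abc would still have to be measured.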