I am working on a project and am using it as a chance to learn pandas DataFrames, which are new to me.
I have an interesting problem that you people may be able to help me with, as the DataFrame seems fairly powerful and has some neat capabilities.
I have 2 DataFrames. The first is built from a list of dicts of values. The second has no data, but its column labels are a list of integers.
    import pandas as pd

    data1 = [{'Start': 51, 'End': 55}, {'Start': 24, 'End': 37}, {'Start': 89, 'End': 122},
             {'Start': 44, 'End': 31}, {'Start': 77, 'End': 50}, {'Start': 10, 'End': 9}]
    dfm1 = pd.DataFrame.from_dict(data1)
    data2 = [-40, -30, -20, -10, 0, 10, 20, 30, 40]
    dfm2 = pd.DataFrame([], columns=data2)

Let's assume that data1 has 500 data points. You get the idea...
My goal is that I want a tally of ranges of the differences in dfm1 based upon a variable sliding window size and I want that tally to exist in dfm2.
So the interesting thing I want to do is to cleverly create a sliding window calculation of the difference data1[index] - data1[index + window]. Then, based upon that difference between the values at the 2 indexes, I want to add a tally to a dfm2 column if the difference is less than or equal to that column's value, but greater than the previous column's value. So, in my example, I would assume that column -40 never has a tally greater than 0.
My desired output for dfm2, using the dfm1 values I provided, tallying Start values, and a window of size 2, would be:
    [0, 1, 1, 0, 0, 0, 1, 0, 1]

This would be performing 51-89 = -38, 24-44 = -20, 89-77 = 12, 44-10 = 34 for a window size of 2. A window size of 3 would be 51-44, 24-77, and 89-10...
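To pin down the expected result, here is a minimal sketch of the tally described above. It is one possible approach, not necessarily the best one: it uses `Series.shift` rather than `rolling`, and `pd.cut` for the right-closed ranges. The helper name `tally_differences` is hypothetical. Two assumptions are made that the description does not settle: any difference at or below -40 is counted under -40, and any difference above 40 is dropped.

```python
import numpy as np
import pandas as pd

data1 = [{'Start': 51, 'End': 55}, {'Start': 24, 'End': 37}, {'Start': 89, 'End': 122},
         {'Start': 44, 'End': 31}, {'Start': 77, 'End': 50}, {'Start': 10, 'End': 9}]
dfm1 = pd.DataFrame(data1)
edges = [-40, -30, -20, -10, 0, 10, 20, 30, 40]


def tally_differences(series, window, edges):
    """Hypothetical helper: count series[i] - series[i + window] per range."""
    # Rows without a partner `window` positions later become NaN and are dropped.
    diffs = (series - series.shift(-window)).dropna()
    # Right-closed bins: d is counted under edge e when previous_edge < d <= e.
    # Assumption: -inf is prepended so the first column catches everything <= -40.
    bins = [-np.inf] + list(edges)
    # labels=False returns the bin position; values above the last edge become NaN.
    codes = pd.cut(diffs, bins=bins, labels=False).dropna().astype(int)
    return np.bincount(codes.to_numpy(), minlength=len(edges))


counts = tally_differences(dfm1['Start'], window=2, edges=edges)
dfm2 = pd.DataFrame([counts], columns=edges)
print(dfm2.iloc[0].tolist())  # [0, 1, 1, 0, 0, 0, 1, 0, 1]
```

With `window=3` the same helper sees 7, -53 and 79, so under the assumptions above it counts 7 under column 10, -53 under column -40, and drops 79.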
The cheap and easy way is obviously for me to iterate and create tallies. But I know that DataFrame has some cool and sexy functions like rolling which may work really well for this.
As a final boss mode question, what if I wanted to do this same rolling tally, but rather than subtracting Start from Start, what if I wanted to subtract an index's Start from its same End, and then perform that rolling tally based upon the difference from window_size away?
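The wording here can be read more than one way. The sketch below assumes one reading: first compute each row's own End minus Start (called `span` here, a name chosen only for illustration), then tally span[index] - span[index + window] into the same right-closed ranges. As before, the assumption is that differences at or below -40 land in the -40 column and differences above 40 are dropped.

```python
import numpy as np
import pandas as pd

data1 = [{'Start': 51, 'End': 55}, {'Start': 24, 'End': 37}, {'Start': 89, 'End': 122},
         {'Start': 44, 'End': 31}, {'Start': 77, 'End': 50}, {'Start': 10, 'End': 9}]
dfm1 = pd.DataFrame(data1)
edges = [-40, -30, -20, -10, 0, 10, 20, 30, 40]
window = 2

# Per-row span: each row's End minus its own Start.
span = dfm1['End'] - dfm1['Start']

# Difference between a row's span and the span `window` rows later.
diffs = (span - span.shift(-window)).dropna()

# Right-closed ranges; -inf is an assumed lower bound for the first column.
bins = [-np.inf] + edges
codes = pd.cut(diffs, bins=bins, labels=False).dropna().astype(int)
tally = np.bincount(codes.to_numpy(), minlength=len(edges))

dfm2 = pd.DataFrame([tally], columns=edges)
print(dfm2)
```

If the intended reading is instead End[index] - Start[index + window], only the `diffs` line would change, to `dfm1['End'] - dfm1['Start'].shift(-window)`.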
An Easter egg boss mode question would be: what if I didn't preset the column names in dfm2, and I let them be auto added as new tallies are discovered? Say in ranges of 10?
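For the auto-discovered columns, one possible sketch is to round each difference up to the next multiple of the range size and count the results, so that a column only appears once a difference falls into it. The names `step` and `upper_edges` are illustrative, and rounding up is an assumption chosen to stay consistent with the "less than or equal to the column value" rule above.

```python
import numpy as np
import pandas as pd

starts = pd.Series([51, 24, 89, 44, 77, 10], name='Start')
window = 2
step = 10  # width of each range

# Difference between each value and the value `window` rows later.
diffs = (starts - starts.shift(-window)).dropna()

# Round up to the next multiple of `step`: d maps to the smallest edge >= d,
# which matches right-closed ranges (edge - step, edge].
upper_edges = (np.ceil(diffs / step) * step).astype(int)

# Only ranges that actually occur become columns, in ascending order.
counts = upper_edges.value_counts().sort_index()
dfm2 = pd.DataFrame([counts.to_numpy()], columns=counts.index.tolist())
print(dfm2)
```

For the sample data this yields one tally each under -30, -20, 20 and 40, and no other columns.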