Polars lazyframe not returning specified schema order after collecting

I have a function which runs in a loop performing calculations on a list of arrays.

At a point during the first iteration of the function, a polars lazyframe is initialised.

On the following iterations, a new dataframe is created with the same schema, the two dataframes are stacked row-wise using DataFrame.vstack, and the result is converted back to a lazyframe.

    import numpy as np
    import polars as pl

    def my_func():
        array_list = [np.zeros((1,19))]*2  # this is just for example and not representative of shape of real array.
        for i, _ in enumerate(array_list):
            # calculations are done here
            result = np.zeros((1,19))  # result of calculations (correct shape of real result)
            if i < 1:
                result_df = pl.DataFrame(data=result,
                                         schema={'MDL','MVL','MWVL','RR','DET','ADL','LDL','DIV','EDL','LAM','TT','LVL','EVL','AWVL','LWVL','LWVLI','EWVL','Ratio_DRR','Ratio_LD'},
                                         orient='row').lazy()
            else:
                new_df = pl.DataFrame(data=result,
                                      schema={'MDL','MVL','MWVL','RR','DET','ADL','LDL','DIV','EDL','LAM','TT','LVL','EVL','AWVL','LWVL','LWVLI','EWVL','Ratio_DRR','Ratio_LD'},
                                      orient='row')
                # append new dataframe to results
                result_df = result_df.collect().vstack(new_df, in_place=True).lazy()
        return result_df

When returning the dataframe outside the function, the column names are no longer in order, but the data is.

e.g.

    result.schema
    OrderedDict([('LAM', Float64),
                 ('LDL', Float64),
                 ('ADL', Float64),
                 ('DIV', Float64),
                 ('MDL', Float64),
                 ('MWVL', Float64),
                 ('LWVL', Float64),
                 ('MVL', Float64),
                 ('TT', Float64),
                 ('DET', Float64),
                 ('RR', Float64),
                 ('EDL', Float64),
                 ('Ratio_LD', Float64),
                 ('Ratio_DRR', Float64),
                 ('LVL', Float64),
                 ('LWVLI', Float64),
                 ('EWVL', Float64),
                 ('EVL', Float64),
                 ('AWVL', Float64)])

I assume this is due to my naivety about how lazyframes work, but is there a way to enforce the order without renaming the columns?

Thanks.