I have a dataset partitioned by date, consisting of about a thousand Arrow files with 300 columns each. I read them with pl.read_ipc and concatenated them with rechunk=True, and the rechunk step alone took more than half an hour on my server with 1 TB of RAM. Is there a way to speed this up, or any suggestions for processing or storing data in this scenario? Many thanks in advance for any help.
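For reference, here is roughly what I'm doing now (a minimal sketch; the directory layout `dataset/date=*/*.arrow` is just an illustration of my partitioning, not the real paths):

```python
import glob
import polars as pl

# Hypothetical partitioned layout: one Arrow/IPC file per date.
paths = sorted(glob.glob("dataset/date=*/*.arrow"))

# Read each file eagerly, then concatenate.
# The rechunk=True step is what takes 30+ minutes.
frames = [pl.read_ipc(p) for p in paths]
df = pl.concat(frames, rechunk=True)
```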