I'm trying to read a big Feather file and my process gets killed:
gerardo@hal9000:~/Projects/Morriello$ du -h EtOH.feather
5.3G    EtOH.feather
I'm using pandas and pyarrow; these are the versions:
gerardo@hal9000:~/Projects/Morriello$ pip freeze | grep "pandas\|pyarrow"
pandas==2.2.1
pyarrow==15.0.0
When I try to load the dataset into a DataFrame, the process just gets killed:
In [1]: import pandas as pd

In [2]: df = pd.read_feather("EtOH.feather", dtype_backend='pyarrow')
Killed
I'm on Linux, using Python 3.12, on a machine with 16 GB of RAM.
I saw that the process gets killed due to an out-of-memory (OOM) error:
Out of memory: Killed process 918058 (ipython) total-vm:24890996kB, anon-rss:8455548kB, file-rss:640kB, shmem-rss:0kB, UID:1000 pgtables:17228kB oom_score_adj:100
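One thing I've been considering, though I haven't verified it: since Feather V2 is the Arrow IPC file format, I believe pyarrow can memory-map the file and iterate over it one record batch at a time instead of materializing the whole table. A rough, untested sketch (EtOH.feather is the file above):

import pyarrow as pa
import pyarrow.ipc

# Memory-map the file so the OS pages data in on demand instead of
# allocating all 5.3G up front. As far as I understand, this only
# avoids copies if the file is uncompressed; compressed buffers
# still have to be decompressed into RAM batch by batch.
with pa.memory_map("EtOH.feather", "r") as source:
    reader = pa.ipc.open_file(source)
    print(reader.schema)
    print(reader.num_record_batches)
    for i in range(reader.num_record_batches):
        batch = reader.get_batch(i)  # one record batch at a time
        # ... process/filter/aggregate the batch here ...

Is that the right direction, or is there a simpler pandas-level option?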
How do I read the file in this case? And if I manage to read it, would there be a noticeable advantage in converting it to the Parquet format?
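For the Parquet part of the question, I assume a batch-wise conversion along these lines would avoid ever loading the full table (again untested; EtOH.parquet is just my chosen output name):

import pyarrow as pa
import pyarrow.ipc
import pyarrow.parquet as pq

# Stream record batches from the Feather (Arrow IPC) file straight
# into a Parquet writer, so peak memory stays around one batch.
with pa.memory_map("EtOH.feather", "r") as source:
    reader = pa.ipc.open_file(source)
    with pq.ParquetWriter("EtOH.parquet", reader.schema) as writer:
        for i in range(reader.num_record_batches):
            writer.write_batch(reader.get_batch(i))

My impression is that with Parquet I could then read only the columns I need, e.g. pd.read_parquet("EtOH.parquet", columns=[...]), which might be the noticeable advantage, but I'd appreciate confirmation.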