
Airflow marks my task as a zombie when I execute pd.read_parquet on a big file

I am currently working on a project using Airflow, Docker and Python. I'm trying to load a .parquet file into my local Postgres DB. My function reads a .parquet file from a temp folder, turns it into a DataFrame, and uploads to Postgres the two columns I need for the calculation. However, as soon as I initialize the DataFrame, Airflow immediately tags my task as a zombie task. My code is as follows:

def file_upload(filename, engine):
    try:
        filepath = os.path.join(SOURCE_FILE_PATH, filename)
        df = pd.read_parquet('/opt/airflow/dags/files/yellow_tripdata_2011-02.parquet')
        source_data_df = df[['tpep_pickup_datetime', 'tpep_dropoff_datetime']].copy()
        with engine.connect() as conn:
            source_data_df.to_sql(
                name="yellowcab_data",
                con=conn,
                if_exists="append",
                index=False
            )
        print("file written into db: ", filepath)
        os.remove(filepath)
        return None
    except Exception as e:
        print(e)
        return None

Every time the code reaches df = pd.read_parquet('/opt/airflow/dags/files/yellow_tripdata_2011-02.parquet'), Airflow kills the task and labels it as a zombie. I am running Airflow from the docker-compose.yaml file linked here: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml

Please help! The task dies every time my DAG runs, and I suspect it will also die when I upload the data to Postgres. I'm kind of a newbie with the Airflow config.
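One thing I'm considering trying (not tested yet, so this is only a rough sketch, not my working code) is reading just the two columns I need via the columns= argument of pd.read_parquet and writing in batches via the chunksize argument of to_sql, to keep the worker's memory use down. SOURCE_FILE_PATH and the table name are the same as above; the chunk size of 50,000 is an arbitrary guess.

import os
import pandas as pd

def file_upload(filename, engine):
    filepath = os.path.join(SOURCE_FILE_PATH, filename)
    # Only load the two columns needed instead of the whole file
    df = pd.read_parquet(
        filepath,
        columns=['tpep_pickup_datetime', 'tpep_dropoff_datetime'],
    )
    with engine.connect() as conn:
        # chunksize limits how many rows are buffered per INSERT batch
        df.to_sql(
            name="yellowcab_data",
            con=conn,
            if_exists="append",
            index=False,
            chunksize=50000,
        )
    os.remove(filepath)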

