
Partitioning Parquet with AWS Wrangler and lakeFS


I was partitioning a Parquet dataset on S3 with AWS Wrangler (awswrangler), and it worked as expected:

import asyncio
import awswrangler as wr

basename_template = 'part.'
partitioning = ['cust_id', 'file_name', 'added_year', 'added_month', 'added_date']
loop = asyncio.get_event_loop()
s3_path = "s3://customer-data-lake/main/parquet_data"

# Write the batch as a partitioned Parquet dataset; the blocking call runs in a
# worker thread so it does not block the event loop.
await loop.run_in_executor(None, lambda: wr.s3.to_parquet(
    df=batch.to_pandas(),
    path=s3_path,
    dataset=True,
    max_rows_by_file=MAX_ROWS_PER_FILE,
    use_threads=True,
    partition_cols=partitioning,
    mode='append',
    boto3_session=s3_session,
    filename_prefix=basename_template,
))
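For reference, when partition_cols takes effect, awswrangler writes Hive-style key=value prefixes (cust_id=.../file_name=.../added_year=.../...) under s3_path, and reading the dataset back with dataset=True turns those directories back into columns. A minimal sanity-check sketch, reusing s3_path, partitioning and s3_session from above:

import awswrangler as wr

# Read the partitioned dataset back; with dataset=True the Hive-style
# partition directories are reconstructed as regular DataFrame columns.
check_df = wr.s3.read_parquet(path=s3_path, dataset=True, boto3_session=s3_session)

# One row per distinct partition combination that was written.
print(check_df[partitioning].drop_duplicates())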

Then I tried to switch the same code to lakeFS, so I pointed the S3 endpoint at the lakeFS gateway:

wr.config.s3_endpoint_url = lakefsEndPoint
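For context, lakeFS exposes an S3-compatible gateway, so besides the endpoint the credentials have to be lakeFS keys and the path is read as s3://<repository>/<branch>/<path> (which matches customer-data-lake/main/... above). The full setup looks roughly like this; the endpoint URL and the key pair are placeholders, and batch and partitioning are reused from the first snippet:

import boto3
import awswrangler as wr

# Point awswrangler at the lakeFS S3 gateway (placeholder URL).
wr.config.s3_endpoint_url = "https://lakefs.example.com"  # lakefsEndPoint

# lakeFS access key / secret key, not AWS credentials (placeholders).
lakefs_session = boto3.Session(
    aws_access_key_id="AKIAJLAKEFSEXAMPLE",
    aws_secret_access_key="lakefs-secret-key",
)

# Through the gateway the "bucket" is the lakeFS repository and the first
# path segment is the branch: repo=customer-data-lake, branch=main.
lakefs_path = "s3://customer-data-lake/main/parquet_data"

wr.s3.to_parquet(
    df=batch.to_pandas(),
    path=lakefs_path,
    dataset=True,
    partition_cols=partitioning,
    mode='append',
    boto3_session=lakefs_session,
)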

After that change the partitioning stopped working: everything gets appended into a single partition.

The first screenshot (not reproduced here) showed the original S3 layout.

The second screenshot showed the layout after switching to lakeFS.

There, everything just gets appended under csv_1 instead of new partitions being created. What am I doing wrong here?
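In case it helps with diagnosing, this is a rough sketch of how the two sides can be compared: list the keys that were actually written under each path and count the Hive-style prefixes. The endpoint URL and lakefs_session are the placeholders from the sketch above; note that wr.config.s3_endpoint_url is global, so the plain-S3 listing has to happen before pointing it at lakeFS.

import awswrangler as wr

# Keys written to plain S3; these should show cust_id=... / added_year=... prefixes.
s3_keys = wr.s3.list_objects("s3://customer-data-lake/main/parquet_data", boto3_session=s3_session)

# Switch to the lakeFS gateway and list the same path there.
wr.config.s3_endpoint_url = "https://lakefs.example.com"  # lakefsEndPoint
lakefs_keys = wr.s3.list_objects("s3://customer-data-lake/main/parquet_data", boto3_session=lakefs_session)

print("partition prefixes on S3:    ", sum("cust_id=" in key for key in s3_keys))
print("partition prefixes on lakeFS:", sum("cust_id=" in key for key in lakefs_keys))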

