Channel: Active questions tagged python - Stack Overflow

How to do transformations on a json string in a streaming dataframe?


I've got three Kinesis streams feeding into a Databricks notebook. The notebook reads each stream into its own dataframe without problems, and I can display each dataframe with

display(kinesis_geo_df)

Next, to see the data contained in my stream, I explicitly deserialize the data column of the dataframe by running the following command:

kinesis_geo_df = kinesis_geo_df.selectExpr("CAST(data as STRING)")

Next, I would like to perform some transformations on this JSON string before writing it to its individual Delta tables.

However, I'm unable to figure out how to use the PySpark functions to achieve this. From my understanding, I can use the from_json function by providing it a schema, and this schema can be obtained by collecting the first value in the column and reusing its schema for everything. However, I keep running into streaming errors saying that the data would have to be written first before non-streaming functions can be run on it.

I'm just wondering how I can transform the data prior to writing it, i.e. access the data column, explode it into its individual columns, apply some transformation and then write it to the associated streaming tables?

Thank you

