I've got three Kinesis streams feeding into a Databricks notebook. The notebook reads each stream into a DataFrame fine, and each DataFrame displays correctly with
display(kinesis_geo_df)
Next, to see the data contained in my stream, I explicitly deserialize the data column of the DataFrame by running the following command:
kinesis_geo_df = kinesis_geo_df.selectExpr("CAST(data as STRING)")
Next, I would like to perform some transformations on this JSON string before writing each stream to its own Delta table.
However, I'm unable to figure out how to use the PySpark functions to achieve this. From my understanding, I can use the from_json function by providing it a schema, and that schema could be obtained by collecting the first record of the column and reusing it for everything. But I keep running into streaming errors saying the data would have to be written out first before non-streaming functions can run on it.
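For reference, this is roughly what I tried (variable names match my notebook, the exact calls may be slightly off):

from pyspark.sql.functions import from_json, schema_of_json

# Grab one record to infer the JSON schema from -- this is the step that fails,
# since first()/collect() are batch actions and aren't allowed on a streaming DataFrame
sample = kinesis_geo_df.select("data").first()["data"]
geo_schema = schema_of_json(sample)
parsed_df = kinesis_geo_df.withColumn("json", from_json("data", geo_schema))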
Just wondering how I can transform the data prior to writing it, i.e. access the data column, explode it into its individual columns, apply some transformations, and then write the result to the associated streaming tables.
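To make it concrete, this is roughly the shape I'm after for one of the streams; the schema fields, checkpoint path, and table name below are just made-up placeholders, not my real ones:

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical schema for the geo stream -- the real field names would come from my JSON
geo_schema = StructType([
    StructField("device_id", StringType()),
    StructField("latitude", DoubleType()),
    StructField("longitude", DoubleType()),
])

geo_parsed_df = (
    kinesis_geo_df
    .withColumn("json", from_json(col("data"), geo_schema))
    .select("json.*")  # flatten the parsed struct into individual columns
)

# ... transformations would go here ...

(geo_parsed_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/geo")  # placeholder path
    .toTable("geo_table"))  # placeholder table name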
Thank you