I am trying to convert a dictionary:

    data_dict = {'t1': '1', 't2': '2', 't3': '3'}
into a dataframe:

    key | value
    ----|------
    t1  | 1
    t2  | 2
    t3  | 3
To do that, I tried:
    from pyspark.sql.types import StructType, StructField, StringType

    schema = StructType([
        StructField("key", StringType(), True),
        StructField("value", StringType(), True),
    ])
    ddf = spark.createDataFrame(data_dict, schema)
But I got the error below:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 748, in createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 413, in _createFromLocal
        data = list(data)
      File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 730, in prepare
        verify_func(obj)
      File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/types.py", line 1389, in verify
        verify_value(obj)
      File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/types.py", line 1377, in verify_struct
        % (obj, type(obj))))
    TypeError: StructType can not accept object 't1' in type <class 'str'>
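From the traceback, my guess is that iterating over a dict gives Spark only the keys as bare strings, which a two-field `StructType` cannot accept. A quick plain-Python check of what the iteration actually yields:

```python
# Iterating a dict yields only its keys, not (key, value) pairs,
# so Spark receives bare strings like 't1' instead of row-like tuples.
data_dict = {'t1': '1', 't2': '2', 't3': '3'}
keys = list(data_dict)          # ['t1', 't2', 't3'] -- keys only
pairs = list(data_dict.items()) # [('t1', '1'), ('t2', '2'), ('t3', '3')]
```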
So I tried again without specifying a schema, passing only the column datatype:

    ddf = spark.createDataFrame(data_dict, StringType())

and

    ddf = spark.createDataFrame(data_dict, StringType(), StringType())
But both result in a dataframe with a single column containing only the keys of the dictionary, as below:
    +-----+
    |value|
    +-----+
    |   t1|
    |   t2|
    |   t3|
    +-----+
Could anyone let me know how to convert a dictionary into a Spark dataframe in PySpark?
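One workaround I am experimenting with (not sure it is the idiomatic route) is converting the dict into a list of (key, value) tuples first, since `createDataFrame` accepts a list of tuples plus a schema. The plain-Python part of that conversion:

```python
data_dict = {'t1': '1', 't2': '2', 't3': '3'}
# Turn the dict into row-like tuples: [('t1', '1'), ('t2', '2'), ('t3', '3')]
rows = list(data_dict.items())

# With a SparkSession available, the same two-field schema should then work:
# ddf = spark.createDataFrame(rows, schema)
```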