Channel: Active questions tagged python - Stack Overflow

Since Spark 2.3, the queries from raw JSON/CSV files are disallowed


Hi, I am learning PySpark. Reading my data works when it is in CSV format, but after converting it to JSON I get this error:

Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named _corrupt_record by default). For example:

spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()

and spark.read.schema(schema).json(file).select("_corrupt_record").show(). Instead, you can cache or save the parsed results and then send the same query. For example, val df = spark.read.schema(schema).json(file).cache() and then

df.filter($"_corrupt_record".isNotNull).count().

The sample JSON data is:

[
  {"student_id": 1, "name": "John Doe", "age": 18, "grade": "A"},
  {"student_id": 2, "name": "Jane Smith", "age": 17, "grade": "B"},
  {"student_id": 3, "name": "Bob Johnson", "age": 19, "grade": "C"},
  {"student_id": 4, "name": "Alice Williams", "age": 18, "grade": "A"},
  {"student_id": 5, "name": "Charlie Brown", "age": 17, "grade": "B"},
  {"student_id": 6, "name": "Emma Davis", "age": 19, "grade": "C"},
  {"student_id": 7, "name": "James Miller", "age": 18, "grade": "A"},
  {"student_id": 8, "name": "Sophie Taylor", "age": 17, "grade": "B"},
  {"student_id": 9, "name": "David White", "age": 19, "grade": "C"}
]

The Python code I used is:

mydata = spark.read.json("/original.csv")
mydata.show()
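A likely cause, assuming the file on disk is a JSON array spread over several lines as in the sample: by default spark.read.json expects JSON Lines, meaning one complete JSON object per line. When a document spans multiple lines, each physical line fails to parse, the inferred schema contains only _corrupt_record, and show() triggers the error above. Two possible ways around it are sketched below. The function names are hypothetical and not part of the question. The first rewrites the array as JSON Lines using only the standard library; the second asks Spark to parse whole multi-line documents through the multiLine reader option.

```python
import json


def json_array_to_lines(text):
    """Convert a JSON array of objects into JSON Lines (one object per line).

    Hypothetical helper for illustration: Spark's default JSON reader
    expects this one-record-per-line layout.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without indent emits each record on a single line
    return "\n".join(json.dumps(record) for record in records)


def read_multiline_json(spark, path):
    """Read a JSON file whose records span several lines.

    `spark` is assumed to be an existing SparkSession; the multiLine
    option tells the reader to parse each file as a whole document
    instead of line by line.
    """
    return spark.read.option("multiLine", True).json(path)
```

With an active session, something like read_multiline_json(spark, "/original.csv").show() should then list the student rows. Spark does not use the file extension to pick a format, so the .csv name is not itself the problem, though renaming the file to .json would make it less confusing.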
