Identifying Files with Extensions Using Wildcards

After mounting my data lake to Databricks, I ran into an issue while trying to load all JSON files into a DataFrame using a *.json wildcard. When I remove the file extension from the pattern, the read succeeds.

Not working:

df = spark.read.option("recursiveFileLookup", "true") \
    .json("/mnt/adls_gen/prod/**/*.json")

I get the following error after executing the code above:

[PATH_NOT_FOUND] Path does not exist: dbfs:/mnt/adls_gen/prod/**/*.json.

Working:

df = spark.read.option("recursiveFileLookup", "true") \
    .json("/mnt/adls_gen/prod/**/*")

However, this also reads other files, such as those with extensions like *.json_old and *.txt.

I'm not aware of any other options I could use in this scenario. Is there another way to filter by file extension? The files in my data lake have a variety of extensions, so I'm looking for a solution that accommodates that.

The Spark version is as follows:

Apache Spark 3.4.1, Scala 2.12
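
One approach that may be worth trying is Spark's pathGlobFilter read option (available for file-based sources in Spark 3.x), which filters the discovered files by a glob on the file name while recursiveFileLookup handles the directory recursion. A minimal sketch against the same mount path, offered as a starting point rather than a verified fix:

# Recurse through /mnt/adls_gen/prod and keep only files whose name matches *.json.
# pathGlobFilter applies to file names, so the base path stays a plain directory.
df = spark.read \
    .option("recursiveFileLookup", "true") \
    .option("pathGlobFilter", "*.json") \
    .json("/mnt/adls_gen/prod/")

This should skip files such as *.json_old and *.txt, since only names matching the glob are read.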

