I'm using fsspec to interact with remote filesystems; in my case it's GCS, but I believe the solution would be general.
For a single file, I'm using the following code (if you need the helper function code, it's here):
    import typing as t
    from contextlib import contextmanager
    from pathlib import PurePosixPath

    import fsspec

    # get_protocol_and_path and get_filepath_str are the helper functions linked above.


    @contextmanager
    def open_any_file(filepath: str, mode: str = "r", **kwargs) -> t.Generator[t.IO, None, None]:
        """Open a file and close it after use.

        Works for local, remote, http, https, s3, gcs, etc.

        :param filepath: Filepath.
        :param mode: Mode.
        :param kwargs: Keyword arguments.
        :return: File object.
        """
        protocol, path = get_protocol_and_path(filepath)
        filepath = PurePosixPath(path)
        filesystem = fsspec.filesystem(protocol)
        load_path = get_filepath_str(filepath, protocol)
        # Figure out the content type
        if "content_type" not in kwargs and filepath.suffix == ".json":
            kwargs["content_type"] = "application/json"
        with filesystem.open(load_path, mode=mode, **kwargs) as f:
            yield f
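For reference, the open-by-URL behavior this helper wraps can be exercised with `fsspec.open` directly. A minimal local sketch (the path here is hypothetical; with GCS it would be a `gs://bucket/...` URL):

```python
import json
import os
import tempfile

import fsspec

# Hypothetical local file; for GCS the URL would look like "gs://bucket/data.json".
path = os.path.join(tempfile.mkdtemp(), "data.json")

# fsspec.open picks the right filesystem from the URL scheme (file, gs, s3, http, ...)
with fsspec.open(path, "w") as f:
    json.dump({"ok": True}, f)

with fsspec.open(path, "r") as f:
    data = json.load(f)
```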
Assuming I have a thousand JSON files to download, what would be the most efficient way to do so? Should I go for parallelization (multiple processes), threading, or async?
What would be the optimal choice in terms of execution time, and what would the implementation look like?
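One option I've been considering is a thread pool over `fsspec.open`, since downloads are I/O-bound. A minimal sketch under local-file assumptions (the paths and `read_json` helper are hypothetical; in practice they would be `gs://bucket/...` URLs):

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import fsspec


def read_json(path: str) -> dict:
    # fsspec infers the filesystem from the URL scheme (gs://, s3://, local, ...)
    with fsspec.open(path, "r") as f:
        return json.load(f)


# Hypothetical local paths standing in for a bucket full of JSONs.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"data_{i}.json") for i in range(3)]
for i, p in enumerate(paths):
    with fsspec.open(p, "w") as f:
        json.dump({"i": i}, f)

# I/O-bound work releases the GIL, so threads can overlap the network waits.
# pool.map preserves input order in its results.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(read_json, paths))
```

I'm also aware that `filesystem.cat()` accepts a list of paths, and async-capable backends like gcsfs fetch those concurrently, which might make it a better fit here.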