
Download multiple files concurrently

I'm using fsspec to interact with remote filesystems; in my case it's GCS, but I believe the solution would be general.

For a single file, I'm using the following code (if you need the helper function code, it's here):

from contextlib import contextmanager
from pathlib import PurePosixPath
import typing as t

import fsspec

# get_protocol_and_path and get_filepath_str are the helper functions linked above


@contextmanager
def open_any_file(filepath: str, mode: str = "r", **kwargs) -> t.Generator[t.IO, None, None]:
    """
    Open file and close it after use. Works for local, remote, http, https, s3, gcs, etc.

    :param filepath: Filepath.
    :param mode: Mode.
    :param kwargs: Keyword arguments.
    :return: File object.
    """
    protocol, path = get_protocol_and_path(filepath)
    filepath = PurePosixPath(path)
    filesystem = fsspec.filesystem(protocol)
    load_path = get_filepath_str(filepath, protocol)
    # Figure out content type
    if "content_type" not in kwargs and filepath.suffix == ".json":
        kwargs["content_type"] = "application/json"
    with filesystem.open(load_path, mode=mode, **kwargs) as f:
        yield f

Assuming I have a thousand JSON files to download, what would be the most efficient way to do so? Should I go for multiprocessing, threading, or asyncio?

What would be the optimal choice in terms of execution time, and what would the implementation look like?
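Since downloading is I/O-bound (the interpreter mostly waits on the network), a thread pool is usually the simplest efficient option. Here is a minimal sketch; the `download_all` name, the `fetch` parameter (any single-file download callable, e.g. a thin wrapper around the `open_any_file` helper above), and the `max_workers` default are illustrative assumptions, not part of fsspec:

```python
import typing as t
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(paths: t.List[str],
                 fetch: t.Callable[[str], bytes],
                 max_workers: int = 32) -> t.Dict[str, bytes]:
    """Fetch many files concurrently with a thread pool.

    `fetch` downloads one file and returns its contents; threads suit
    I/O-bound work like remote reads, since the GIL is released while
    waiting on the network.
    """
    results: t.Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per path, remembering which future maps to which path
        futures = {pool.submit(fetch, path): path for path in paths}
        for fut in as_completed(futures):
            # .result() re-raises any exception from the failed download
            results[futures[fut]] = fut.result()
    return results
```

For GCS specifically, `gcsfs` (the fsspec implementation behind `gs://`) also supports batched reads via `filesystem.cat(list_of_paths)`, which fetches concurrently under the hood and may avoid managing a pool yourself.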

