I am wondering how to make sure HDFS data access makes the best use of local replicas, to minimize network transfer.
I am hosting HDFS on 3 machines with replication set to 3. Let's name them machines A, B, and C. Machine A is the namenode, and all three are datanodes.
Currently, I am reading data like the following:

```python
# Run this code on machine A, B, C separately
import fsspec
import pandas as pd

with fsspec.open('hdfs://machine_A_ip:9000/path/to/data.parquet', 'rb') as fp:
    df = pd.read_parquet(fp)
```

I observed huge network traffic (100+ MB/s upload and download), no matter which machine I run it on (namenode or not).
I also tried hosting Dask and Ray clusters on the same set of machines, but I don't think Dask supports this feature: Does Dask communicate with HDFS to optimize for data locality? - Stack Overflow
I haven't found any clues in the documentation either:
- API Reference — fsspec: Wraps pyarrow
- pyarrow.fs.HadoopFileSystem — Apache Arrow
- Filesystem Interface — Apache Arrow
- Issues · apache/arrow: Seems to be no discussion about locality
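One thing I considered is HDFS short-circuit local reads, which (as I understand it) let a client running on the same machine as a replica read the block file directly from disk instead of streaming it through the datanode's TCP socket. A minimal hdfs-site.xml sketch, assuming the standard property names, would look like this (it needs to be set on both the datanodes and the client machines, and the domain socket path must exist and be writable by the datanode user):

```xml
<configuration>
  <!-- Enable reading local replicas directly, bypassing the datanode socket -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- Unix domain socket used for the client/datanode handshake -->
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```

But I am not sure whether this is the right lever here, or whether the client is even choosing the local replica in the first place.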