How to use a pandas series of arrays, each containing dataframe indices, to operate on that dataframe

This is from the 1990 California Housing Dataset used in Geron's Hands-On Machine Learning. That might be helpful context, but this is more of a pandas/numpy question. I have a solution, but I'm wondering if there's a better one, as mine feels pretty inelegant.

There are 3 pieces of data involved, each with 16,512 rows. The first is the latitude and longitude of the districts each house is located within:

lat_longs = housing.iloc[:, :2]
lat_longs.head()

index	longitude	latitude
13096	-122.42	37.8
14973	-118.38	34.14
3785	-121.98	38.36
14689	-117.11	33.75
20507	-118.15	33.77

The second set is the prices associated with each of those houses:

housing_labels.head()

index	median_house_value
13096	458300.0
14973	483800.0
3785	101700.0
14689	96100.0
20507	361800.0

The third is an array, 16,512 x 5. Each row contains the indices of the 5 closest houses to the house referenced by that row. (I wrapped it in a dataframe to make it easier to display in markdown, but it's a numpy array.)

idx[:5]

index	0	1	2	3	4
0	3059	1266	8382	8461	1138
1	5608	1	11080	9372	13446
2	5394	2	3101	14696	2497
3	3	14935	13839	11401	14826
4	5016	5510	4	708	11889

My goal is to get, for each row, the median house price of its five closest houses. My solution was the following:

pd.Series(list(idx)).apply(lambda x: np.median(housing_labels.iloc[x]))

index	0
0	500001.0
1	386700.0
2	111500.0
3	96100.0
4	306300.0

(The x in the lambda above is the array of all 5 indices for that row.) Like I said, it works, but I'm wondering whether there's a faster (I'm under the impression apply is slow) and/or more elegant solution than what I came up with.

This pattern of having a series of arrays, where each array holds the indices that match a criterion of interest for that particular row, seems like a common one in data science, and I'd love to have a better solution for it. Any ideas?
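For reference, one vectorized possibility is NumPy integer-array indexing: indexing a 1-D array of prices with the 2-D idx array returns an array of the same shape as idx, and np.median can then reduce along each row. This is only a sketch, assuming idx holds positional row numbers (as the iloc call above implies) and that housing_labels has a single column of prices. The small frame and idx values here are made-up stand-ins, not the real 16,512-row data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 6 houses, 3 neighbours per row
# (the real data has 16,512 rows and 5 neighbours per row).
housing_labels = pd.Series(
    [458300.0, 483800.0, 101700.0, 96100.0, 361800.0, 92600.0],
    index=[13096, 14973, 3785, 14689, 20507, 1286],
    name="median_house_value",
)
# Each row lists POSITIONS (0..5) into housing_labels, not index labels.
idx = np.array([[0, 4, 1], [5, 2, 3], [3, 0, 2], [1, 4, 5]])

# Flatten to a 1-D array so this works for a Series or a one-column DataFrame.
values = housing_labels.to_numpy().ravel()

# Integer-array indexing: the result has the same shape as idx, shape (4, 3).
neighbor_values = values[idx]

# Median across each row's neighbours.
neighbor_medians = np.median(neighbor_values, axis=1)

# Same computation with the apply-based approach, for comparison.
via_apply = pd.Series(list(idx)).apply(lambda x: np.median(housing_labels.iloc[x]))
```

On this toy input both approaches should agree; the indexing version avoids a Python-level call per row, which is usually where apply loses time.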

-Joe

I was asked for a reproducible example. When I tried the solution suggested below, these small dataframes and arrays gave the correct answer.

For housing_labels:

| index | median_house_value |
|---|---|
| 13096 | 458300.0 |
| 14973 | 483800.0 |
| 3785 | 101700.0 |
| 14689 | 96100.0 |
| 20507 | 361800.0 |
| 1286 | 92600.0 |
| 18078 | 349300.0 |
| 4396 | 440900.0 |
| 18031 | 160100.0 |
| 6753 | 183900.0 |

For idx:

idx = np.array([[13096, 20507, 4396],
                [6753, 3785, 14973],
                [14689, 18078, 18031],
                [14973, 20507, 1286]])

Correct output:

[440900. 183900. 160100. 361800.]
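Note that in this small example idx holds index labels (13096, 20507, ...) rather than row positions, so the lookup has to go through loc instead of iloc. A minimal sketch of how the output above could be reproduced without apply, assuming housing_labels is a Series with a unique index and every label in idx is present:

```python
import numpy as np
import pandas as pd

housing_labels = pd.Series(
    [458300.0, 483800.0, 101700.0, 96100.0, 361800.0,
     92600.0, 349300.0, 440900.0, 160100.0, 183900.0],
    index=[13096, 14973, 3785, 14689, 20507, 1286, 18078, 4396, 18031, 6753],
    name="median_house_value",
)
idx = np.array([[13096, 20507, 4396],
                [6753, 3785, 14973],
                [14689, 18078, 18031],
                [14973, 20507, 1286]])

# Look up every label in one call, then restore the (rows, neighbours) shape.
neighbor_values = housing_labels.loc[idx.ravel()].to_numpy().reshape(idx.shape)

# Median across each row's neighbours.
medians = np.median(neighbor_values, axis=1)
print(medians)  # [440900. 183900. 160100. 361800.]
```

If idx held positions instead, the loc lookup could be replaced by indexing housing_labels.to_numpy() directly with idx.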
