Channel: Active questions tagged python - Stack Overflow

Accessing columns of a numpy ndarray in a vectorized manner


I have a list of lists of indices like so:

outer_list = [[0], [1, 2], [1, 2], [3, 5], [4], [3, 5]]

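Each inner list contains its own position in the outer list, together with the positions of the other members of its group. For example, the inner lists at positions 1 and 2 are both [1, 2].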

I also have a NumPy array of values, val_mat. As an example with two rows:

import numpy as np
val_mat = np.arange(2*6).reshape(2, 6)

I want to compute, for each row, the mean over each index group and store the results in a matrix of mean values. In this example, in each row the element at index 0 keeps its own value, the elements at indices 1 and 2 both get the row mean of the values at indices 1 and 2, the elements at indices 3 and 5 both get the mean of the values at indices 3 and 5, and so on.
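To make the target concrete, here is the result I would expect for the two-row example, worked out by hand. The name `expected` is only for illustration and does not appear in my real code.

```python
import numpy as np

outer_list = [[0], [1, 2], [1, 2], [3, 5], [4], [3, 5]]
val_mat = np.arange(2 * 6).reshape(2, 6)
# val_mat is:
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]

# Column j holds the row-wise mean over the columns listed in outer_list[j].
expected = np.array([
    [0.0, 1.5, 1.5, 4.0, 4.0, 4.0],
    [6.0, 7.5, 7.5, 10.0, 10.0, 10.0],
])

# Sanity check, one group at a time.
for j, inner_list in enumerate(outer_list):
    assert np.allclose(val_mat[:, inner_list].mean(axis=1), expected[:, j])
```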

At the moment, I am using:

mean_vals = np.vstack(
    [np.nanmean(val_mat[:, inner_list], axis=1) for inner_list in outer_list]
).T

However, the outer list is very large (500,000 inner lists) and it takes around 2 minutes for all the mean values to be produced in a 10k x 500k matrix.

Is there a way to do this in a vectorized manner where I don't need to iterate through the outer list?
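One direction that might work is sketched below, assuming every inner list is non-empty: flatten the ragged index lists once, gather the columns with a single fancy-indexing call, and reduce each segment with `np.add.reduceat`. The names `lengths`, `flat_idx`, `starts`, `gathered`, `valid`, `sums` and `counts` are my own, and I have only checked this against the small example, not at full scale.

```python
import numpy as np

outer_list = [[0], [1, 2], [1, 2], [3, 5], [4], [3, 5]]
val_mat = np.arange(2 * 6, dtype=float).reshape(2, 6)

# Flatten the ragged lists and record where each group starts.
# Assumption: no inner list is empty (reduceat does not return 0 for empty segments).
lengths = np.array([len(inner_list) for inner_list in outer_list])
flat_idx = np.concatenate(outer_list)
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))

# Gather every referenced column in one indexing call.
gathered = val_mat[:, flat_idx]
valid = ~np.isnan(gathered)

# Segment-wise sums and counts of non-NaN entries, mimicking np.nanmean.
sums = np.add.reduceat(np.where(valid, gathered, 0.0), starts, axis=1)
counts = np.add.reduceat(valid.astype(np.int64), starts, axis=1)

# A group that is all NaN in some row yields NaN there, as np.nanmean would.
with np.errstate(invalid="ignore", divide="ignore"):
    mean_vals = sums / counts
```

The list comprehension that builds `lengths` still runs in Python, but it only touches the index lists, not the data. `gathered` is a copy with one column per entry in `flat_idx`, so for a 10k x 500k matrix it would probably need to be processed in blocks of rows to fit in memory.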

