I have a list of lists of indices like so:
outer_list = [[0], [1, 2], [1, 2], [3, 5], [4], [3, 5]]
Each inner list contains its own index within the outer list.
I also have a NumPy array of values, val_mat. As an example with two rows:
import numpy as np
val_mat = np.arange(2*6).reshape(2, 6)
I want to compute the row-wise mean of each index group and store the result in a matrix of mean values with the same shape as val_mat. In this example, in each row, the element at index 0 keeps its own value, the elements at indices 1 and 2 both get the mean of the values at indices 1 and 2, the elements at indices 3 and 5 both get the mean of the values at indices 3 and 5, and the element at index 4 keeps its own value.
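To make the desired output concrete, here is the result for the two-row example, written out by hand. This is only a small check of the description above; `expected` is a name introduced here for illustration.

```python
import numpy as np

outer_list = [[0], [1, 2], [1, 2], [3, 5], [4], [3, 5]]
val_mat = np.arange(2 * 6).reshape(2, 6)

# Hand-computed target: column j holds the row-wise mean over the
# columns listed in outer_list[j].
expected = np.array([
    [0.0, 1.5, 1.5, 4.0, 4.0, 4.0],
    [6.0, 7.5, 7.5, 10.0, 10.0, 10.0],
])

# Check each column of the hand-written result against its group mean.
for j, group in enumerate(outer_list):
    assert np.allclose(expected[:, j], val_mat[:, group].mean(axis=1))
```

Columns 1 and 2 share one value per row, as do columns 3 and 5, while columns 0 and 4 are unchanged.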
At the moment, I am using:
mean_vals = np.vstack([np.nanmean(val_mat[:, inner_list], axis=1) for inner_list in outer_list]).T
However, the outer list is very large (500,000 inner lists), and it takes around 2 minutes to produce all the mean values for a 10k x 500k matrix.
Is there a way to do this in a vectorized manner where I don't need to iterate through the outer list?
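One possible direction, offered as a rough sketch rather than a confirmed solution: if the groups form a partition of the columns (an assumption, meaning all members of a group carry the same inner list, as in the example), each group can be labelled by its smallest index and the per-group sums computed with `np.add.reduceat`. The function name `grouped_row_means` is hypothetical, and the sketch has not been timed at the 10k x 500k scale.

```python
import numpy as np

def grouped_row_means(val_mat, outer_list):
    """Row-wise NaN-aware group means, broadcast back to every column.

    Assumes the inner lists partition the column indices.
    """
    vals = np.asarray(val_mat, dtype=float)

    # One cheap Python pass: label each column by its group's smallest index.
    labels = np.array([min(group) for group in outer_list])
    _, inverse = np.unique(labels, return_inverse=True)

    # Reorder columns so that members of a group are contiguous.
    order = np.argsort(inverse, kind="stable")
    sorted_inv = inverse[order]
    starts = np.flatnonzero(np.r_[True, sorted_inv[1:] != sorted_inv[:-1]])

    # Sum and count the non-NaN entries per group, mirroring np.nanmean.
    valid = ~np.isnan(vals)
    sums = np.add.reduceat(np.where(valid, vals, 0.0)[:, order], starts, axis=1)
    counts = np.add.reduceat(valid[:, order].astype(np.int64), starts, axis=1)

    # A group that is all NaN in a row yields NaN, as np.nanmean would.
    with np.errstate(invalid="ignore", divide="ignore"):
        group_means = sums / counts

    # Give every column the mean of the group it belongs to.
    return group_means[:, inverse]

outer_list = [[0], [1, 2], [1, 2], [3, 5], [4], [3, 5]]
val_mat = np.arange(2 * 6).reshape(2, 6)
mean_vals = grouped_row_means(val_mat, outer_list)
```

The remaining Python loop only calls `min` once per inner list and makes no NumPy call per iteration. Note that the sketch creates several temporary arrays the size of val_mat, so memory use may matter for a matrix this large.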