
How to make a histogram based on the second column of a large matrix and then calculate the sum efficiently?

The matrix has about 4000 columns and 1e8 rows. I need to divide the 1e8 rows into 1000 bins based on the values in the 2nd column and sum the rows in each bin. The final result should have 1000 rows and 4000 columns.

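import numpy as np

aa = np.random.rand(10000000, 4000)
l, r = min(aa[:, 1]), max(aa[:, 1])
bins = np.linspace(l, r, 1001)  # 1001 edges define 1000 bins
for bn in range(1000):
    ll, rr = bins[bn], bins[bn + 1]
    # sum of all rows whose 2nd-column value falls strictly inside this bin
    np.sum(aa[(aa[:, 1] > ll) & (aa[:, 1] < rr)], axis=0)  # one of the 1000 lines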

The final step above is slow: every one of the 1000 iterations builds a boolean mask over all rows of the matrix.

Then I need to stack the 1000 summed rows to generate the 1000 x 4000 matrix.
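
One possible way to avoid the 1000 masking passes, offered as a minimal sketch rather than a tuned solution: compute each row's bin index once with np.searchsorted, then accumulate rows into a preallocated 1000 x 4000 array with np.add.at, so no stacking step is needed. The function name binned_row_sums and the parameters n_bins, col and chunk are hypothetical names chosen for illustration. The sketch also assumes half-open bins [left, right) with the maximum value placed in the last bin, which differs from the strict inequalities in the loop above (those skip rows lying exactly on a bin edge).

```python
import numpy as np

def binned_row_sums(aa, n_bins=1000, col=1, chunk=100_000):
    """Sum the rows of `aa` per bin of column `col` (hypothetical helper)."""
    key = aa[:, col]
    # Equal-width bin edges between the column's minimum and maximum.
    edges = np.linspace(key.min(), key.max(), n_bins + 1)
    out = np.zeros((n_bins, aa.shape[1]), dtype=aa.dtype)
    # Work through the rows in chunks so temporaries stay small.
    for start in range(0, aa.shape[0], chunk):
        block = aa[start:start + chunk]
        # Bin index of each row: edges[i] <= value < edges[i + 1].
        idx = np.searchsorted(edges, block[:, col], side="right") - 1
        # Assumption: the maximum value is folded into the last bin.
        idx = np.clip(idx, 0, n_bins - 1)
        # Unbuffered add, so rows sharing a bin index all accumulate.
        np.add.at(out, idx, block)
    return out
```

Calling binned_row_sums(aa) returns the (1000, 4000) result directly. Each row is read once instead of being tested against 1000 masks. Whether np.add.at is the fastest accumulation primitive depends on the NumPy version and the data, so it is worth timing on a small slice first.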

