Channel: Active questions tagged python - Stack Overflow

Fast way to convert a 2D string NumPy array to a 3D int array

I have a very large numpy array with entries like:

[['0/1' '2/0']
 ['3/0' '1/4']]

I want to convert it to a 3D integer array like:

[[[0 1] [2 0]]
 [[3 0] [1 4]]]
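To make the target layout concrete, here is a minimal sketch of the conversion on the small example above. It assumes every entry contains exactly one '/' with an integer on each side; `split_pairs` is a hypothetical helper name, not something from NumPy. It is only meant to pin down the expected output and is not claimed to be fast.

```python
import numpy as np


def split_pairs(arr, sep="/"):
    """Turn a 2D array of 'a/b' strings into a (rows, cols, 2) int32 array.

    Hypothetical helper; assumes each entry has exactly one separator.
    """
    # np.char.partition splits every element at the first separator and
    # appends an axis of length 3: (before, separator, after).
    parts = np.char.partition(arr, sep)
    # Keep the 'before' and 'after' pieces (indices 0 and 2), cast to int.
    return parts[..., [0, 2]].astype(np.int32)


example = np.array([["0/1", "2/0"], ["3/0", "1/4"]])
result = split_pairs(example)
print(result.shape)
```

The extra trailing axis of length 2 holds the two integers from each string, which is the shape the rest of the question refers to.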

The array is very wide: it has many columns but not many rows. There are around 100 possible values for the string. This isn't actually a fraction, just a demonstration of what is in the file (it's genomics data, given to me in this format).
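Since only about 100 distinct strings occur, one possible approach is to parse each distinct string once and then index into a small lookup table. The sketch below is an illustration under that assumption, not a tested solution for the real data; `convert_via_lookup` is a hypothetical name, and its speed on the full-size array is untested here.

```python
import numpy as np


def convert_via_lookup(arr, sep="/"):
    """Map a 2D array of 'a/b' strings to a (rows, cols, 2) int32 array.

    Hypothetical helper; assumes each entry is '<int><sep><int>'.
    """
    # Distinct strings, plus for every cell the index of its string.
    uniques, inverse = np.unique(arr, return_inverse=True)
    # Parse each distinct string exactly once into an (n_unique, 2) table.
    table = np.array(
        [[int(piece) for piece in u.split(sep)] for u in uniques],
        dtype=np.int32,
    )
    # The shape of `inverse` differs between NumPy versions, so reshape it
    # explicitly before using it to index the table.
    return table[inverse.reshape(arr.shape)]


example = np.array([["0/1", "2/0"], ["3/0", "1/4"]])
out = convert_via_lookup(example)
print(out.shape)
```

The `inverse` codes are plain integers as well, so under the same assumption another option would be to pass those codes and the small table to the GPU kernel instead of the expanded array.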

I don't want to run this in parallel: I will be running it on a single CPU before moving to a single GPU, so the extra CPUs would sit idle while the GPU kernel is running. I have tried Numba:

import numpy as np
import itertools
from numba import njit
import time


@njit(nopython=True)
def index_with_numba(data, int_data, indices):
    for pos in indices:
        str_match = str(pos[0]) + '/' + str(pos[1])
        for i in range(data.shape[0]):
            for j in range(data.shape[1]):
                if data[i, j] == str_match:
                    int_data[i, j] = pos
    return int_data


def generate_masks():
    masks = []

    def _2d_array(i, j):
        return np.asarray([i, j], dtype=np.int32)

    for i in range(10):
        for j in range(10):
            masks.append(_2d_array(i, j))
    return masks


rows = 100000
cols = 200
numerators = np.random.randint(0, 10, size=(rows, cols))
denominators = np.random.randint(0, 10, size=(rows, cols))
samples = np.array(
    [f"{numerator}/{denominator}"
     for numerator, denominator in zip(numerators.flatten(), denominators.flatten())],
    dtype=str,
).reshape(rows, cols)
samples_int = np.empty((samples.shape[0], samples.shape[1], 2), dtype=np.int32)

# Generate all possible masks
masks = generate_masks()

t0 = time.time()
samples_int = index_with_numba(samples, samples_int, masks)
t1 = time.time()
print(f"Time to index {t1-t0}")

But it is too slow to be feasible.

Time to index 182.0304057598114

The reason I want this is that I want to write a CUDA kernel that performs an operation based on the original values, so for '0/1' I need 0 and 1, and so on, but the kernel cannot handle strings. I had thought masks could be used, but they don't seem to be suitable.

Any suggestions appreciated.

