I'm working on some image processing for a very large image, 25,088 pixels by 36,864 pixels. Because the image is so large, I process it in 256x256 pixel 'tiles'. I noticed in Windows Task Manager that none of my CPU, RAM, GPU, or SSD reaches 50% utilization while my function runs. This led me to believe there is still performance I can squeeze out somehow.
```python
import numpy as np

def processImage(self, img, tileSize=256, numberOfThreads=8):  # a function within a class
    height, width, depth = img.shape
    print(height, width, depth, img.dtype)

    # create an empty matrix with the same height and width as img
    processedImage = np.zeros((height, width, 3), dtype=np.uint8)

    # calculate left and top offsets so the tile grid is centered
    leftExcessPixels = (width % tileSize) // 2
    topExcessPixels = (height % tileSize) // 2

    # calculate the number of tile columns (X) and rows (Y)
    XNumberOfTiles = width // tileSize
    YNumberOfTiles = height // tileSize

    for y in range(YNumberOfTiles):
        for x in range(XNumberOfTiles):
            XStart = leftExcessPixels + (tileSize * x)
            YStart = topExcessPixels + (tileSize * y)
            XEnd = XStart + tileSize
            YEnd = YStart + tileSize

            croppedImage = img[YStart:YEnd, XStart:XEnd]
            print('Y: ' + str(y) + ' X: ' + str(x), end=" ")

            # process the cropped image and store it at the same location in the empty image
            processedImage[YStart:YEnd, XStart:XEnd] = self.doSomeImageProcessing(croppedImage)

    # return the result so the caller can use it
    return processedImage
```

Multithreading seems to be the solution, where I parallelize the processing of the tiles. Since the tiles are processed independently of each other, there should be no problem working on several tiles at the same time. What I'm not sure how to do is place the matrix returned by `self.doSomeImageProcessing(croppedImage)` back at the same coordinates, but in a different variable named `processedImage`. I'm worried that with multiple threads all trying to write to `processedImage`, Python might not like that so much. Any ideas on how to approach it?
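A minimal sketch of the tile-parallel idea described above, under a few stated assumptions. `process_tile` is a hypothetical stand-in for `self.doSomeImageProcessing` (here it just inverts the pixels), and `process_image_threaded` is an illustrative name, not part of the original class. Each worker writes only to its own slice of the output array, and because the tiles never overlap, no two threads touch the same elements, so no lock is needed for the write-back. Whether threads actually raise CPU utilization depends on whether the real per-tile processing releases Python's global interpreter lock, which many NumPy and OpenCV operations do, but pure-Python loops do not.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def process_tile(tile):
    # Hypothetical stand-in for self.doSomeImageProcessing: inverts the tile.
    return 255 - tile


def process_image_threaded(img, tile_size=256, number_of_threads=8):
    height, width = img.shape[:2]
    processed = np.zeros((height, width, 3), dtype=np.uint8)

    # Same centering offsets as the sequential version.
    left = (width % tile_size) // 2
    top = (height % tile_size) // 2

    def work(origin):
        # Each call reads one tile and writes to the matching, non-overlapping
        # slice of the shared output array.
        y0, x0 = origin
        y1, x1 = y0 + tile_size, x0 + tile_size
        processed[y0:y1, x0:x1] = process_tile(img[y0:y1, x0:x1])

    # Top-left corner (row, column) of every tile.
    origins = [
        (top + tile_size * y, left + tile_size * x)
        for y in range(height // tile_size)
        for x in range(width // tile_size)
    ]

    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        # list() drains the iterator so exceptions raised in workers surface here.
        list(pool.map(work, origins))

    return processed
```

If the per-tile work turns out to be dominated by pure-Python code, a process pool would be the usual alternative, at the cost of having to pass tiles between processes instead of writing into one shared array.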