I need to find the rectangle with the maximum sum in a large matrix of integers. There is an O(n^3) time algorithm, as described here and here, for example.
Both work well, but they are slow, partly because of Python. How much can the code be sped up for, say, an 800 by 800 matrix? It takes 56 seconds on my PC.
Here is my sample code, which is based on code from geeksforgeeks:
    import numpy as np

    def kadane(arr, start, finish, n):
        # Initialize subarray_sum and max_subarray_sum
        subarray_sum = 0
        max_subarray_sum = float('-inf')
        i = None

        # Just some initial value to check
        # for all negative values case
        finish = -1

        # local variable
        local_start = 0

        for i in range(n):
            subarray_sum += arr[i]
            if subarray_sum < 0:
                subarray_sum = 0
                local_start = i + 1
            elif subarray_sum > max_subarray_sum:
                max_subarray_sum = subarray_sum
                start = local_start
                finish = i

        # There is at least one
        # non-negative number
        if finish != -1:
            return max_subarray_sum, start, finish

        # Special case: when all numbers
        # in arr[] are negative
        max_subarray_sum = arr[0]
        start = finish = 0

        # Find the maximum element in array
        for i in range(1, n):
            if arr[i] > max_subarray_sum:
                max_subarray_sum = arr[i]
                start = finish = i
        return max_subarray_sum, start, finish

    # The main function that finds maximum subarray_sum rectangle in M
    def findMaxsubarray_sum(M):
        num_rows, num_cols = M.shape

        # Variables to store the final output
        max_subarray_sum, finalLeft = float('-inf'), None
        finalRight, finalTop, finalBottom = None, None, None
        left, right, i = None, None, None

        temp = [None] * num_rows
        subarray_sum = 0
        start = 0
        finish = 0

        # Set the left column
        for left in range(num_cols):
            # Initialize all elements of temp as 0
            temp = np.zeros(num_rows, dtype=np.int_)
            # Set the right column for the left
            # column set by outer loop
            for right in range(left, num_cols):
                temp += M[:num_rows, right]
                #print(temp, start, finish, num_rows)
                subarray_sum, start, finish = kadane(temp, start, finish, num_rows)

                # Compare subarray_sum with maximum subarray_sum so far.
                # If subarray_sum is more, then update max_subarray_sum
                # and other output values
                if subarray_sum > max_subarray_sum:
                    max_subarray_sum = subarray_sum
                    finalLeft = left
                    finalRight = right
                    finalTop = start
                    finalBottom = finish

        # final values
        print("(Top, Left)", "(", finalTop, finalLeft, ")")
        print("(Bottom, Right)", "(", finalBottom, finalRight, ")")
        print("Max subarray_sum is:", max_subarray_sum)

    # np.random.seed(40)
    square = np.random.randint(-3, 4, (800, 800))
    # print(square)
    %timeit findMaxsubarray_sum(square)
Can numba or pythran or parallelization or just better use of numpy be used to speed this up a lot? Ideally I would like it to take under a second.
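To make "better use of numpy" concrete, here is a minimal sketch of one possible direction. It has not been benchmarked, so I can't say how close it gets to the one-second target. It keeps the loop over the left column but handles every right column at once, using prefix sums and a running minimum in place of the per-element Kadane loop. The function name `max_rectangle_sum_numpy` and the returned tuple layout are my own assumptions, not part of the geeksforgeeks code.

```python
import numpy as np

def max_rectangle_sum_numpy(M):
    """Return (best_sum, top, left, bottom, right) with inclusive indices.

    Hypothetical helper: same O(n^3) work as the loop version, but the two
    inner loops are replaced by whole-array numpy operations.
    """
    M = np.asarray(M, dtype=np.int64)
    num_rows, num_cols = M.shape
    best = None
    for left in range(num_cols):
        # strips[r, k] = sum of M[r, left : left + k + 1]
        strips = np.cumsum(M[:, left:], axis=1)
        # prefix[r, k] = sum of strips[0:r, k]; row 0 is all zeros
        prefix = np.zeros((num_rows + 1, strips.shape[1]), dtype=np.int64)
        prefix[1:] = np.cumsum(strips, axis=0)
        # run_min[b, k] = min(prefix[0 : b + 1, k])
        run_min = np.minimum.accumulate(prefix[:-1], axis=0)
        # gains[b, k] = best sum over rectangles with bottom row b
        # spanning columns left .. left + k (never empty)
        gains = prefix[1:] - run_min
        b, k = np.unravel_index(np.argmax(gains), gains.shape)
        if best is None or gains[b, k] > best[0]:
            # The top row is where the smallest prefix was first reached
            top = int(np.argmin(prefix[: b + 1, k]))
            best = (int(gains[b, k]), top, left, int(b), left + int(k))
    return best

print(max_rectangle_sum_numpy([[3, 0, 2], [-3, -3, -1], [-2, 1, -1]]))
```

Because the rectangle is never empty, the all-negative case needs no special branch here. When several rectangles tie for the maximum, only one of them is reported.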
There is claimed to be a faster algorithm but I don't know how hard it would be to implement.
Test cases
    [[ 3  0  2]
     [-3 -3 -1]
     [-2  1 -1]]
The correct answer is the rectangle covering the top row with score 5.
    [[-1  3  0]
     [ 0  0 -2]
     [ 0  2  1]]
The correct answer is the rectangle covering the second column with score 5.
    [[ 2  2 -1]
     [-1 -1  0]
     [ 3  1  1]]
The correct answer is the rectangle covering the first two columns with score 6.
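To check a faster version against these three cases, a brute-force reference can help. The sketch below is pure Python with no dependencies; it tries every rectangle using a 2D prefix-sum table. The name `brute_force_max_rectangle` is hypothetical, and the function is only meant for small matrices.

```python
def brute_force_max_rectangle(matrix):
    """Return the maximum sum over all non-empty rectangles (O(n^4))."""
    rows, cols = len(matrix), len(matrix[0])
    # pre[r][c] = sum of the block matrix[0:r][0:c]
    pre = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            pre[r + 1][c + 1] = (matrix[r][c] + pre[r][c + 1]
                                 + pre[r + 1][c] - pre[r][c])
    best = None
    for top in range(rows):
        for bottom in range(top + 1, rows + 1):      # exclusive bound
            for left in range(cols):
                for right in range(left + 1, cols + 1):  # exclusive bound
                    total = (pre[bottom][right] - pre[top][right]
                             - pre[bottom][left] + pre[top][left])
                    if best is None or total > best:
                        best = total
    return best

# The three test cases from the question with their expected scores
test_cases = [
    ([[3, 0, 2], [-3, -3, -1], [-2, 1, -1]], 5),
    ([[-1, 3, 0], [0, 0, -2], [0, 2, 1]], 5),
    ([[2, 2, -1], [-1, -1, 0], [3, 1, 1]], 6),
]
for matrix, expected in test_cases:
    assert brute_force_max_rectangle(matrix) == expected
```

It compares scores only, not coordinates, since more than one rectangle can reach the maximum (in the third case the full matrix also sums to 6).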