Vectorizing code to calculate (squared) Mahalanobis Distance

EDIT 2: this post seems to have been moved from Cross Validated to Stack Overflow because it is mostly about programming, but that means my fancy MathJax doesn't work anymore. Hopefully this is still readable.

Say I want to calculate the squared Mahalanobis distance between two vectors x and y with covariance matrix S. This is a fairly simple function defined by

M2(x, y; S) = (x - y)^T * S^-1 * (x - y)

With python's numpy package I can do this as

# x, y = numpy.ndarray of shape (n,)
# s_inv = numpy.ndarray of shape (n, n)
diff = x - y
d2 = diff.T.dot(s_inv).dot(diff)

or in R as

diff <- x - y
d2 <- t(diff) %*% s_inv %*% diff

In my case, though, I am given

  • m by n matrix X
  • n-dimensional vector mu
  • n by n covariance matrix S

and want to find the m-dimensional vector d such that

d_i = M2(x_i, mu; S)  ( i = 1 .. m )

where x_i is the ith row of X.

This is not difficult to accomplish using a simple loop in python:

d = numpy.zeros((m,))
for i in range(m):
    diff = x[i, :] - mu
    d[i] = diff.T.dot(s_inv).dot(diff)

Of course, because the outer loop runs in python rather than in native code inside the numpy library, this is not as fast as it could be. n is about 3-4, m is several hundred thousand, and I'm doing this somewhat often in an interactive program, so a speedup would be very useful.

Mathematically, the only way I've been able to formulate this using basic matrix operations is

d = diag( X' * S^-1 * X'^T )

where

 x'_i = x_i - mu

which is simple to write in vectorized form, but this is unfortunately outweighed by the inefficiency of computing a matrix with more than ten billion elements only to take its diagonal... I believe this operation should be easily expressible in Einstein notation, and thus could hopefully be evaluated quickly with numpy's einsum function, but I haven't even begun to figure out how that black magic works.
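
For concreteness, here is my untested guess at what a vectorized version might look like; the einsum subscript string in particular is an assumption on my part, since I haven't verified it:

import numpy

# x is (m, n), mu is (n,), s_inv is (n, n)
diff = x - mu  # mu broadcasts across the m rows of x

# diag-free version: compute only the row-wise quadratic forms,
# never materializing the full (m, m) product
d = numpy.sum(diff.dot(s_inv) * diff, axis=1)

# presumed einsum equivalent: d_i = sum_{j,k} diff_ij * s_inv_jk * diff_ik
d_alt = numpy.einsum('ij,jk,ik->i', diff, s_inv, diff)

If broadcasting works the way I think it does, both forms avoid the m by m intermediate entirely, but I have no idea whether either is correct or actually fast.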

So, I would like to know: is there either a nicer way to formulate this operation mathematically (in terms of simple matrix operations), or could someone suggest some nice vectorized (python or R) code that does this efficiently?

BONUS QUESTION, for the brave

I don't actually want to do this once; I want to do it k ~ 100 times. Given:

  • m by n matrix X
  • k by n matrix U
  • set of n by n covariance matrices, each denoted S_j (j = 1..k)

Find the m by k matrix D such that

D_i,j = M2(x_i, u_j; S_j)

Where i = 1..m, j = 1..k, x_i is the ith row of X and u_j is the jth row of U.

I.e., vectorize the following code:

# s_inv is a (k x n x n) array containing "stacked" inverses
# of the covariance matrices
d = numpy.zeros((m, k))
for j in range(k):
    for i in range(m):
        diff = x[i, :] - u[j, :]
        d[i, j] = diff.T.dot(s_inv[j, :, :]).dot(diff)
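
Again, my untested guess at a fully vectorized form, using broadcasting to build the stacked differences (the einsum subscripts are an assumption):

import numpy

# x is (m, n), u is (k, n), s_inv is (k, n, n)
# diff[i, j, :] = x[i, :] - u[j, :], giving shape (m, k, n)
diff = x[:, None, :] - u[None, :, :]

# presumed einsum: D_ij = sum_{a,b} diff_ija * s_inv_jab * diff_ijb
d = numpy.einsum('ija,jab,ijb->ij', diff, s_inv, diff)

Though I suspect the (m, k, n) intermediate may itself become a memory concern at my sizes.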
