
Corrected Cramer's V results in division by zero when n = r

I recently found this answer, which provides code for a bias-corrected version of Cramer's V for measuring the association between two categorical variables:

    import numpy as np
    import scipy.stats as ss

    def cramers_corrected_stat(confusion_matrix):
        """Calculate Cramer's V statistic for categorical-categorical association.

        Uses correction from Bergsma and Wicher,
        Journal of the Korean Statistical Society 42 (2013): 323-328.
        """
        # Accept a pandas crosstab as well as a NumPy array, so that .sum()
        # below returns the grand total rather than per-column sums.
        confusion_matrix = np.asarray(confusion_matrix)
        chi2 = ss.chi2_contingency(confusion_matrix)[0]
        n = confusion_matrix.sum()
        phi2 = chi2 / n
        r, k = confusion_matrix.shape
        phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))
        rcorr = r - ((r-1)**2)/(n-1)
        kcorr = k - ((k-1)**2)/(n-1)
        return np.sqrt(phi2corr / min((kcorr-1), (rcorr-1)))

However, if the number of samples n equals the number of categories r of the first feature, then rcorr = n - (n-1)**2/(n-1) = n - (n-1) = 1, so rcorr-1 = 0. Whenever kcorr-1 is non-negative, min((kcorr-1), (rcorr-1)) is therefore 0, and np.sqrt(phi2corr / min((kcorr-1), (rcorr-1))) divides by zero. I confirmed this with a simple example:
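The arithmetic can be checked in isolation, without SciPy. This is only a minimal sketch: the helper name `corrected_dim` is hypothetical (it is not part of the quoted answer), and the values n = 4, r = 4, k = 3 are chosen purely for illustration.

```python
def corrected_dim(size: int, n: int) -> float:
    """Bias-corrected category count: size - (size - 1)**2 / (n - 1)."""
    return size - (size - 1) ** 2 / (n - 1)


# Illustrative values: as many samples as categories in the first feature.
n, r, k = 4, 4, 3

rcorr = corrected_dim(r, n)  # 4 - 9/3
kcorr = corrected_dim(k, n)  # 3 - 4/3

# This is the denominator used inside the square root.
denominator = min(kcorr - 1, rcorr - 1)
print(rcorr, kcorr, denominator)
```

With r = n, the corrected row count collapses to exactly 1, so the denominator is 0 regardless of the value of phi2corr.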

    import pandas as pd

    data = [
        {'name': 'Alice', 'occupation': 'therapist', 'favorite_color': 'red'},
        {'name': 'Bob', 'occupation': 'fisherman', 'favorite_color': 'blue'},
        {'name': 'Carol', 'occupation': 'scientist', 'favorite_color': 'orange'},
        {'name': 'Doug', 'occupation': 'scientist', 'favorite_color': 'red'},
    ]
    df = pd.DataFrame(data)

    # n = 4 (number of samples), r = 4 (number of unique names),
    # k = 3 (number of unique occupations)
    confusion_matrix = pd.crosstab(df['name'], df['occupation'])
    print(cramers_corrected_stat(confusion_matrix))

Output:

    /tmp/ipykernel_227998/749514942.py:45: RuntimeWarning: invalid value encountered in scalar divide
      return np.sqrt(phi2corr / min( (kcorr-1), (rcorr-1)))
    nan

Is this expected behavior?

If so, how should I use the corrected Cramer's V in cases where n = r (or, by symmetry, n = k), e.g., when every sample has a unique value for some feature?
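To make the degenerate case explicit rather than relying on a runtime warning, one option is to test the denominator before dividing. The sketch below is a hypothetical variant (the name `cramers_corrected_stat_guarded` is made up here), and it assumes that reporting NaN is acceptable when the corrected denominator is not positive. That choice is an assumption for illustration, not something the quoted answer or the cited correction prescribes.

```python
import numpy as np
import scipy.stats as ss


def cramers_corrected_stat_guarded(confusion_matrix):
    """Hypothetical variant: return NaN instead of dividing by a non-positive value."""
    table = np.asarray(confusion_matrix, dtype=float)
    chi2 = ss.chi2_contingency(table)[0]
    n = table.sum()
    r, k = table.shape
    phi2corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    rcorr = r - (r - 1) ** 2 / (n - 1)
    kcorr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(kcorr - 1, rcorr - 1)
    if denominator <= 0:
        # Assumption: treat the statistic as undefined in this case.
        return float("nan")
    return float(np.sqrt(phi2corr / denominator))


# Degenerate table: 4 samples, 4 distinct rows (n = r), 3 columns.
degenerate = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1],
                       [0, 0, 1]])
# Ordinary 2x2 table with n much larger than r and k.
regular = np.array([[10, 2],
                    [3, 12]])

v_degenerate = cramers_corrected_stat_guarded(degenerate)
v_regular = cramers_corrected_stat_guarded(regular)
print(v_degenerate, v_regular)
```

This only changes how the undefined case is reported. Whether NaN, 0, or the uncorrected statistic is the right value to use when n = r remains the open question.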

