Handle NaN values in Scipy.mannwhitneyu (scipy.stats.mannwhitneyu)

I have a dataset of log(IC50) values (call it A) and another dataset of categorical clinical drug response (call it B). B has two main categories, sensitive and resistant, but it also contains many null (NaN) values. I want to use a one-sided nonparametric Mann-Whitney U test to determine whether the estimated log(IC50) values of resistant tumors are significantly larger than those of sensitive tumors, but I do not know how to handle these NaN values.

I tried to use this code:

    from scipy.stats import mannwhitneyu

    p_values = {}  # Dictionary to store p-values for each drug

    # Assuming logIC50_df contains log IC50 values for different drugs
    # Iterate through each column (drug) in logIC50_df
    for drug in logIC50_df.columns:
        # Extract log IC50 values for the current drug
        resistant_log_ic50 = logIC50_df[drug][test_dr_new.iloc[:, 0] == "Resistant"]
        sensitive_log_ic50 = logIC50_df[drug][test_dr_new.iloc[:, 0] == "Sensitive"]

        # Perform Mann-Whitney U test (one-sided alternative hypothesis)
        statistic, p_value = mannwhitneyu(resistant_log_ic50, sensitive_log_ic50, alternative='greater')

        # Store p-value for the current drug
        p_values[drug] = p_value

        # Print individual p-value for each drug
        print(f"P-value for {drug}: {p_value}")

    # Count the number of drugs with statistically significant discrimination
    significance_level = 0.05
    num_significant = sum(p < significance_level for p in p_values.values())

    # Print the total number of drugs with statistically significant discrimination
    print(f"Number of drugs with statistically significant discrimination: {num_significant}")

    # Interpretation
    if any(p < significance_level for p in p_values.values()):
        print("Reject the null hypothesis: estimated log(IC50) values are significantly higher in resistant tumors compared to sensitive tumors.")
    else:
        print("Fail to reject the null hypothesis: there is no significant difference in estimated log(IC50) values between resistant and sensitive tumors.")

But every time I use it, the results are bad. I suspect that the NaN values in dataset B (i.e. test_dr_new) are affecting my results. Please help in this regard.
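One way to rule NaN in or out as the cause is to drop incomplete rows explicitly before testing. The sketch below is only an illustration under stated assumptions: the helper name `one_sided_mwu`, its `min_n` parameter, and the toy data are hypothetical, and it assumes the log(IC50) values and the response labels share the same row index. Note that comparing a label to "Resistant" or "Sensitive" already evaluates to False for NaN labels, so missing values in the log(IC50) columns may be the more likely source of trouble, since they are passed straight into `mannwhitneyu`.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def one_sided_mwu(values, labels, min_n=1):
    """Test whether 'Resistant' values tend to be larger than 'Sensitive' ones.

    values: pandas Series of log(IC50) for one drug.
    labels: pandas Series of response categories with the same index.
    Rows with a missing value or a missing label are dropped first.
    Returns (p_value, n_resistant, n_sensitive).
    """
    # Pair values with labels on the shared index, then drop incomplete rows
    paired = pd.DataFrame({"value": values, "label": labels}).dropna()
    resistant = paired.loc[paired["label"] == "Resistant", "value"]
    sensitive = paired.loc[paired["label"] == "Sensitive", "value"]

    # Skip the test if either group is too small after dropping NaN
    if len(resistant) < min_n or len(sensitive) < min_n:
        return np.nan, len(resistant), len(sensitive)

    _, p = mannwhitneyu(resistant, sensitive, alternative="greater")
    return p, len(resistant), len(sensitive)


# Toy data, made up purely for illustration
toy_values = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0])
toy_labels = pd.Series(["Sensitive", "Sensitive", "Sensitive", np.nan,
                        "Resistant", "Resistant", "Resistant", np.nan])

p_value, n_res, n_sens = one_sided_mwu(toy_values, toy_labels)
print(f"p = {p_value:.3f}, n_resistant = {n_res}, n_sensitive = {n_sens}")
```

In the loop from the question, this could be called as `one_sided_mwu(logIC50_df[drug], test_dr_new.iloc[:, 0])`. Reporting the group sizes alongside each p-value makes it easier to spot drugs where very few samples remain after dropping NaN. Recent SciPy releases also appear to accept `nan_policy='omit'` in `mannwhitneyu`; it is worth checking the documentation for the installed version before relying on it.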

