I have a dataset of log(IC50) values (say A) and another dataset of categorical clinical drug response (say B). B has two categories of interest, sensitive and resistant, but it also contains many null (NaN) values. I want to use a one-sided nonparametric Mann-Whitney U test to determine whether the estimated log(IC50) values of resistant tumors are significantly larger than those of sensitive tumors, but I do not know how to handle the NaN values.
I tried to use this code:
    from scipy.stats import mannwhitneyu

    p_values = {}  # Dictionary to store p-values for each drug

    # Assuming logIC50_df contains log IC50 values for different drugs
    # Iterate through each column (drug) in logIC50_df
    for drug in logIC50_df.columns:
        # Extract log IC50 values for the current drug
        resistant_log_ic50 = logIC50_df[drug][test_dr_new.iloc[:, 0] == "Resistant"]
        sensitive_log_ic50 = logIC50_df[drug][test_dr_new.iloc[:, 0] == "Sensitive"]

        # Perform Mann-Whitney U test (one-sided alternative hypothesis)
        statistic, p_value = mannwhitneyu(resistant_log_ic50, sensitive_log_ic50, alternative='greater')

        # Store p-value for the current drug
        p_values[drug] = p_value

        # Print individual p-value for each drug
        print(f"P-value for {drug}: {p_value}")

    # Count the number of drugs with statistically significant discrimination
    significance_level = 0.05
    num_significant = sum(p < significance_level for p in p_values.values())

    # Print the total number of drugs with statistically significant discrimination
    print(f"Number of drugs with statistically significant discrimination: {num_significant}")

    # Interpretation
    if any(p < significance_level for p in p_values.values()):
        print("Reject the null hypothesis: estimated log(IC50) values are significantly higher in resistant tumors compared to sensitive tumors.")
    else:
        print("Fail to reject the null hypothesis: there is no significant difference in estimated log(IC50) values between resistant and sensitive tumors.")
But every time I use it, the results are bad. I suspect that the NaN values in the B dataset (i.e. test_dr_new) are affecting my results. Please help in this regard.
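
For reference, here is a minimal sketch of one way to take missing values out of the picture. It assumes the two tables share a sample index. The function name `mwu_resistant_vs_sensitive`, the toy data, and the drug names are hypothetical and only there for illustration. Note that a comparison such as `labels == "Resistant"` is already False for NaN labels, so unlabeled samples are left out of both groups. Missing log(IC50) values or misaligned indexes may be the more likely source of trouble, which is why the sketch aligns the labels by index and drops missing measurements explicitly.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def mwu_resistant_vs_sensitive(logic50_df, labels):
    """One-sided Mann-Whitney U test per drug, skipping missing labels and values."""
    # Align the labels to the rows of the IC50 table by index, not by position.
    labels = labels.reindex(logic50_df.index)
    p_values = {}
    for drug in logic50_df.columns:
        values = logic50_df[drug]
        # Keep only samples that have both a label and a measured value.
        keep = labels.notna() & values.notna()
        resistant = values[keep & (labels == "Resistant")]
        sensitive = values[keep & (labels == "Sensitive")]
        # The test cannot be run if either group is empty.
        if len(resistant) == 0 or len(sensitive) == 0:
            p_values[drug] = np.nan
            continue
        # H1: resistant values tend to be larger than sensitive values.
        _, p = mannwhitneyu(resistant, sensitive, alternative="greater")
        p_values[drug] = p
    return pd.Series(p_values, name="p_value")


# Toy data (hypothetical): 8 samples, 2 of them without a clinical label.
samples = [f"s{i}" for i in range(8)]
labels = pd.Series(
    ["Resistant", "Resistant", "Resistant", np.nan,
     "Sensitive", "Sensitive", "Sensitive", np.nan],
    index=samples,
)
logic50_df = pd.DataFrame(
    {
        "drug_a": [2.1, 2.5, 1.9, 0.3, 0.2, 0.4, 0.1, 3.0],
        "drug_b": [0.5, np.nan, 0.7, 0.6, 0.6, 0.8, 0.4, 0.5],
    },
    index=samples,
)

p_values = mwu_resistant_vs_sensitive(logic50_df, labels)
print(p_values)
```

With the real data, the call would look like `mwu_resistant_vs_sensitive(logIC50_df, test_dr_new.iloc[:, 0])`, assuming both objects are indexed by the same sample identifiers. Printing the group sizes per drug is also a quick check on whether very small groups, rather than NaN values, explain the poor results.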