I have an unbalanced DataFrame in Spark (using PySpark) and I want to resample it to make it balanced. The only sampling function I can find in PySpark is

    sample(withReplacement, fraction, seed=None)

but I want to sample the DataFrame with weights based on unitvolume. In Python (with pandas), I can do it like this:

    df.sample(n, replace=False, weights=log(unitvolume))

Is there a way to do the same in PySpark?
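
For reference, here is a rough sketch of one possible approach, not a built-in PySpark feature: give every row a random key -log(u) / w, where u is uniform on (0, 1] and w is the row's weight, then keep the n rows with the smallest keys. As far as I understand, this is the random-key form of weighted sampling without replacement. The names weighted_sample, weighted_sample_spark and the temporary column _key are made up for illustration, and the sketch assumes log(unitvolume) > 0 for any row that should be eligible. The first function is plain Python so the idea can be checked locally; the second expresses the same idea with pyspark.sql.functions and has not been tuned for large data.

```python
import math
import random


def weighted_sample(rows, weights, n, seed=None):
    """Pick up to n rows without replacement, favouring larger weights."""
    rng = random.Random(seed)
    keyed = []
    for row, w in zip(rows, weights):
        if w <= 0:
            continue  # rows with non-positive weight are never selected
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        keyed.append((-math.log(u) / w, row))
    keyed.sort(key=lambda pair: pair[0])  # smallest keys win
    return [row for _, row in keyed[:n]]


def weighted_sample_spark(df, n, weight_col="unitvolume", seed=None):
    """Same idea on a Spark DataFrame (hypothetical helper, needs pyspark)."""
    from pyspark.sql import functions as F  # imported lazily on purpose

    w = F.log(F.col(weight_col))  # weight = log(unitvolume), as in the question
    key = -F.log(F.lit(1.0) - F.rand(seed)) / w
    return (
        df.where(w > 0)  # assumption: only rows with positive weight are eligible
        .withColumn("_key", key)
        .orderBy("_key")
        .limit(n)
        .drop("_key")
    )
```

With a DataFrame df that has a unitvolume column, weighted_sample_spark(df, 1000, seed=42) would return at most 1000 rows. To balance classes, the same call could be applied to each class separately and the results combined with union.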