I want to create a chart with predicted values on the X axis and actual values on the Y axis, showing a scatter of the points together with a density plot (or heat map) of them. Each observation also has a volume ("units") variable, and I want to use it to weight the points when computing the contours / colors. I have the code below, but when I run it I get a warning indicating that the weights weren't actually used. It creates exactly the chart I want, except that the weights have no effect on the shapes.
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns

    # random data for purposes of post -- but in real world I have an actual dataframe here
    X = np.random.rand(100000)
    Y = np.random.rand(100000)
    units = np.random.rand(100000)

    # Combine X, Y, and units into a DataFrame
    kde_data = pd.DataFrame({'X': X, 'Y': Y, 'units': units})

    # Drop rows with NaN values
    kde_data.dropna(inplace=True)

    # Check if there is sufficient data
    if len(kde_data) < 3:
        print("Insufficient data to create the density plot.")
    else:
        # Create a KDE plot
        plt.figure(figsize=(10, 8))
        sns.kdeplot(data=kde_data[['X', 'Y']], weights=kde_data['units'], fill=True, cmap='viridis')
        plt.title('Density Plot with units as Weights')
        plt.xlabel('X')
        plt.ylabel('Y')
        plt.show()
and get this warning along with the plot:
The following kwargs were not used by contour: 'weights', 'fill'
I could just duplicate each observation in proportion to its weight, but that would run into serious compute resource shortages: there are far too many observations, and the weights are in the millions in some cases.
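To make the goal concrete, here is a minimal sketch of the effect I'm after, using a weighted 2D histogram from NumPy instead of a KDE. This is a swapped-in technique for illustration, not the seaborn plot above: `weighted_density_grid` is a hypothetical helper name, the bin count is arbitrary, and the weights are small integers only so the result can be compared against literal duplication.

```python
import numpy as np


def weighted_density_grid(x, y, weights, bins=50):
    """Return a weighted 2D histogram normalized to a density, plus its bin edges.

    Hypothetical helper for illustration; each point contributes `weights[i]`
    to its bin instead of 1, so no rows need to be duplicated.
    """
    hist, x_edges, y_edges = np.histogram2d(
        x, y, bins=bins, weights=weights, density=True
    )
    return hist, x_edges, y_edges


# Small synthetic stand-in for the real dataframe (assumption: integer weights,
# chosen only so that duplication via np.repeat is possible for comparison).
rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)
units = rng.integers(1, 5, size=1000)

hist, x_edges, y_edges = weighted_density_grid(x, y, units, bins=20)

# Reference result: physically repeat each observation `units` times.
dup_hist, _, _ = np.histogram2d(
    np.repeat(x, units), np.repeat(y, units), bins=20, density=True
)

# The weighted grid matches the duplicated one without the memory cost.
print(np.allclose(hist, dup_hist))
```

The resulting grid could then be drawn with something like `plt.pcolormesh(x_edges, y_edges, hist.T)`, since `np.histogram2d` returns the X bins along the first axis. It is blockier than a KDE, so it only shows the weighting I want, not the smooth contours.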