I have many arrays of (very) noisy data, and I am trying to fit each with a double Gaussian. Some of the arrays don't have a high enough signal-to-noise ratio for a meaningful fit, and I want to cut those out. Since I am fitting around 2500 arrays, I want to automate this rather than picking by hand. I am having trouble evaluating whether the model has successfully fit the data or not.
I have included the x data and some good data (want to keep) here, and some bad data (want to discard) here. The gaussians I am trying to fit are at x = 0.671644 and x = 0.673081.
I am fitting each array with scipy's `curve_fit`, with this code:

```python
def gaussian2_same_wid(x, *args):
    amp1, width1, amp2, m, C = args
    amp1 = amp1 - m*x - C
    amp2 = amp2 - m*x - C
    f1 = amp1 * np.exp(-1*((x - 0.671644)**2) / (2*width1**2))
    f2 = amp2 * np.exp(-1*((x - 0.673081)**2) / (2*width1**2))
    return f1 + f2 + m*x + C

popt_same, pcov_same = curve_fit(gaussian2_same_wid, xdata=x, ydata=y, sigma=dy,
                                 p0=guess_same, bounds=bounds_same, maxfev=1000000)
```

The guesses and bounds are given by
```python
guess_same = (max_SII_1, 0.00045, max_SII_2, 0, 0.4)
bounds_same = [[max_SII_1 - 0.1, 0, max_SII_2 - 0.3, -5, 0],
               [max_SII_1 + 0.1, 6, max_SII_2 + 0.1, 5, 3]]
```

where max_SII_1 and max_SII_2 are the maximum y values over x = 0.6709–0.67275 and x = 0.67275–0.675, respectively.
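For concreteness, here is how I build the window maxima (a minimal, runnable sketch: the spectrum here is synthetic stand-in data, since the real arrays are linked above):

```python
import numpy as np

# Synthetic stand-in spectrum so the sketch runs: two Gaussians on a flat
# baseline plus noise (the real data come from the linked files).
rng = np.random.default_rng(1)
x = np.linspace(0.669, 0.676, 300)
y = 0.5 + 0.05 * rng.standard_normal(x.size)
y += 0.9 * np.exp(-(x - 0.671644)**2 / (2 * 0.00045**2))
y += 0.6 * np.exp(-(x - 0.673081)**2 / (2 * 0.00045**2))

# Peak-height guesses: maximum of y inside each wavelength window.
max_SII_1 = y[(x >= 0.6709) & (x <= 0.67275)].max()
max_SII_2 = y[(x >= 0.67275) & (x <= 0.675)].max()

guess_same = (max_SII_1, 0.00045, max_SII_2, 0, 0.4)
```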
My issue is that I cannot find a reliable way to automatically evaluate how well these models fit the data (strictly the Gaussians; the noise of course does not matter for the fit). I have tried `scipy.stats.chisquare()`, but I either get an error such as
```
ValueError: For each axis slice, the sum of the observed frequencies must agree
with the sum of the expected frequencies to a relative tolerance of 1e-08, but
the percent differences are:
0.011803918088555288
```

or I get a p-value of 1 (some are ~0, but it is very unreliable), even with a bad fit. Is there a better way to evaluate the fit?
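To show what I mean by evaluating the fit: `scipy.stats.chisquare()` is a test for count/frequency data (hence the error about observed and expected sums having to agree), so I have also tried computing a reduced chi-squared from the weighted residuals by hand. A minimal sketch, with a synthetic spectrum standing in for my data and the "model" taken as the known true curve:

```python
import numpy as np

# Synthetic stand-in for one spectrum: a Gaussian on a linear baseline plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.669, 0.676, 200)
dy = np.full_like(x, 0.05)                       # per-point uncertainties
model = 0.8 * np.exp(-(x - 0.671644)**2 / (2 * 0.00045**2)) + 0.1 * x + 0.4
y = model + rng.normal(0.0, dy)                  # "observed" data

# Reduced chi-squared: weighted sum of squared residuals over degrees of freedom.
# In the real case, model = gaussian2_same_wid(x, *popt_same).
n_params = 5                                     # amp1, width1, amp2, m, C
chi2 = np.sum(((y - model) / dy) ** 2)
chi2_red = chi2 / (len(y) - n_params)
```

With correct uncertainties `dy`, this should come out near 1 for a good fit, but I'm not sure how robust a cutoff on it would be for my noisy arrays.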