I have input data X_train with dimensions (477 x 200) and y_train of length 477. I want to fit a support vector machine regressor, so I am running a grid search over its hyperparameters.
param_grid = {'kernel': ['poly', 'rbf', 'linear', 'sigmoid'],
              'degree': [2, 3, 4, 5],
              'C': [0.01, 0.1, 0.3, 0.5, 0.7, 1, 1.5, 2, 5, 10]}
grid = GridSearchCV(estimator=regressor_2, param_grid=param_grid,
                    scoring='neg_root_mean_squared_error',
                    n_jobs=1, cv=3, verbose=1)
grid_result = grid.fit(X_train, y_train)

For grid_result.best_params_ I get {'C': 0.3, 'degree': 2, 'kernel': 'linear'} with a score of -7.76, and {'C': 10, 'degree': 2, 'kernel': 'rbf'} gives me -8.0.
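For reference, the search can be written out as a self-contained, runnable sketch. The data here is synthetic, standing in for my real X_train/y_train, and I use a reduced grid for brevity (my actual grid also sweeps 'poly'/'sigmoid' kernels and more C values):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with the same shapes as my real inputs (assumption).
rng = np.random.RandomState(0)
X_train = rng.rand(477, 200)
y_train = rng.rand(477) * 20

# Reduced grid for brevity; the full grid is shown above.
param_grid = {'kernel': ['rbf', 'linear'], 'degree': [2], 'C': [0.3, 10]}
grid = GridSearchCV(estimator=SVR(), param_grid=param_grid,
                    scoring='neg_root_mean_squared_error',
                    n_jobs=1, cv=3, verbose=1)
grid_result = grid.fit(X_train, y_train)

# best_score_ is the mean negated RMSE over the 3 held-out folds,
# so it is always <= 0.
print(grid_result.best_params_, grid_result.best_score_)
```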
However, when I do
regressor_opt = SVR(kernel='linear', degree=2, C=0.3)
regressor_opt.fit(X_train, y_train)
y_train_pred = regressor_opt.predict(X_train)
print("rmse =", np.sqrt(np.mean((y_train - y_train_pred) ** 2)))

I get 7.4, and when I do
regressor_2 = SVR(kernel='rbf', degree=2, C=10)
regressor_2.fit(X_train, y_train)
y_train_pred = regressor_2.predict(X_train)
print("rmse =", np.sqrt(np.mean((y_train - y_train_pred) ** 2)))

I get 5.9. This is clearly better than 7.4, but in the grid search the negated RMSE I got for that parameter combination was -8, i.e. worse than the linear kernel's -7.76. Can anybody explain to me what is going on? Should I not use scoring='neg_root_mean_squared_error'?
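To make the comparison concrete, here is a self-contained sketch (again with synthetic stand-in data) that computes both numbers I am comparing for the same parameters: the RMSE on the training set itself, and the mean 3-fold cross-validation RMSE that GridSearchCV scores with:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with the same shapes as my real inputs (assumption).
rng = np.random.RandomState(0)
X_train = rng.rand(477, 200)
y_train = rng.rand(477) * 20

params = dict(kernel='rbf', degree=2, C=10)

# RMSE of the model evaluated on the data it was trained on.
reg = SVR(**params).fit(X_train, y_train)
y_train_pred = reg.predict(X_train)
train_rmse = np.sqrt(np.mean((y_train - y_train_pred) ** 2))

# RMSE as GridSearchCV measures it: on held-out folds only.
cv_rmse = -cross_val_score(SVR(**params), X_train, y_train, cv=3,
                           scoring='neg_root_mean_squared_error').mean()

print("train RMSE:", train_rmse)
print("3-fold CV RMSE:", cv_rmse)
```

The two quantities are computed on different data (seen vs. unseen), which is why I am puzzled that they rank the kernels differently.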