Channel: Active questions tagged python - Stack Overflow

Low score when applying RandomForestRegressor on a numeric-categorical mixed dataframe to predict on a two-column label set


I was using the insurance dataset on Kaggle to try to build a simple regressor that predicts the final two columns, ['coverage_level', 'charges'], using the other 10 columns as features.

I was aware that the 10 feature columns are a mix of numeric and categorical types, so I transformed the categorical ones using LabelEncoder:

    from sklearn.preprocessing import LabelEncoder

    df2 = df.copy()
    # gender
    le = LabelEncoder()
    le.fit(df2.gender.drop_duplicates())
    df2.gender = le.transform(df2.gender)
    # ... and so forth for the remaining categorical columns such as 'smoker', 'region', etc.
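The "and so forth" step could be written as a loop over the categorical columns. The following is only a sketch: the tiny `toy` frame and its values are made up for illustration and are not taken from the Kaggle dataset, and the `encoders` dictionary is a hypothetical helper for keeping each fitted encoder.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up stand-in rows (an assumption, not the real dataset)
toy = pd.DataFrame({
    "gender": ["male", "female", "female"],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northeast", "southwest"],
})

encoded = toy.copy()
encoders = {}  # hypothetical helper: keeps one fitted encoder per column
for col in ["gender", "smoker", "region"]:
    le = LabelEncoder()
    # LabelEncoder assigns integer codes to the sorted unique values
    encoded[col] = le.fit_transform(encoded[col])
    encoders[col] = le
```

Keeping one encoder per column makes it possible to call `inverse_transform` later to recover the original labels.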

Then I applied a minmax scaler on the transformed dataframe:

    inputs = df2[["age", "gender", "bmi", "children", "smoker", "region", "medical_history", "family_medical_history", "exercise_frequency", "occupation"]]
    targets = df2[["coverage_level", "charges"]]
    scaler = MinMaxScaler()
    scaledInputs = np.array(scaler.fit_transform(inputs))
    X_train, X_test, y_train, y_test = train_test_split(scaledInputs, targets, test_size=0.20, random_state=42)

Finally, the training and testing part:

    rf_model = RandomForestRegressor(n_estimators=10, random_state=42)
    # Fit the training sets
    rf_model.fit(X_train, y_train)
    rf_outputs = rf_model.predict(X_test)
    rf_mse = mean_squared_error(y_test, rf_outputs)
    rf_score = rf_model.score(X_test, y_test)

However, the performance is very low, with a score (R², as returned by score()) of 0.27 and an MSE of nearly 2615601.
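Both numbers above are aggregated over the two target columns, so it may help to look at each target separately. A minimal sketch follows, using synthetic data as an assumption: it is not the Kaggle dataset, and `y_small` and `y_large` are hypothetical stand-ins for two targets on very different scales.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the real dataset)
rng = np.random.default_rng(42)
X = rng.random((500, 4))
y_small = X[:, 0] + 0.1 * rng.standard_normal(500)           # small-scale target
y_large = 10000 * X[:, 1] + 1000 * rng.standard_normal(500)  # large-scale target
y = np.column_stack([y_small, y_large])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# multioutput="raw_values" returns one value per target column
mse_per_target = mean_squared_error(y_test, pred, multioutput="raw_values")
r2_per_target = r2_score(y_test, pred, multioutput="raw_values")

# score() reports the uniform average of the per-target R^2 values
overall_score = model.score(X_test, y_test)

print("MSE per target:", mse_per_target)
print("R^2 per target:", r2_per_target)
print("Overall score: ", overall_score)
```

In a setup like this, the averaged MSE is dominated by the large-scale target, so the per-target values are usually more informative than the single aggregate.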

I tried some fixes. The first was to scale the two target columns, ['coverage_level', 'charges'], as well as the inputs before fitting; however, it did not help at all. The second was to use one-hot encoding instead of label encoding, but there was still no gain.
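For reference, the one-hot variant could look roughly like the sketch below. This is not the exact code that was tried: the `toy` frame, its category values, and the formula for `charges` are invented for illustration, and only a few of the feature columns are included.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up stand-in data (an assumption, not the real dataset)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "bmi": rng.uniform(18, 40, n),
    "smoker": rng.choice(["yes", "no"], n),
    "region": rng.choice(["northeast", "northwest", "southeast", "southwest"], n),
})
# Hypothetical target, invented only so the model has something to fit
toy["charges"] = (
    250 * toy["age"] + 20000 * (toy["smoker"] == "yes") + rng.normal(0, 1000, n)
)

numeric_cols = ["age", "bmi"]
categorical_cols = ["smoker", "region"]
feature_cols = numeric_cols + categorical_cols

# Scale numeric columns, one-hot encode categorical columns
preprocess = ColumnTransformer([
    ("num", MinMaxScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("rf", RandomForestRegressor(n_estimators=10, random_state=42)),
])
pipe.fit(toy[feature_cols], toy["charges"])

# Inspect the encoded feature matrix: 2 numeric + 2 + 4 one-hot columns
encoded = pipe.named_steps["preprocess"].transform(toy[feature_cols])
print(encoded.shape)
```

Putting the preprocessing inside a Pipeline means the scaler and encoder are fitted only on whatever data is passed to `fit`, which avoids fitting them on the test split.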

How can I look into this problem?

