Channel: Active questions tagged python - Stack Overflow

Different ValueError each time I run


I am new to machine learning with Python, pandas, and scikit-learn, and this is my first project. I have a CSV file of 893 lines containing known information about the passengers aboard the Titanic. It looks like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",0,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",1,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",1,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,35,1,0,113803,53.1,C123,S

Based on this data, I'm trying to build a program that, given similar data, can tell me whether a passenger would have survived. My code looks like this:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv('data/train.csv')
X = data.drop(columns=["PassengerId", "Survived", "Name"])
print(X)
y = data["Survived"]
print(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.9)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(X_test)
print(predictions)

score = accuracy_score(y_test, predictions)
print(score*100, "% accuracy.")

However, I keep running into the same ValueError, and each run it points at a different row of the dataset. Something like this:

Traceback (most recent call last):
  File "/home/runner/MachineLearning1/main.py", line 14, in <module>
    model.fit(X_train, y_train)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/tree/_classes.py", line 1009, in fit
    super()._fit(
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/tree/_classes.py", line 252, in _fit
    X, y = self._validate_data(
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/base.py", line 645, in _validate_data
    X = check_array(X, input_name="X", **check_X_params)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/utils/validation.py", line 997, in check_array
    array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/utils/_array_api.py", line 521, in _asarray_with_order
    array = numpy.asarray(array, order=order, dtype=dtype)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/pandas/core/generic.py", line 2150, in __array__
    arr = np.asarray(values, dtype=dtype)
ValueError: could not convert string to float: 'Hassab, Mr. Hammad'

Or

Traceback (most recent call last):
  File "/home/runner/MachineLearning1/main.py", line 14, in <module>
    model.fit(X_train, y_train)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/tree/_classes.py", line 1009, in fit
    super()._fit(
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/tree/_classes.py", line 252, in _fit
    X, y = self._validate_data(
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/base.py", line 645, in _validate_data
    X = check_array(X, input_name="X", **check_X_params)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/utils/validation.py", line 997, in check_array
    array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/sklearn/utils/_array_api.py", line 521, in _asarray_with_order
    array = numpy.asarray(array, order=order, dtype=dtype)
  File "/home/runner/MachineLearning1/.pythonlibs/lib/python3.10/site-packages/pandas/core/generic.py", line 2150, in __array__
    arr = np.asarray(values, dtype=dtype)
ValueError: could not convert string to float: 'Tornquist, Mr. William Henry'

In both cases the error is the same, but the passenger name quoted in the message is different on every run.
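For reference, the last frame of the traceback is a plain `np.asarray(values, dtype=dtype)` call, so the message can probably be reproduced without scikit-learn. The sketch below is only an illustration under assumptions: the two rows are hand-built from the sample data with the name left in, and `first_conversion_error` is a hypothetical helper, not part of pandas or scikit-learn. It suggests that the text quoted in the message is simply the first string NumPy meets while casting, so it changes when the row order changes.

```python
import numpy as np

# Two rows shaped like the sample data, with the name left in.
# (Assumption for illustration: values copied from the CSV excerpt.)
rows = [
    [3, "Braund, Mr. Owen Harris", 0, 22.0],
    [1, "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", 1, 38.0],
]


def first_conversion_error(table):
    """Hypothetical helper: return the ValueError message raised when
    the table is cast to float, or None if the cast succeeds."""
    try:
        np.asarray(np.array(table, dtype=object), dtype=float)
    except ValueError as exc:
        return str(exc)
    return None


# Same data, two different row orders.
msg_original = first_conversion_error(rows)
msg_reversed = first_conversion_error(rows[::-1])

print(msg_original)
print(msg_reversed)
```

If this matches what happens inside `model.fit`, the varying name says nothing about one particular bad row; any string value left in the features would trigger the same error.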

What I'm expecting is for the test data and the predictions to be printed to the console, along with the accuracy. My question is: why am I getting this error? (I've deduced that the randomness comes from not specifying a random_state parameter.)
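The random_state deduction can be checked in isolation. This is a minimal sketch, assuming a stand-in list of 20 integer ids instead of the real CSV; the seed value 0 is arbitrary. With the same `random_state`, `train_test_split` should return the same shuffle on every call, which would make the failing row the same from run to run.

```python
from sklearn.model_selection import train_test_split

# Stand-in for the real rows (assumption: any sequence works here).
ids = list(range(20))

# Two independent calls with the same seed.
a_train, a_test = train_test_split(ids, train_size=0.9, random_state=0)
b_train, b_test = train_test_split(ids, train_size=0.9, random_state=0)

print(a_train == b_train)         # identical training order on both calls
print(len(a_train), len(a_test))  # sizes of the 90/10 split
```

Pinning the seed only makes the error reproducible; it does not remove the string values that cause it.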

