Channel: Active questions tagged python - Stack Overflow

Logistic regression: X has 667 features per sample; expecting 74869


Using an IMDb movie reviews dataset, I have built a logistic regression model to predict the sentiment of each review.

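```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# df (preprocessed reviews), fill (tokenizer) and model_performance are defined elsewhere
tfidf = TfidfVectorizer(strip_accents=None, lowercase=False, preprocessor=None,
                        tokenizer=fill, use_idf=True, norm='l2', smooth_idf=True)
y = df.sentiment.values
X = tfidf.fit_transform(df.review)  # learns the vocabulary from all reviews
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, test_size=0.3, shuffle=False)
clf = LogisticRegressionCV(cv=5, scoring="accuracy", random_state=1, n_jobs=-1,
                           verbose=3, max_iter=300).fit(X_train, y_train)
yhat = clf.predict(X_test)
print("accuracy:")
print(clf.score(X_test, y_test))
model_performance(X_train, y_train, X_test, y_test, clf)
```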
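The 74869 in the error is the width of the TF-IDF matrix the classifier was trained on. A minimal sketch, using a hypothetical four-review toy corpus and plain `LogisticRegression` in place of `LogisticRegressionCV`, shows that a fitted `TfidfVectorizer` produces one column per learned vocabulary term and that the fitted model records that column count in `n_features_in_` (available in scikit-learn 0.24 and later):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus standing in for df.review (not the IMDb data)
train_docs = [
    "a great and moving film",
    "a dull and boring film",
    "great acting and a great score",
    "boring plot and dull acting",
]
train_labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)

# One row per document, one column per vocabulary term learned during fit
print(X_train.shape, len(vec.vocabulary_))

clf = LogisticRegression().fit(X_train, train_labels)
# The fitted model remembers how many columns it was trained on
print(clf.n_features_in_)
```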

Text preprocessing was applied before this step. `model_performance` is just a function that creates a confusion matrix. All of this works well, with good accuracy.
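The question doesn't show `model_performance`, so the following is only a hypothetical stand-in with the same call signature. It assumes the function returns confusion matrices built with `sklearn.metrics.confusion_matrix` for both splits. The one-feature toy data is made up purely to exercise it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix


def model_performance(X_train, y_train, X_test, y_test, clf):
    """Hypothetical stand-in: confusion matrices for the train and test splits."""
    return {
        "train": confusion_matrix(y_train, clf.predict(X_train)),
        "test": confusion_matrix(y_test, clf.predict(X_test)),
    }


# Tiny, linearly separable toy data (one feature) used only to run the function
X_train = np.array([[0.0], [1.0], [4.0], [5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5], [4.5]])
y_test = np.array([0, 1])

clf = LogisticRegression().fit(X_train, y_train)
cms = model_performance(X_train, y_train, X_test, y_test, clf)
print(cms["test"])
```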

I now scrape new IMDB reviews:

```python
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

# The movie "Joker" IMDb review page
url_link = 'https://www.imdb.com/title/tt7286456/reviews'
html = urlopen(url_link)
content_bs = BeautifulSoup(html)
JokerReviews = []
# Every review sits in a div with class "text"; this can be seen in the IMDb page source
for b in content_bs.find_all('div', class_='text'):
    JokerReviews.append(b)
df = pd.DataFrame.from_records(JokerReviews)
df['sentiment'] = "0"
jokerData = df[0]
jokerData = jokerData.apply(preprocessor)  # preprocessor is defined elsewhere
```
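`find_all('div', class_='text')` returns BeautifulSoup `Tag` objects rather than strings. One alternative is to pull the plain text out of each matching div with `get_text()` before building the DataFrame. A minimal sketch, assuming review markup shaped like the hypothetical `SAMPLE_HTML` below (a div whose class list includes `text`, with `<br/>` line breaks) so that it runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for an IMDb review page (no network access needed)
SAMPLE_HTML = """
<div class="review">
  <div class="text show-more__control">Great acting.<br/>Dark but brilliant.</div>
</div>
<div class="review">
  <div class="text show-more__control">Too slow for me.</div>
</div>
"""


def extract_review_texts(html: str) -> list[str]:
    """Return the plain text of every <div> whose class list includes 'text'."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text joins all nested strings, so text split by <br/> tags stays together
    return [div.get_text(" ", strip=True) for div in soup.find_all("div", class_="text")]


reviews = extract_review_texts(SAMPLE_HTML)
print(reviews)
```

The resulting list of strings can be passed straight to `pd.Series(reviews)` and then to `preprocessor`.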

Problem: I now want to use the same logistic regression model to predict the sentiment of these reviews:

```python
# A second vectorizer, fitted on the Joker reviews only
tfidf2 = TfidfVectorizer(strip_accents=None, lowercase=False, preprocessor=None,
                         tokenizer=fill, use_idf=True, norm='l2', smooth_idf=True)
y = df.sentiment.values
Xjoker = tfidf2.fit_transform(jokerData)
yhat = clf.predict(Xjoker)
```

But I get the error:

ValueError: X has 667 features per sample; expecting 74869
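The mismatch can be reproduced in miniature. A minimal sketch with hypothetical toy reviews: a second `TfidfVectorizer` fitted only on the new reviews learns its own, smaller vocabulary, so its matrix has fewer columns than the model was trained on, and `predict` raises `ValueError` (the exact message wording varies between scikit-learn versions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the IMDb training reviews and the scraped reviews
train_docs = ["great moving film", "dull boring film", "great acting", "boring plot"]
train_labels = [1, 0, 1, 0]
new_docs = ["joker was great", "too slow"]

tfidf = TfidfVectorizer()
clf = LogisticRegression().fit(tfidf.fit_transform(train_docs), train_labels)

# Fitting a second vectorizer learns a new vocabulary from new_docs alone
tfidf2 = TfidfVectorizer()
X_new = tfidf2.fit_transform(new_docs)
print(X_new.shape[1], "columns vs", clf.n_features_in_, "expected")

raised = False
try:
    clf.predict(X_new)
except ValueError as err:
    # scikit-learn rejects input whose column count differs from training
    raised = True
    print("ValueError:", err)
```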

I don't understand why it has to have the same number of features as X_test.
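The column count has to match because each column stands for one specific vocabulary term learned from the training reviews, and the model's coefficients are tied to those columns by position. A common approach is to call `transform` (not `fit_transform`) on the new text with the vectorizer that was already fitted, or to wrap vectorizer and classifier together in a scikit-learn `Pipeline`. A minimal sketch of both options, again with hypothetical toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the training reviews and the scraped reviews
train_docs = ["great moving film", "dull boring film", "great acting", "boring plot"]
train_labels = [1, 0, 1, 0]
new_docs = ["joker was great", "too slow"]

# Option 1: reuse the vectorizer fitted on the training text;
# transform() maps new text onto that same vocabulary
tfidf = TfidfVectorizer()
clf = LogisticRegression().fit(tfidf.fit_transform(train_docs), train_labels)
X_new = tfidf.transform(new_docs)
print(X_new.shape[1] == clf.n_features_in_)
preds = clf.predict(X_new)

# Option 2: bundle both steps so predict() always uses the fitted vocabulary
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train_docs, train_labels)
pipe_preds = pipe.predict(new_docs)
print(preds, pipe_preds)
```

Words that never appeared in the training text (such as "joker" here) are simply ignored by `transform`, so a review made up entirely of unseen words becomes an all-zero row.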
