I have a dataframe of text that looks like this:

    RepID, Txt
    1, +83 -193 -380 +55 +901
    2, -94 +44 +2892 -60
    3, +7010 -3840 +3993
Although the Txt field contains values such as +282 and -829, these are strings, not numbers.
The problem is that when I use the following bag-of-words function,
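
    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    def BOW(df):
        # df is assumed to be the text column (a Series of strings)
        CountVec = CountVectorizer()  # to use only bigrams, pass ngram_range=(2, 2)
        Count_data = CountVec.fit_transform(df)
        Count_data = Count_data.astype(np.uint8)
        cv_dataframe = pd.DataFrame(Count_data.toarray(),
                                    columns=CountVec.get_feature_names_out(),
                                    index=df.index)  # <- HERE
        return cv_dataframe.astype(np.uint8)
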
I get result columns without the + or - signs.
The outcome is:

    RepID  83  193  380  55  ...
    1      1   1    1    1
    2      0   0    0    0
It should be:

    RepID  +83  -193  -380  +55  ...
    1      1    1     1     1
    2      0    0     0     0
Why does this happen, and how can I fix it?