Bag of Words with Negative Words in Python

I have a set of documents.

They are not ordinary text.

They consist of scientific terminology.

The text of these documents looks like this:

RepID,Txt
1,K9G3P9 4H477 -Q207KL41 98464 ... Q207KL41
2,D84T8X4 -D9W4S2 -D9W4S2 8E8E65 ... D9W4S2
3,-05L8NJ38 K2DD949 0W28DZ48 207441 ... K2D28K84

I can build a feature set using the bag-of-words (BOW) approach.

Here is my code:

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

def BOW(df):
    CountVec = CountVectorizer()  # to use only bigrams: ngram_range=(2, 2)
    Count_data = CountVec.fit_transform(df)
    Count_data = Count_data.astype(np.uint8)
    cv_dataframe = pd.DataFrame(Count_data.toarray(), columns=CountVec.get_feature_names_out(), index=df.index)  # <- HERE
    return cv_dataframe.astype(np.uint8)

df_reps = pd.read_csv("c:\\file.csv")
df = BOW(df_reps["Txt"])

The result will be the count of words in the "Txt" column.

RepID  K9G3P9  4H477  -Q207KL41  98464  ...  Q207KL41
1      2       8      3          2      ...  1
2      0       1      2          4      ...  2

The tricky part, and where I need help, is that some of these terms have a - in front of them, and those should count as negative values.

So if a text has these values: Q207KL41 -Q207KL41 -Q207KL41

then the terms that start with - should be counted as negative, and therefore the BOW value for Q207KL41 is -1 (one positive occurrence minus two negative ones).

Instead of having separate features for Q207KL41 and -Q207KL41, both should count towards the same term Q207KL41, one as positive and the other as negative.
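To make the rule concrete, here is a minimal sketch in plain Python of the counting I have in mind for a single text. The function name signed_counts is just an illustrative name, and I am assuming terms are separated by whitespace and that a lone "-" is treated as an ordinary token.

```python
from collections import Counter

def signed_counts(text):
    """Count whitespace-separated terms; a leading '-' contributes -1 instead of +1."""
    totals = Counter()
    for token in text.split():
        if token.startswith("-") and len(token) > 1:
            # '-TERM' subtracts one from the count of 'TERM'
            totals[token[1:]] -= 1
        else:
            totals[token] += 1
    return dict(totals)

print(signed_counts("Q207KL41 -Q207KL41 -Q207KL41"))  # {'Q207KL41': -1}
```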

So the dataset after BOW would look like this:

RepID  K9G3P9  4H477  Q207KL41  98464  ...
1      2       8      -2        2      ...
2      0       1      0         4      ...

How can I do that?
