I want to merge two CSV files into one and remove every row whose value in a particular column (the second column) duplicates that of an earlier row.
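The merge-and-deduplicate step can be sketched with only the standard `csv` module. This is a minimal sketch, not the asker's code. The helper `merge_unique_rows` is a hypothetical name. It chains the rows of several `csv.reader` objects and yields only the first row seen for each value in column index 1, so no intermediate file is needed. The inline `io.StringIO` data is a shortened, made-up stand-in for the real files. With real files you would instead open each one with `open(path, newline="", encoding="utf-8")`.

```python
import csv
import io
from itertools import chain


def merge_unique_rows(readers, key_col=1):
    """Yield rows from all readers in order, skipping rows whose key_col value was already seen.

    merge_unique_rows is a hypothetical helper name used for illustration.
    """
    seen = set()  # values of the key column emitted so far
    for row in chain.from_iterable(readers):
        if not row:  # skip blank lines, which csv.reader returns as []
            continue
        if row[key_col] not in seen:
            seen.add(row[key_col])
            yield row


# Hypothetical, shortened inline data standing in for the two input files.
file1 = io.StringIO("Skufnoo,748702985,x\nmAtkmb,5213786988,y\n")
file2 = io.StringIO("cchamnap,748702985,a\nchhounkha,765670208,b\n")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")  # "\n" instead of the default "\r\n" for readable output
writer.writerows(merge_unique_rows([csv.reader(file1), csv.reader(file2)]))
print(out.getvalue(), end="")
```

Rows from the first reader win because they are consumed first. The `cchamnap` row is dropped because its second-column value `748702985` already appeared in the `Skufnoo` row.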
Here is my first CSV file:

    Skufnoo,748702985,-6026769894509215039,Вупсеньпупсень❤️‍🩹💗,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,4,True,False,0
    mAtkmb,5213786988,4161254730445748607,ДаниÑльБлинов,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,False,False,False,0
    sheluvjoseph,1421438213,8544915453690665435,អនសំអុល,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0

Second CSV file:

    cchamnap,748702985,-7259273529368744780,Chim,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0
    chhounkha,765670208,3636141294788837002,Chhuon Sokha,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,False,False,False,0
    CHHORMNIMOL8,5213786988,5104468652588260401,ឌីណា.,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0
    Chhailin17,1133044248,6931066845789435875,Chhai Lin,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0

The output file (own_updated2.csv) should be:

    Skufnoo,748702985,-6026769894509215039,Вупсеньпупсень❤️‍🩹💗,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,4,True,False,0
    mAtkmb,5213786988,4161254730445748607,ДаниÑльБлинов,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,False,False,False,0
    sheluvjoseph,1421438213,8544915453690665435,អនសំអុល,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0
    chhounkha,765670208,3636141294788837002,Chhuon Sokha,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,False,False,False,0
    Chhailin17,1133044248,6931066845789435875,Chhai Lin,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0

I have tried the following code:

    import pandas as pd
    import csv

    # Read both input files and stack their rows
    df1 = pd.read_csv("own1.csv")
    df2 = pd.read_csv("own2.csv")
    merged = pd.concat([df1, df2])

    # Write the merged rows to an intermediate file
    with open('own_updated.csv', 'w', newline="", encoding='utf-8') as nf:
        merged.to_csv(nf, index=False)

    # Re-read the intermediate file and keep only the first row for each second-column value
    with open('own_updated.csv', 'r', encoding="utf8") as in_file, open('own_updated2.csv', 'w', newline="", encoding="utf8") as out_file:
        in_data = csv.reader(in_file, delimiter=',')
        writer = csv.writer(out_file)
        tracks = set()  # Tracking duplicates of the second column's cell
        for row in in_data:
            key = row[1]
            if key not in tracks:
                writer.writerow(row)
                tracks.add(key)

It works well, but it leaves behind an extra file, own_updated.csv, that I don't need. How can I keep the merged data from the two CSV files in memory, without creating own_updated.csv, and then remove the duplicates based on the second column?
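Staying with pandas, the intermediate file can be avoided by deduplicating the concatenated DataFrame directly. This is a minimal sketch under stated assumptions, not a confirmed fix for the asker's setup. First, it assumes the files have no header row, as in the samples above, so it uses `header=None` and refers to the second column by its integer label `1`. Second, it reads every field as text with `dtype=str` so that IDs such as `-6026769894509215039` are kept exactly as written. The inline `io.StringIO` data, with the trailing fields replaced by placeholders, is a hypothetical stand-in for own1.csv and own2.csv.

```python
import io

import pandas as pd

# Hypothetical stand-ins for own1.csv / own2.csv (no header row, trailing fields shortened).
own1 = io.StringIO(
    "Skufnoo,748702985,x\n"
    "mAtkmb,5213786988,y\n"
    "sheluvjoseph,1421438213,z\n"
)
own2 = io.StringIO(
    "cchamnap,748702985,a\n"
    "chhounkha,765670208,b\n"
    "CHHORMNIMOL8,5213786988,c\n"
    "Chhailin17,1133044248,d\n"
)

# header=None: no header row, so columns are labelled 0, 1, 2, ...
# dtype=str: keep every field as text instead of letting pandas convert it to numbers.
df1 = pd.read_csv(own1, header=None, dtype=str)
df2 = pd.read_csv(own2, header=None, dtype=str)

# Stack the rows of both files in memory; nothing is written to disk here.
merged = pd.concat([df1, df2], ignore_index=True)

# Keep only the first row for each value in column 1 (the second column).
unique = merged.drop_duplicates(subset=[1], keep="first")

# With real files, write the result directly, e.g.:
# unique.to_csv("own_updated2.csv", index=False, header=False, encoding="utf-8")
print(unique.to_csv(index=False, header=False), end="")
```

With `keep="first"`, rows from own1.csv take precedence because they come first in the concatenation. This yields the same five rows, in the same order, as the expected output above.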