Channel: Active questions tagged python - Stack Overflow

Inserting rows into csv file if and only if data in specific row columns does not already exist in csv


I have two CSV files with the same structure: tops.csv and yesterday.csv. I first sort yesterday.csv by a specific column, then I want to take the row with the highest value in that column. For this row, I want to check three of its columns (the third, fourth and fifth) and see whether tops.csv contains a row with the same data in those same columns. In other words, I am checking for duplicates on those three columns only. If such a row already exists in tops.csv, the candidate row should be skipped and the script should move on to the next-highest row by the sort column, repeating until it finds one whose data is not already in tops.csv. Once a row with no match is found, I want to write it to a CSV called for_email.csv and also append it to tops.csv, the file we checked it against.
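For reference, the selection described above could be sketched roughly as follows. This is only a minimal illustration on made-up in-memory data, not the script in question: the function name `pick_new_top`, the `KEY_COLS` constant and the tiny sample rows are all hypothetical, and it assumes headerless files sorted by their last column.

```python
import io
import pandas as pd

# Hypothetical miniature inputs (11 columns, no header row).
YESTERDAY = """\
A,1.0,Beer One,Brewery X,10,Stout,11.0,4.70,21,1,4.60
B,2.0,Beer Two,Brewery Y,20,Stout,11.2,4.50,58,0,4.50
C,3.0,Beer Three,Brewery Z,30,IPA,14.7,4.40,49,0,4.40
"""
TOPS = """\
A,1.0,Beer One,Brewery X,10,Stout,11.0,4.70,21,1,4.61
B,2.0,Beer Two,Brewery Y,20,Stout,11.2,4.50,58,0,4.52
"""

# Third, fourth and fifth columns (0-based labels, since header=None).
KEY_COLS = [2, 3, 4]


def pick_new_top(yesterday, tops, key_cols=KEY_COLS):
    """Return the best-ranked row of `yesterday` whose key columns
    do not appear in `tops`, or None if every row is already there."""
    # Rank by the last column, highest value first.
    ranked = yesterday.sort_values(by=yesterday.columns[-1], ascending=False)
    # Collect the key tuples that already exist in tops.
    seen = set(tops[key_cols].itertuples(index=False, name=None))
    for _, row in ranked.iterrows():
        if tuple(row.loc[key_cols]) not in seen:
            return row
    return None


yesterday_df = pd.read_csv(io.StringIO(YESTERDAY), header=None)
tops_df = pd.read_csv(io.StringIO(TOPS), header=None)
picked = pick_new_top(yesterday_df, tops_df)
```

With real files, the returned row could presumably be written out with `picked.to_frame().T.to_csv('for_email.csv', header=False, index=False)` and appended to tops.csv by passing `mode='a'` to the same call.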

This is the script I have so far with comments for each step

import csv
import pandas as pd

# Replace 'your_file.csv' with the actual path to your CSV file
file_path = 'yesterday.csv'

# Read the CSV file into a DataFrame
df = pd.read_csv(file_path, header=None)

# Sort the DataFrame by last column and then by the 2nd column from the left
df_sorted = df.sort_values(by=[df.columns[-1], df.columns[1]], ascending=[False, True])

# Drop duplicates based on all columns except the first one, keeping the first occurrence
df_sorted_no_duplicates = df_sorted.drop_duplicates(subset=df_sorted.columns[2:])

# Read the 'tops.csv' file to check for existing data
tops_df = pd.read_csv('tops.csv', header=None)

# Initialize a counter for the number of rows written to 'for_email.csv'
rows_written = 0

# Open the 'for_email.csv' file for writing
with open('for_email.csv', 'w', newline='') as for_email_file:
    for index, row in df_sorted_no_duplicates.iterrows():
        # Check if the third, fourth, and fifth columns of the current row exist in 'tops.csv'
        if not tops_df[tops_df.columns[2:]].eq(row.iloc[2:]).all(axis=1).any():
            # If not, write the row to 'for_email.csv'
            for_email_file.write(','.join(map(str, row)) + '\n')
            rows_written += 1
            # Break the loop when 3 rows have been written
            if rows_written == 1:
                break

print(f"{rows_written} rows have been written to 'for_email.csv'")

# Open the 'for_email.csv' file for reading
with open('for_email.csv', 'r') as for_email_file:
    # Read the content of 'for_email.csv'
    for_email_content = for_email_file.read()

# Append the content of 'for_email.csv' to 'tops.csv'
with open('tops.csv', 'a', newline='') as tops_file:
    tops_file.write(for_email_content)

print(f"{rows_written} rows have been appended to 'tops.csv'")

Here is a sample of my tops.csv

Ale Mary's,3.45,Barrel Aged Wham Whams - Coffee,Prison City Pub & Brewery,174906,Stout - Imperial / Double,11.0,4.7619,21,1,4.63971
Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2015),Founders Brewing Co.,549,Stout - Imperial / Double Coffee,11.2,4.52617,58194,0,4.52608
Ale Mary's,3.45,Wendigo - Double Oaked (Batch 2 - 2021),Anchorage Brewing Company,13756,Barleywine - Other,15.5,4.48663,2569,0,4.4862
Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2015),Founders Brewing Co.,549,Stout - Imperial / Double Coffee,11.2,4.52617,58194,0,4.52604

Here is a sample of my yesterday.csv

Ale Mary's,3.45,Barrel Aged Wham Whams - Coffee,Prison City Pub & Brewery,174906,Stout - Imperial / Double,11.0,4.7619,21,1,4.63971
Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2015),Founders Brewing Co.,549,Stout - Imperial / Double Coffee,11.2,4.52617,58194,0,4.52604
Ale Mary's,3.45,Wendigo - Double Oaked (Batch 2 - 2021),Anchorage Brewing Company,13756,Barleywine - Other,15.5,4.48663,2569,0,4.48604
Ale Mary's,3.45,Bourbon County Brand Stout (2018) 14.7%,Goose Island Beer Co.,2898,Stout - Imperial / Double,14.7,4.46902,49672,0,4.46887
Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2017),Founders Brewing Co.,549,Stout - Imperial / Double Coffee,11.8,4.44884,88500,0,4.44872
Bartari,3.22,Utopias Barrel-Aged 120 Minute IPA,Dogfish Head Craft Brewery,459,IPA - Imperial / Double,17.0,4.44246,7565,1,4.44017

When I run it with this data, I am getting back this row:

Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2015),Founders Brewing Co.,549,Stout - Imperial / Double Coffee,11.2,4.52617,58194,0,4.52604

which already exists, yet I should be getting back this one:

Ale Mary's,3.45,Bourbon County Brand Stout (2018) 14.7%,Goose Island Beer Co.,2898,Stout - Imperial / Double,14.7,4.46902,49672,0,4.46887

I am not sure where I am going wrong. Any assistance would be appreciated
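One thing that may be worth checking: the comment in the script says the third, fourth and fifth columns are compared, but the slice `columns[2:]` covers every column from the third through the last. The sketch below takes the Kentucky Breakfast Stout (KBS) (2015) row from each sample and runs both comparisons. Whether this is the actual cause is only a guess, and it assumes tops.csv did not already contain the identical 4.52604 row before the run. The names `wide_match`, `narrow_match` and `key_cols` are introduced here purely for illustration.

```python
import io
import pandas as pd

# The same beer as it appears in each sample; only the last column differs.
tops_df = pd.read_csv(io.StringIO(
    "Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2015),"
    "Founders Brewing Co.,549,Stout - Imperial / Double Coffee,"
    "11.2,4.52617,58194,0,4.52608\n"), header=None)
yesterday_df = pd.read_csv(io.StringIO(
    "Coopers Seafood House,2.99,Kentucky Breakfast Stout (KBS) (2015),"
    "Founders Brewing Co.,549,Stout - Imperial / Double Coffee,"
    "11.2,4.52617,58194,0,4.52604\n"), header=None)

row = yesterday_df.iloc[0]

# Comparison as written in the script: third column through the last one.
wide_match = tops_df[tops_df.columns[2:]].eq(row.iloc[2:]).all(axis=1).any()

# Comparison limited to the third, fourth and fifth columns.
key_cols = tops_df.columns[2:5]
narrow_match = tops_df[key_cols].eq(row.iloc[2:5]).all(axis=1).any()

print("columns[2:]  match:", bool(wide_match))
print("columns[2:5] match:", bool(narrow_match))
```

If the two results differ on the real files as well, restricting both the `drop_duplicates` subset and the `eq` comparison to `columns[2:5]` might give the behaviour described at the top of the question.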

