
Parsing CSV in Python: Issues with Extracting Full Text from a Column

I'm working on creating a function get_data() in Python that doesn't take any arguments. Its purpose is to load a file named tx_deathrow_full.csv and return a list of dictionaries. Each dictionary should have 18 fields corresponding to a row in the dataset.

An updated version of the dataset can be found at:

https://www.tdcj.texas.gov/death_row/dr_executed_offenders.html

My Attempt:

I've written a piece of code that nearly accomplishes this:

import csv

def get_data():
    deathrow_data = []
    with open('tx_deathrow_full.csv', 'r') as fil:
        # Skips the first two rows (because of how the csv-file is given)
        next(fil)
        next(fil)
        reader = csv.reader(fil)
        for count, row in enumerate(reader):
            if count < 5:  # Sample of 5 rows
                row_data = row[0].split(',')
                row_dict = {
                    'Execution': int(row_data[0]),
                    'Date of Birth': row_data[1],
                    'Date of Offence': row_data[2],
                    'Highest Education Level': int(row_data[3]),
                    'Last Name': row_data[4],
                    'First Name': row_data[5],
                    'TDCJ Number': int(row_data[6]),
                    'Age at Execution': int(row_data[7]),
                    'Date Received': row_data[8],
                    'Execution Date': row_data[9],
                    'Race': row_data[10],
                    'County': row_data[11],
                    'Eye Color': row_data[12],
                    'Weight': int(row_data[13]),
                    'Height': row_data[14],
                    'Native County': row_data[15],
                    'Native State': row_data[16],
                    'Last Statement': row_data[17].rstrip(';')  # To remove ';' in the end of the Last Statement.
                }
                deathrow_data.append(row_dict)
            else:
                break
    return deathrow_data

However, the 'Last Statement' field comes out wrong: it captures the text only up to the first comma and drops the rest.

Example:

In the CSV file, row 6 reads:

550,1987-04-04,2008-04-06,11,Davila,Erick Daniel,999545,31,2009-02-27,2018-04-25,Black,Tarrant,Brown,161,"5' 11""",Tarrant,Texas,"Yes, I would like to say nephew it burns huh. You know I might have lost the fight but I'm still a soldier. I still love you all. To my supporters and family y'all hold it down. Ten Toes down right. That's all."

But when I run my code, it returns the following output:

[... , {'Execution': 550, 'Date of Birth': '1987-04-04', 'Date of Offence': '2008-04-06', 'Highest Education Level': 11, 'Last Name': 'Davila', 'First Name': 'Erick Daniel', 'TDCJ Number': 999545, 'Age at Execution': 31, 'Date Received': '2009-02-27', 'Execution Date': '2018-04-25', 'Race': 'Black', 'County': 'Tarrant', 'Eye Color': 'Brown', 'Weight': 161, 'Height': '"5\' 11"""', 'Native County': 'Tarrant', 'Native State': 'Texas', 'Last Statement': '"Yes'}, ...]

Note how 'Last Statement' is incorrectly truncated to '"Yes', instead of the full text.
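The truncation matches how str.split(',') behaves: it cuts at every comma, including commas inside quoted text. Here is a minimal sketch that reproduces the symptom on the example row and compares it with csv.reader, which respects the quotes. It assumes the row is available as one plain string, which is what the output above suggests row[0] contains. The variable names line, naive and parsed are made up for illustration.

```python
import csv

# The example row from the question, held as one plain string.
line = (
    "550,1987-04-04,2008-04-06,11,Davila,Erick Daniel,999545,31,"
    "2009-02-27,2018-04-25,Black,Tarrant,Brown,161,"
    '"5\' 11""",Tarrant,Texas,'
    '"Yes, I would like to say nephew it burns huh. '
    "You know I might have lost the fight but I'm still a soldier. "
    "I still love you all. To my supporters and family y'all hold it down. "
    'Ten Toes down right. That\'s all."'
)

# Plain string splitting ignores quoting, so the comma after "Yes" ends the field.
naive = line.split(',')

# csv.reader accepts any iterable of strings; a one-element list parses one record.
parsed = next(csv.reader([line]))

print(len(naive), repr(naive[17]))
print(len(parsed), repr(parsed[17]))
```

With the quote-aware parser, the doubled quotes in the height field are also unescaped, so the value no longer carries its surrounding quote characters.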

[Screenshot: the CSV file before downloading.]

[Screenshot: the CSV file after I downloaded it and opened it in Excel.]

Note: both screenshots include the example row shown above.

Question:

How can I modify my code to ensure the full 'Last Statement' text is captured and included in the dictionary, considering the structure of the CSV file?

What I have tried:

I've attempted to use the Pandas library to address this issue, but without success as I am not familiar with its functionalities.

Additionally, I experimented with the following code snippet earlier, but later replaced it with the version shown above, as that one seemed closer to producing the desired result.

import csv

def get_data():
    """
    Returns the updated deathrow_data.
    """
    deathrow_data = []  # Empty list for the rows
    with open('tx_deathrow_full.csv', 'r') as fil:
        fieldnames1 = ['Execution', 'Date of Birth', 'Date of Offence',
                       'Highest Education Level', 'Last Name', 'First Name',
                       'TDCJ\nNumber', 'Age at Execution', 'Date Received',
                       'Execution Date', 'Race', 'County', 'Eye Color',
                       'Weight', 'Height', 'Native County', 'Native State',
                       'Last Statement']
        reader = csv.DictReader(fil, fieldnames=fieldnames1, delimiter=';')
        next(reader)  # Skip the first line (header)
        next(reader)  # Skip the second line
        count = 0
        for row in reader:
            if count < 4:  # Take a sample of 4 rows
                deathrow_data.append(row)
                count += 1
            else:
                break
    return deathrow_data

result = get_data()
print(result)
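One possible direction, sketched under assumptions rather than verified against the actual file: keep the outer csv.reader loop from the first attempt, but parse row[0] with a second csv.reader instead of split(','). The sketch assumes that row[0] really holds the whole record as one comma-separated string, possibly followed by stray ';' characters. The names FIELDNAMES, INT_FIELDS and record_to_dict are hypothetical helpers, not part of the original code.

```python
import csv

# Column names copied from the dictionary keys used in the first attempt.
FIELDNAMES = [
    'Execution', 'Date of Birth', 'Date of Offence', 'Highest Education Level',
    'Last Name', 'First Name', 'TDCJ Number', 'Age at Execution',
    'Date Received', 'Execution Date', 'Race', 'County', 'Eye Color',
    'Weight', 'Height', 'Native County', 'Native State', 'Last Statement',
]
# Columns that the first attempt converts with int().
INT_FIELDS = ('Execution', 'Highest Education Level', 'TDCJ Number',
              'Age at Execution', 'Weight')


def record_to_dict(raw_record):
    """Hypothetical helper: turn one comma-separated record string into a dict."""
    # A one-element list is enough for csv.reader to parse a single record
    # while honouring quoted fields and doubled quotes.
    values = next(csv.reader([raw_record]))
    record = dict(zip(FIELDNAMES, values))
    for name in INT_FIELDS:
        record[name] = int(record[name])  # raises ValueError if a cell is blank
    # Same clean-up as in the first attempt: drop trailing ';' characters.
    record['Last Statement'] = record['Last Statement'].rstrip(';')
    return record


def get_data():
    deathrow_data = []
    with open('tx_deathrow_full.csv', 'r', newline='') as fil:
        # Skip the first two lines, as in the first attempt.
        next(fil)
        next(fil)
        for count, row in enumerate(csv.reader(fil)):
            if count >= 5:  # sample of 5 rows, as in the first attempt
                break
            # Assumption: row[0] holds the whole record as a single string.
            deathrow_data.append(record_to_dict(row[0]))
    return deathrow_data
```

If the file turns out to be an ordinary comma-separated file, the second parsing step is unnecessary and the rows coming from the outer csv.reader can be zipped with the field names directly.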

