Channel: Active questions tagged python - Stack Overflow

Fix csv file rows with more columns than others, in Python

I have to upload an Excel file into Teradata, so I took the worksheet I needed and saved it as a CSV file. (I was advised to use Teradata BTEQ after multiple failed attempts at using FastLoad through the Teradata Studio GUI.)

Issues:

  1. Some rows have more columns than others.
  2. When using BTEQ, some characters were misinterpreted.
  3. I can print out some of the counts, but I eventually get the following error, and I'm not sure what to do about it:
    UnicodeDecodeError: 'charmap' codec can't decode byte ... in position ...: character maps to <undefined>

I was advised to use Python to count the commas/delimiters in each row to find the ones with too many columns and fix them, but there are 125,000 rows and 66 columns in each row. (It is very dirty data that was manually entered without much use of Excel's data validation options.)

It would be best if I could print out only the row numbers (not line numbers) of the rows that need to be fixed and fix them on the spot with a conditional statement.
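One way to get row numbers rather than line numbers is to let Python's `csv` module split the rows, since it treats commas and line breaks inside quoted fields as data, while a raw comma count does not. The following is only a minimal sketch: the function name `find_bad_rows` is hypothetical, the default of 66 columns comes from the description above, and the `utf-8` encoding with `errors="replace"` is an assumption made so that undecodable bytes do not stop the scan.

```python
import csv


def find_bad_rows(path, expected_columns=66, encoding="utf-8"):
    """Return (row_number, field_count) for every row whose field count is unexpected.

    Row numbers are 1-based and include the header row, if the file has one.
    """
    bad_rows = []
    # errors="replace" is an assumption: undecodable bytes become U+FFFD
    # instead of raising UnicodeDecodeError partway through the file.
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "r", encoding=encoding, errors="replace", newline="") as csv_file:
        for row_number, row in enumerate(csv.reader(csv_file), start=1):
            if len(row) != expected_columns:
                bad_rows.append((row_number, len(row)))
    return bad_rows
```

Calling `find_bad_rows('Data.csv')` would then return a list such as row number and field count pairs for only the rows that need attention, which could be printed or used to drive a conditional fix.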

The code I have now prints the file path, then the number of commas in each line (one count per line of output), and then stops with a UnicodeDecodeError.

Code:

    with open('Data.csv', 'r') as csv_file:
        for line in csv_file:
            print(line.count(','))
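The snippet above calls `open()` without an `encoding` argument, so Python uses the platform default. The `'charmap'` codec named in the error is what Python uses for single-byte code pages such as `cp1252`, so the file probably contains bytes that are not valid in that code page. As a rough sketch (the function name `first_working_encoding` and the candidate list are assumptions, not something from the original post), one could check which encoding decodes the whole file before counting anything:

```python
def first_working_encoding(path, candidates=("utf-8-sig", "cp1252")):
    """Return the first candidate encoding that decodes the entire file, or None."""
    # Read raw bytes so that no decoding happens implicitly.
    with open(path, "rb") as raw_file:
        data = raw_file.read()
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent some byte sequence
        return encoding
    return None
```

The returned name can then be passed as `open('Data.csv', 'r', encoding=...)`. A clean decode does not prove the encoding is the right one, so it is still worth spot-checking a few rows with accented or special characters.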
