I am trying to read a batch of TSV dataset files that (normally) have three columns.
The pandas DataFrame for one file looks like this:
But some of the files have two extra values here and there, also separated by tabs. So I wrote code to work around this by reading each file as if it had five columns and then removing two of them.
    for i in range(101, 154):
        print(i)
        # read a file into pandas df
        thisfile = pd.read_csv(
            f'pgc-csv/2022_06_22_TRPV1_AAV488_6x10-11_No1/2022-06-22_TRPV1_AAV488_6x10-11_No{i}.txt',
            skiprows=10,
            header=None,
            names=['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'],
            encoding='unicode_escape',
            delimiter='\t',
        )
        # delete extra columns
        del thisfile['empty1']
        del thisfile['empty2']
But for the problem files I get a warning:
"DtypeWarning: Columns (3) have mixed types. Specify dtype option on import or set low_memory=False."
I tried to use the method from this article: https://www.roelpeters.be/solved-dtypewarning-columns-have-mixed-types-specify-dtype-option-on-import-or-set-low-memory-in-pandas/
    for i in range(101, 154):
        print(i)
        # read a file into pandas df
        thisfile = pd.read_csv(
            f'pgc-csv/2022_06_22_TRPV1_AAV488_6x10-11_No1/2022-06-22_TRPV1_AAV488_6x10-11_No{i}.txt',
            skiprows=10,
            header=None,
            names=['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'],
            encoding='unicode_escape',
            delimiter='\t',
            dtype={'Время, s': float, 'Laser, V': float, 'ECG lead': float, 'empty1': 'str', 'empty2': 'str'},
        )
        # delete extra columns
        del thisfile['empty1']
        del thisfile['empty2']
But I still get the warnings (see screenshot).
The first question is: how can I get rid of this warning?
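To make the first question concrete, here is a minimal sketch of one option that might avoid the warning: since the warning names column 3 (one of the extra columns), only the three real columns are kept via `usecols`, and `low_memory=False` is set as the warning text itself suggests. The `sample` string is hypothetical data standing in for a real file, and I have not confirmed that this removes the warning on the actual files.

```python
import io
import pandas as pd

# Hypothetical sample standing in for one real file:
# the second row carries two extra tab-separated values.
sample = (
    "0.0\t1.5\t0.12\n"
    "0.1\t1.5\t0.13\t7\tx\n"
    "0.2\t1.5\t0.15\n"
)

names = ['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2']

df = pd.read_csv(
    io.StringIO(sample),   # replace with the real file path
    header=None,
    names=names,           # declare all five possible columns
    usecols=names[:3],     # but keep only the three real ones
    delimiter='\t',
    low_memory=False,      # parse the file in one pass, as the warning suggests
)

print(df.dtypes)
```

With this approach the two `del` statements are no longer needed, because the extra columns are never loaded.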
The second question: as I understand it, some values in the DataFrame have a data type other than float.
I tried to find them with this:
ecgfile[lambda x: not isinstance(x['Время, s'], float)]
And this:
ecgfile[lambda x: type(x['Время, s']) is not float]
But neither attempt worked, so I need advice on this part, too.
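For illustration, here is a rough sketch of what I am trying to achieve on this part. It assumes the file is read with every column as text, then uses `pd.to_numeric` with `errors='coerce'` so that anything unparseable becomes NaN. The `sample` data and the `'abc'` value are made up. As far as I can tell, the lambda in my attempts receives the whole DataFrame rather than individual cells, so `isinstance` there checks a whole column (a Series) instead of each value.

```python
import io
import pandas as pd

# Hypothetical sample: the second row has a non-numeric value in 'ECG lead'.
sample = (
    "0.0\t1.5\t0.12\n"
    "0.1\t1.5\tabc\n"
    "0.2\t1.5\t0.15\t7\tx\n"
)

df = pd.read_csv(
    io.StringIO(sample),   # replace with the real file path
    header=None,
    names=['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'],
    delimiter='\t',
    dtype=str,             # read everything as text first
)

# Try to convert the column; unparseable entries become NaN.
as_number = pd.to_numeric(df['ECG lead'], errors='coerce')

# Rows where a value was present in the file but is not a number.
bad_rows = df[as_number.isna() & df['ECG lead'].notna()]
print(bad_rows)
```

The same check can be repeated for `'Время, s'` and `'Laser, V'` to see which rows hold unexpected values.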
The last question: is there perhaps a better overall way to do all of this? Thank you!