Channel: Active questions tagged python - Stack Overflow

read_csv separator is creating extraneous null columns


I have a text file that I want to load into a Pandas DataFrame. It looks like this:

#Date                     run               angle                NAME
#-----                  _______            ________             _______
2023-02-15 10:00:00      120716            -1.75493           4.5x10-4 Al 40um
2023-02-15 10:38:48      120716            -1.75493           JD70-103 50um 0/90 deg
2023-02-15 18:25:41      120723            -0.658             JD70-103 50um 45/135 deg

I am trying to read the .txt file using a regex separator that matches any run of whitespace, unless that whitespace is followed by a time ("\d\d:\d\d:\d\d"), "Al", "\d\dum", "deg", or something that looks like "[0-999]/[0-999]":

df = pd.read_csv("file.txt", engine='python', sep='\s+(?!\d\d:\d\d:\d\d|Al|\d\dum|deg|((\d|\d\d|\d\d\d)\/(\d|\d\d|\d\d\d)))')

For some reason, this is creating a dataframe that inserts three columns of NaN values in between each of my desired columns:

                  Date  NaN  None.1  None.2      run   None.3     None.4     None.5        angle ...
0  2023-02-15 10:00:00  NaN     NaN     NaN   120716      NaN        NaN        NaN     -1.75493 ...
1  2023-02-15 10:38:48  NaN     NaN     NaN   120716      NaN        NaN        NaN     -1.75493 ...
2  2023-02-15 18:25:41  NaN     NaN     NaN   120723      NaN        NaN        NaN     -0.658 ...

Any idea why this is happening? My best guess is that the separator is splitting a run of consecutive whitespace characters one at a time, but that is not the behavior I would expect from the regex I mentioned above.
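
One way to test that guess is to apply the same pattern to a single line with Python's re.split, outside of pandas. The sketch below rests on an assumption: that the python engine splits each line with the compiled separator in much the same way re.split does. The variable names (capturing, non_capturing, and so on) are only for illustration. re.split includes the contents of every capturing group in its result, and a group that did not take part in the match shows up as None. The separator above contains three capturing groups, which would account for three empty columns after every split.

```python
import re

# One data row from the file shown above.
line = (
    "2023-02-15 10:38:48      120716            -1.75493"
    "           JD70-103 50um 0/90 deg"
)

# Separator from the question: the "N/N" alternative uses three
# capturing groups, one outer and two inner.
capturing = r'\s+(?!\d\d:\d\d:\d\d|Al|\d\dum|deg|((\d|\d\d|\d\d\d)\/(\d|\d\d|\d\d\d)))'

# Same logic with a single non-capturing group, so re.split has no
# group contents to insert into its result.
non_capturing = r'\s+(?!\d\d:\d\d:\d\d|Al|\d\dum|deg|(?:\d{1,3}/\d{1,3}))'

with_groups = re.split(capturing, line)
without_groups = re.split(non_capturing, line)

# Every split point contributes three None entries, one per group.
print(with_groups)

# Only the four intended fields remain:
# ['2023-02-15 10:38:48', '120716', '-1.75493', 'JD70-103 50um 0/90 deg']
print(without_groups)
```

If the assumption holds, the whitespace runs are not being split one character at a time. The extra columns come from the parentheses, and passing the non-capturing version as sep= to the read_csv call above (still with engine='python') would be the thing to try.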

