Channel: Active questions tagged python - Stack Overflow

How to read a space-separated values (CSV) file that uses shell-like quoting into a DataFrame

I'm looking for a fast way to read a CSV file into a DataFrame. The file is space-separated; however, column names are wrapped in double quotes if they contain more than one word, and they can contain multiple spaces. pd.read_csv with sep=" " does not work because of the spaces in the column names.
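One possible alternative, sketched under the assumption that pandas' default C parser is used: pass `sep=r"\s+"` (any run of whitespace is one delimiter) and leave `quotechar` at its default `'"'`, so that quoted names such as `"d e  "` should stay together as single fields. The sample text is copied from the question's file; stripping the padding spaces from the names afterwards is an extra step not in the original.

```python
import io

import pandas as pd

# Sample content taken from the question's template.csv
text = '''a b c "d e  " "f g  " "h k  "
1 2 3 4 5 6
2 2 3 4 5 6
3 2 3 4 5 6
'''

# sep=r"\s+" treats runs of whitespace as a single separator;
# quotechar='"' (the default) keeps each quoted header as one field.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", quotechar='"')

# Remove the spaces that were inside the quotes ("d e  " -> "d e").
df.columns = df.columns.str.strip()
print(df)
```

For the real file, `io.StringIO(text)` would be replaced by the path `'template.csv'`. Unlike the shlex approach, the numeric columns come back as integers rather than strings.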

I currently solve this by applying shlex.split to every line of the file, which turns each line into a list of fields. However, it is taking too long: ~6 seconds for a file with 15K lines. Below is an example of my file ('template.csv') and a code snippet showing how I solve it with shlex.split.
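If the per-line shell splitting is the bottleneck, the standard library's `csv` module (implemented in C) may work as a drop-in replacement for `shlex.split`, assuming the only shell feature the file relies on is double-quoting. `read_space_separated` is a hypothetical helper name. With `delimiter=" "` and `skipinitialspace=True`, extra spaces after a separator are ignored, so runs of spaces act as one separator; a trailing space at the end of a line would still produce an empty last field.

```python
import csv
import io

import pandas as pd


def read_space_separated(f):
    """Hypothetical helper: parse space-separated rows with double-quoted fields."""
    # delimiter=" " splits on spaces; skipinitialspace=True skips the extra
    # spaces that follow a delimiter; quotechar='"' keeps "d e  " together.
    reader = csv.reader(f, delimiter=" ", quotechar='"', skipinitialspace=True)
    rows = list(reader)
    # Strip the padding spaces that were inside the quoted header names.
    header = [name.strip() for name in rows[0]]
    # Values stay as strings, exactly as with the shlex version.
    return pd.DataFrame(rows[1:], columns=header)


sample = io.StringIO('a b c "d e  " "f g  " "h k  "\n1 2 3 4 5 6\n2 2 3 4 5 6\n')
df = read_space_separated(sample)
print(df)
```

For the real file, the csv documentation recommends opening it with `newline=''`, e.g. `with open('template.csv', newline='') as f: df = read_space_separated(f)`.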

I appreciate the help in advance!

```
a b c "d e  " "f g  " "h k  "
1 2 3 4 5 6
2 2 3 4 5 6
3 2 3 4 5 6
4 2 3 4 5 6
5 2 3 4 5 6
6 2 3 4 5 6
```

Below is the code and the desired DataFrame output:

```python
import shlex

import pandas as pd

data = []
# Split every line shell-style so quoted names like "d e  " stay together
with open(r'template.csv') as f:
    for line in f:
        data.append(shlex.split(line))

# First row is the header; the remaining rows are data (parsed as strings)
df = pd.DataFrame(data[1:], columns=data[0])
```

```
   a  b  c d e f g h k
0  1  2  3   4   5   6
1  2  2  3   4   5   6
2  3  2  3   4   5   6
3  4  2  3   4   5   6
4  5  2  3   4   5   6
5  6  2  3   4   5   6
```
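In the sample only the header line contains quotes, so another option is to keep `shlex.split` for that one line and let pandas' C parser read the purely numeric rows. This is a sketch that assumes no data row ever contains a quoted field; `read_with_quoted_header` is a hypothetical name.

```python
import shlex

import pandas as pd


def read_with_quoted_header(path):
    """Hypothetical helper: shlex only the header, let pandas parse the rest."""
    with open(path) as f:
        # Only the first line needs shell-style splitting.
        columns = [name.strip() for name in shlex.split(f.readline())]
    # Assumes data rows are plain whitespace-separated values, so the fast
    # C parser can handle them; skiprows=1 skips the already-parsed header.
    return pd.read_csv(path, sep=r"\s+", header=None, names=columns, skiprows=1)
```

Usage: `df = read_with_quoted_header('template.csv')`. The shell-style splitting then runs once instead of 15K times, while the body is parsed in C.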
