Channel: Active questions tagged python - Stack Overflow
Read Fastq file directly into Pandas Dataframe

I'm trying to read a Fastq file directly into a pandas dataframe, similar to the link below:

Read FASTQ file into a Spark dataframe

I've searched all over, but just can't find a viable option.

Currently, I'm running the following:

import subprocess
from io import StringIO

import pandas as pd

cmd = f'zcat {infile} | paste - - - -'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
b = StringIO(p.communicate()[0].decode('utf-8'))
_ = pd.read_csv(b, sep='\t', names=['read_id', 'seq', '+', 'qual'], on_bad_lines='skip', dtype=str, chunksize=1000000)

Is there a cleaner way to just use pandas instead? I was thinking of setting sep='\n', but then I just get 1 row with multiple columns. Could I maybe read the file in, and then take every 4th row to create the 4 needed columns (or something like that)?
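To make the "every 4th row" idea concrete, here is a minimal sketch. It assumes every record spans exactly four lines (no wrapped sequences) and that the lines are already in a Python list; the helper name `fastq_lines_to_frame` is made up for illustration and is not a pandas function.

```python
import numpy as np
import pandas as pd


def fastq_lines_to_frame(lines):
    """Turn a flat list of FASTQ lines (4 per record) into a DataFrame."""
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of 4 lines (4 per FASTQ record)")
    # After the reshape, row i holds the four lines of record i.
    records = np.array(lines, dtype=object).reshape(-1, 4)
    return pd.DataFrame(records, columns=["read_id", "seq", "+", "qual"])


# Tiny made-up example: two records, eight lines.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "TTGA", "+", "FFFF"]
df = fastq_lines_to_frame(example)
```

This only covers the reshaping step; it does not address reading the file or memory limits.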

Speed is really what I'm looking for, so the quickest solution would be the best.

Side note: my Fastq files will not fit in memory, so I will have to read them in chunks.
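For the chunking constraint, one possible pure-Python sketch streams the gzipped file with `gzip.open` and `itertools.islice`, so only one chunk of lines is held in memory at a time. The function name `read_fastq_chunks` and the parameter `records_per_chunk` are hypothetical, and the sketch again assumes four lines per record. Whether this is faster than the `zcat | paste` pipeline has not been measured here.

```python
import gzip
from itertools import islice

import pandas as pd

COLUMNS = ["read_id", "seq", "+", "qual"]


def read_fastq_chunks(path, records_per_chunk=1_000_000):
    """Yield DataFrames of at most records_per_chunk reads from a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        while True:
            # Pull the next block of lines without loading the whole file.
            lines = [line.rstrip("\n") for line in islice(handle, 4 * records_per_chunk)]
            if not lines:
                break
            if len(lines) % 4 != 0:
                raise ValueError("file ended in the middle of a FASTQ record")
            # Every 4th line, starting at offset i, forms column i.
            yield pd.DataFrame({name: lines[i::4] for i, name in enumerate(COLUMNS)})
```

Each yielded frame can be processed and discarded before the next one is read, for example with `for chunk in read_fastq_chunks(infile): ...`.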

