Channel: Active questions tagged python - Stack Overflow

Efficiently Retrieving Latest Matched Strings from Large CSV Based on Patterns in Python

I'm working on a Python task involving two CSV files. The first CSV has 'string' and 'updated' columns; the second has a 'pattern' column. For each pattern, I need to find the most recently updated string (according to the 'updated' column) in the first CSV that matches it, and to do so efficiently. The difficulty is size: the first CSV has around 8 million rows, while the second has 50,000.

Given this situation, what would be the most efficient approach in Python to solve this task?
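A naive approach tests every pattern against every string, which is 8,000,000 × 50,000 = 4 × 10^11 checks. One way to avoid that is Aho-Corasick multi-pattern matching: build one automaton from all 50,000 patterns, then scan each string once and keep, per pattern, the row with the newest 'updated' value. This is a minimal sketch under assumptions the question does not confirm: patterns are literal substrings (not regexes or exact keys), and 'updated' values are ISO 8601 strings so that string comparison orders them chronologically. The helper names `build_automaton`, `iter_matches` and `latest_matches` are hypothetical, and the in-memory CSV stands in for the real files.

```python
import csv
import io
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick tables: goto transitions, failure links, outputs."""
    goto = [{}]  # goto[state][char] -> next state; state 0 is the root
    fail = [0]   # failure link for each state
    out = [[]]   # patterns recognised on reaching each state
    for pat in patterns:
        if not pat:  # skip empty patterns, which would match everywhere
            continue
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append(pat)
    # Breadth-first traversal so each failure link points to an
    # already-processed, shallower state.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state (suffix matches).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def iter_matches(text, goto, fail, out):
    """Yield every pattern occurring in text (with repeats) in one pass."""
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        yield from out[state]


def latest_matches(rows, patterns):
    """Map each matched pattern to (updated, string) of its newest row."""
    tables = build_automaton(patterns)
    best = {}
    for string, updated in rows:
        for pat in set(iter_matches(string, *tables)):
            cur = best.get(pat)
            # Assumes 'updated' strings sort chronologically (e.g. ISO 8601).
            if cur is None or updated > cur[0]:
                best[pat] = (updated, string)
    return best


# Small in-memory stand-in for the first CSV file.
strings_csv = io.StringIO(
    "string,updated\n"
    "apple pie,2023-01-05\n"
    "apple tart,2023-03-01\n"
    "banana split,2023-02-10\n"
)
patterns = ["apple", "split", "cherry"]
rows = ((r["string"], r["updated"]) for r in csv.DictReader(strings_csv))
result = latest_matches(rows, patterns)
print(result)
# {'apple': ('2023-03-01', 'apple tart'), 'split': ('2023-02-10', 'banana split')}
```

Patterns with no match (here "cherry") are simply absent from the result. The work is roughly proportional to the total length of the strings plus the number of matches, rather than to rows × patterns; a pure-Python automaton is still slow per character, so a C-implemented Aho-Corasick library may be considerably faster on 8 million rows.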

Initially, I tried pandas, but processing the large first CSV was slow. I then tried Dask, which improved performance, but Dask processes data in partitions (chunks), which makes it hard to determine the latest matching string for each pattern across the whole file.
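The chunking problem may be less of an obstacle than it looks, because "newest matching row per pattern" is an associative reduction: each chunk can produce a partial result, and two partial results combine by keeping the newer entry for each pattern. The sketch below illustrates that idea with the standard `csv` module reading fixed-size chunks, as a stand-in for pandas `read_csv(chunksize=...)` or Dask partitions; `merge_partials` is the step that would be applied to per-partition results. The function names and `chunk_size` are hypothetical, the substring test in `chunk_latest` is a placeholder assumption (O(patterns) per row, so a faster multi-pattern matcher should replace it on real data), and 'updated' is again assumed to be an ISO 8601 string.

```python
import csv
import io
from itertools import islice


def chunk_latest(chunk, patterns):
    """Partial result for one chunk: {pattern: (updated, string)}.

    Placeholder matcher (assumption): plain substring containment.
    """
    best = {}
    for row in chunk:
        string, updated = row["string"], row["updated"]
        for pat in patterns:
            if pat in string:
                cur = best.get(pat)
                if cur is None or updated > cur[0]:
                    best[pat] = (updated, string)
    return best


def merge_partials(a, b):
    """Combine two partial results, keeping the newer entry per pattern."""
    merged = dict(a)
    for pat, entry in b.items():
        cur = merged.get(pat)
        if cur is None or entry[0] > cur[0]:
            merged[pat] = entry
    return merged


def latest_by_chunks(file_obj, patterns, chunk_size=100_000):
    """Read the CSV in fixed-size chunks and fold the partial results."""
    reader = csv.DictReader(file_obj)
    result = {}
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        result = merge_partials(result, chunk_latest(chunk, patterns))
    return result


data = io.StringIO(
    "string,updated\n"
    "apple pie,2023-01-05\n"
    "apple tart,2023-03-01\n"
    "banana split,2023-02-10\n"
    "green apple,2023-02-01\n"
)
# A tiny chunk size forces the newest "apple" row and the "split" row
# into different chunks, so the merge step is exercised.
result = latest_by_chunks(data, ["apple", "split"], chunk_size=2)
print(result)
# {'apple': ('2023-03-01', 'apple tart'), 'split': ('2023-02-10', 'banana split')}
```

Because the merge only compares 'updated' values, the order in which chunks or partitions are combined does not change the outcome (ties keep whichever entry was merged first), so the same combine step could in principle run on results computed independently per partition.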