Channel: Active questions tagged python - Stack Overflow

Queue structure for a webscraper API


I built a simple web scraper that runs in a Docker container. It puts company names into a search form and reads out the results. Unfortunately, there is a limit to how many searches I can do per hour. Here's what I want to do:

There is a very long predefined list of companies that the scraper works through. However, on receiving a request, the scraper should stop, work through the requested list first, and then resume the long initial list. There is added complexity because the scraper runs multiple threads at a time to be faster.
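To make it concrete, here is a rough sketch of the behaviour I'm after using Python's `queue.PriorityQueue` with worker threads; all names, priorities, and the `scrape` stub are made up for illustration:

```python
import itertools
import queue
import threading

PRIORITY_REQUEST = 0   # on-demand requests jump the queue
PRIORITY_BACKLOG = 1   # the long predefined list
_STOP = object()       # sentinel telling a worker to exit

work = queue.PriorityQueue()
tiebreak = itertools.count()  # FIFO order within a priority level

def put(company, priority):
    # The counter avoids comparing payloads when priorities tie.
    work.put((priority, next(tiebreak), company))

def scrape(company):
    # Placeholder for the real scraping call.
    return f"result for {company}"

def worker(results):
    while True:
        _, _, company = work.get()
        if company is _STOP:
            work.task_done()
            return
        results.append(scrape(company))
        work.task_done()

results = []
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(2)]

# The backlog is queued first, but the request sorts ahead of it.
for name in ("acme", "globex"):
    put(name, PRIORITY_BACKLOG)
put("hooli", PRIORITY_REQUEST)

for t in threads:
    t.start()
work.join()                       # wait until everything queued is scraped
for _ in threads:
    put(_STOP, PRIORITY_BACKLOG)  # shut the workers down
for t in threads:
    t.join()
```

The point is that the workers never need to be paused: they just always pull the highest-priority item, so requested companies get scraped before the backlog continues.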

How would I go about implementing something like this?

Would you make two containers and stop one to activate the other, then reactivate the first when it's done? Or can I keep everything in one container and somehow pause the process?

One idea I had was to use a message broker and check for a request after every scrape. If there is one, jump to the loop over those companies and resume the initial list after it's done.
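For that idea, a minimal single-threaded sketch, with `queue.Queue` standing in for the broker (a real setup would use something like RabbitMQ or Redis; the company names and the `scrape` stub are made up):

```python
import queue

requests = queue.Queue()  # stand-in for the message broker
backlog = iter(["acme", "globex", "initech"])  # the long predefined list

def scrape(company):
    # Placeholder for the real scraping call.
    return f"result for {company}"

def run(max_scrapes):
    # After every scrape, drain pending requests before resuming the backlog.
    results = []
    for _ in range(max_scrapes):
        try:
            company = requests.get_nowait()  # requested companies first
        except queue.Empty:
            company = next(backlog, None)    # fall back to the backlog
            if company is None:
                break
        results.append(scrape(company))
    return results
```

So if a request for "hooli" arrives, the very next iteration scrapes it before continuing with "acme" and the rest.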

Do you have any hints on how this can be done efficiently and reliably?
