How can I take a selection from my HTML page and pass it to my Scrapy spider so that the query is updated with the new parameter and the crawl runs again?
I have tried several ways to pass this parameter from one script to another, but none of them work. The latest approach uses a GET request from the HTML side. Even with that, the spider never picks up the new value or updates the search.
I have been learning to use ScrapyRT and Flask. I have a simple, locally hosted Flask web app that I'm using to learn HTML while displaying some data for my project. The spider works and returns the results I want any time the page is refreshed, and Flask displays these results nicely on my HTML page.
I have created a simple dropdown with a submit button. The form's action points to the "/update" endpoint:
```html
<form action="/update" method="post">
  <div class="dropdown">
```
I have a script called app.py with the following:
```python
import requests
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/update', methods=['POST'])
def update():
    global selected
    selected = request.form['selection']
    resp = requests.get(
        f'http://127.0.0.1:9080/crawl.json?start_requests=true&spider_name=test&selected={selected}'
    ).json()
    items = resp.get('items')
    return render_template('index.html', items=items)
```
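One likely reason the value never arrives: ScrapyRT probably ignores arbitrary query parameters such as `selected=` rather than treating them as spider arguments. Assuming a ScrapyRT version that supports the `crawl_args` parameter (a URL-encoded JSON object of spider keyword arguments, roughly equivalent to `scrapy crawl test -a selected=...`), a minimal sketch for building the request URL could look like this. `build_crawl_url` and `SCRAPYRT_URL` are hypothetical names, not part of ScrapyRT.

```python
import json
from urllib.parse import urlencode

# ScrapyRT address taken from the question.
SCRAPYRT_URL = "http://127.0.0.1:9080/crawl.json"


def build_crawl_url(selected, spider_name="test"):
    """Return a ScrapyRT GET URL that forwards `selected` as a spider argument.

    Assumption: the running ScrapyRT version understands `crawl_args`,
    a JSON object whose keys become the spider's keyword arguments.
    """
    params = {
        "spider_name": spider_name,
        "start_requests": "true",
        # Spider arguments travel as one JSON-encoded query parameter.
        "crawl_args": json.dumps({"selected": selected}),
    }
    # urlencode escapes the JSON and any spaces or symbols in the value.
    return f"{SCRAPYRT_URL}?{urlencode(params)}"
```

In the Flask view this would replace the hand-built f-string, e.g. `resp = requests.get(build_crawl_url(selected)).json()`. Letting `urlencode` do the escaping also avoids breakage when a dropdown value contains spaces or `&`.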
Then my test.py spider looks something like this:
```python
import json

import scrapy

# URL is defined elsewhere in test.py (not shown).

class TestSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ["www.test.com"]

    def start_requests(self):
        global selected
        selected = getattr(self, 'selected', "sample")
        query = {"needs_updating": selected}
        json_payload = json.dumps(query)
        yield scrapy.Request(
            url=URL,
            method='POST',
            body=json_payload,
            callback=self.update_query,
        )

    def update_query(self, response):
        ...  # other code that seems to parse the result fine
```
I've tried passing the value with both POST and GET requests, passing it as an argument to the start_requests function, and returning it from the update function in app.py. In every case the spider ignores the updated value and falls back to the "sample" default used on the first run.