I'm new to the Scrapy framework. I would like to scrape the news from https://www.snl24.com/dailysun/news. The page loads more news through infinite scroll. I have run into two issues over the past few days.
Because the page uses infinite scroll to load new content, I looked in the dev tools and found that the request URL is
`https://www.snl24.com/dailysun/news?pagenumber={page_number}`, ending with the specific page number. The same URL can also be found in the HTML of the previous page.
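To make the URL pattern concrete, here is a minimal sketch using only the standard library (outside Scrapy). The helper names `page_url` and `absolute_url` are hypothetical, and the relative link in the usage lines is an assumed shape for the value found in the previous page's HTML, not something copied from the site.

```python
from urllib.parse import urlencode, urljoin

# Listing page taken from the question.
BASE_URL = "https://www.snl24.com/dailysun/news"


def page_url(page_number: int) -> str:
    """Build the listing URL for one page of the infinite scroll."""
    return f"{BASE_URL}?{urlencode({'pagenumber': page_number})}"


def absolute_url(relative_url: str) -> str:
    """Resolve a site-relative link (leading slash) against the listing URL."""
    return urljoin(BASE_URL, relative_url)


print(page_url(4))
# Assumed example of a relative link; the real value may look different.
print(absolute_url("/dailysun/news?pagenumber=2"))
```

Resolving with `urljoin` rather than string concatenation avoids a doubled slash when the relative link already starts with `/`. Inside a spider, `response.urljoin(...)` performs the same kind of resolution against the response URL.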
The HTML shown in the dev tools looks fine and is consistent with what appears on the page, but the content I get when fetching a URL ending in
`?pagenumber=4` (for example) is confusing and differs from what the dev tools show. If I run the code below, the log shows the pages being crawled successfully, but no items are scraped from them. The output file contains only a small part of the data, and the spider keeps crawling without outputting any more data.
Here is the spider:

    import scrapy, time
    from scrapy_lyntest.items import ScrapyLyntestItem


    class DailysunSpider(scrapy.Spider):
        name = "DailySun"
        allowed_domains = ["www.snl24.com"]
        start_urls = ['https://www.snl24.com/dailysun/news']

        def parse(self, response):
            newss = response.css('div.article-item--container')
            for news in newss:
                inner_news_relative_url = news.css('a[data-event-name="article_link"]::attr(href)').get()
                inner_news_url = 'https://www.snl24.com'+ inner_news_relative_url
                yield response.follow(inner_news_url, callback = self.parse_news_page)

            if response.css('div.loader-indicator').attrib['class'] == 'loader-indicator':
                scroll_relative_url = response.css('div[hx-trigger="revealed"]::attr(hx-get)').get()
                scroll_url = 'https://www.snl24.com/'+ scroll_relative_url
                yield scrapy.Request(url = scroll_url, callback=self.parse)

        def parse_news_page(self, response):
            newspage = response.css('div.article.tf-lhs-col')
            news_item = ScrapyLyntestItem()
            news_item["url"] = response.url,
            news_item["title"] = newspage.css('h1.article__title::text').get(),
            news_item["content"] = newspage.css('div.article__body.NewsArticle p::text').getall(),
            news_item["date"] = newspage.css('p.article__date::text').get()
            yield news_item

I tried calling `json.loads(response.text)` on the response fetched from the URL ending in `pagenumber={}`, but it raised `JSONDecodeError: Expecting value: line 2 column 1 (char 1)`, so I tried using CSS selectors directly instead.
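For illustration, the `JSONDecodeError` position (line 2, char 1) is consistent with a body that starts with a newline followed by markup rather than JSON. The sketch below reproduces that with the standard library only. `SAMPLE_FRAGMENT` is a made-up stand-in for the real response body, and `ArticleLinkParser`, `extract_links`, and `looks_like_json` are hypothetical names; this is not the site's actual output.

```python
import json
from html.parser import HTMLParser

# Hypothetical fragment standing in for a ?pagenumber=N response body.
SAMPLE_FRAGMENT = """
<div class="article-item--container">
  <a data-event-name="article_link" href="/dailysun/news/example-story">Example</a>
</div>
"""


class ArticleLinkParser(HTMLParser):
    """Collect href values of <a data-event-name="article_link"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("data-event-name") == "article_link":
            href = attributes.get("href")
            if href:
                self.links.append(href)


def looks_like_json(body: str) -> bool:
    """Return True only if the body can be decoded as JSON."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


def extract_links(body: str) -> list:
    """Return the article links found in an HTML body."""
    parser = ArticleLinkParser()
    parser.feed(body)
    return parser.links


if looks_like_json(SAMPLE_FRAGMENT):
    print("JSON body")
else:
    # The body is HTML, so it is parsed as markup instead of JSON.
    print(extract_links(SAMPLE_FRAGMENT))
```

If the real response is an HTML fragment like this, selecting elements with CSS (as the spider does) is the matching approach, and `json.loads` would be expected to fail.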
Need help. Thanks a lot.

