Channel: Active questions tagged python - Stack Overflow

Scraping text across sections using Scrapy


I am currently using Scrapy to scrape a website. The website has n sublinks, which I am able to follow. Each sublink has three things I need: title, description, and content. I can get the title and description, but the content is split across several sections, and the number of sections differs per sublink, as in this example: [image]

I tried looping over the sections and storing the text of each one, but the yielded item contains only the title, the description, and the content of the last section.

Below is the code:

def parse_instructions(self, response):
    title = response.xpath('//*[@id="d-article"]/div[1]/div[1]/h1/text()').get()
    description = response.xpath('//*[@id="ency_summary"]/p/text()').getall()
    joined_description = ''.join(description)
    sections = response.css('section div.section:not([class*=" "])')
    for section in sections:
        section_text = ''.join(section.css('p::text').getall())
        section_text = ''.join('a::text').getall()
        section_text = ''.join('ul::text').getall()
    yield {
        "title": title,
        "description": joined_description,
        "section_text": section_text,
    }
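For comparison, here is one possible way to keep every section instead of only the last. Three things in the code above stand out: `section_text` is reassigned on every pass through the loop, so earlier sections are discarded; `''.join('a::text').getall()` calls `.getall()` on a plain string rather than on a selector, which would raise an `AttributeError`; and the `yield` runs only once, after the loop has finished. The sketch below is not a confirmed fix for the site in question. It assumes that all text nodes inside each section are wanted (not only those in `p`, `a`, and `ul`), and the helper `clean_join` and the list `section_texts` are hypothetical names introduced for illustration.

```python
def clean_join(fragments):
    """Join the text fragments of one section and collapse runs of whitespace."""
    return " ".join(" ".join(fragments).split())


def parse_instructions(self, response):
    title = response.xpath('//*[@id="d-article"]/div[1]/div[1]/h1/text()').get()
    description = "".join(response.xpath('//*[@id="ency_summary"]/p/text()').getall())

    # Collect one entry per section rather than overwriting a single variable.
    section_texts = []
    for section in response.css('section div.section:not([class*=" "])'):
        # Assumption: every descendant text node of the section is wanted.
        # ".//text()" is relative to the current section selector.
        fragments = section.xpath(".//text()").getall()
        text = clean_join(fragments)
        if text:  # skip sections that contain only whitespace
            section_texts.append(text)

    # Yield once, after the loop, with everything that was collected.
    yield {
        "title": title,
        "description": description,
        "section_text": section_texts,
    }
```

This version yields a list with one string per section. If a single string is preferred, the list could be joined (for example with `"\n\n".join(section_texts)`) before yielding. Note that joining fragments with a space may insert a space where inline elements such as links split a sentence.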
