I've been trying to scrape the email address from this page:
https://hosocongty.vn/cong-ty-tnhh-tu-van-kinh-doanh-thuong-mai-hong-phuc-com-3266475.htm
The email is visible on the page in the browser and I can select it with XPath there, but when I scrape it with Scrapy, the result says "email protected" instead of the address.
Here is my current code:
from scrapy import Spider

from crawler.items import CrawlerItem


class CrawlerSpider(Spider):
    name = "crawler"
    # Should name the site being crawled; an empty string matches no domain.
    allowed_domains = ["hosocongty.vn"]
    start_urls = [
        "https://hosocongty.vn/cong-ty-tnhh-tu-van-kinh-doanh-thuong-mai-hong-phuc-com-3266475.htm",
    ]

    def parse(self, response):
        # Select the content boxes, then pull each field out by its position.
        questions = response.xpath('//div[@class="box_content"]')
        for question in questions:
            item = CrawlerItem()
            item['info1'] = question.xpath('ul[1]/li[1]/h1/text()').extract()
            item['info2'] = question.xpath('ul[1]/li[2]/span/text()').extract()
            item['info3'] = question.xpath('ul[2]/li[2]/span/text()').extract()
            item['info4'] = question.xpath('ul[2]/li[3]/span//text()').extract()
            yield item
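The "email protected" text looks like the placeholder that Cloudflare's email obfuscation leaves in the raw HTML. In that scheme the real address is stored hex-encoded in a `data-cfemail` attribute and decoded in the browser by JavaScript, which Scrapy does not run. Assuming that is what this site does (not verified here), a minimal sketch of the decoding might look like the following. The function name `decode_cfemail` and the sample value are made up for illustration.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a hex string in the (assumed) Cloudflare data-cfemail format.

    The first byte is an XOR key; every following byte is one character
    of the address XORed with that key.
    """
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


# Hand-built sample, not taken from the site: key 0x00 followed by "a@b.c".
print(decode_cfemail("006140622e63"))  # prints a@b.c
```

In the spider, the encoded value could probably be read with something like `question.xpath('.//@data-cfemail').get()` and passed to this function, but the exact XPath depends on the page's markup and is an assumption.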
I'm not a developer and am fairly new to Python.
I have read many questions from other members on Stack Overflow but could not find an answer.
Thank you, everyone.