I've previously built a BBC scraper which, among other things, scrapes the headline from a given article such as this. However, the BBC has recently changed its website, so I need to modify my scraper, which has proven difficult. For example, say I want to scrape the headline from the previously mentioned article. Inspecting the HTML in Firefox, I find that the headline element carries the attribute data-component="headline-block" (see the blue marked line in the image).
If I want to extract the corresponding tag, I'll do this:
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.bbc.com/news/world-africa-68504329'

    # extract html
    html = requests.get(url).text

    # parse html
    soup = BeautifulSoup(html, 'html.parser')

    # extract headline from soup
    head = soup.find(attrs={'data-component': 'headline-block'})
But when I print the value of head, I get None, which means that Beautiful Soup can't find the tag. What am I missing? How do I solve this problem?
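To narrow this down, one check is whether the attribute appears at all in the HTML that requests receives. The Firefox inspector shows the page after any scripts have run, which may differ from the raw response, so this may or may not be the cause here. Below is a minimal sketch using only the standard library. AttrFinder and tags_with_attr are hypothetical helpers, and the two HTML strings are made-up stand-ins, not real BBC markup:

```python
from html.parser import HTMLParser


class AttrFinder(HTMLParser):
    """Collect every start tag that carries a given attribute/value pair."""

    def __init__(self, name, value):
        super().__init__()
        self.name = name
        self.value = value
        self.matches = []  # names of the tags that carry the attribute

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if (self.name, self.value) in attrs:
            self.matches.append(tag)


def tags_with_attr(html, name, value):
    """Return the names of all tags in `html` that have name="value"."""
    finder = AttrFinder(name, value)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical stand-ins for the two things being compared:
# what the browser inspector shows vs. what requests.get(url).text returns.
inspected = '<div data-component="headline-block"><h1>Some headline</h1></div>'
fetched = '<div id="main-content"><h1>Some headline</h1></div>'

print(tags_with_attr(inspected, 'data-component', 'headline-block'))
print(tags_with_attr(fetched, 'data-component', 'headline-block'))
```

Passing requests.get(url).text as the html argument would show whether the tag is present in the downloaded markup in the first place, or only in the DOM that the browser builds.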