How do I scrape a headline from a BBC article using Python and Beautiful Soup?

March 26, 2024, 3:29 pm


I previously built a BBC scraper which, among other things, scrapes the headline from a given article such as this one. However, the BBC has recently changed its website, so I need to modify my scraper, which has proven difficult. For example, say I want to scrape the headline from the article mentioned above. Inspecting the HTML in Firefox, I find that the headline element carries the attribute data-component="headline-block" (see the blue marked line in the image).

To extract the corresponding tag, I do this:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.bbc.com/news/world-africa-68504329'

    # extract html
    html = requests.get(url).text

    # parse html
    soup = BeautifulSoup(html, 'html.parser')

    # extract headline from soup
    head = soup.find(attrs={'data-component': 'headline-block'})

But when I print the value of head, I get None, which means that Beautiful Soup can't find a tag with that attribute in the HTML it was given. What am I missing? How do I solve this problem?
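
One possible explanation (not confirmed here) is that the markup returned to requests differs from what the browser's inspector shows, for instance because the page is modified after loading. A minimal sketch for checking this is below. It first tests whether the string "headline-block" occurs in the downloaded HTML at all, then falls back to the first h1 tag, on the assumption that the headline lives there. The helper names find_headline and attribute_in_raw_html and the sample markup are hypothetical and only for illustration; they are not the real BBC page.

```python
from bs4 import BeautifulSoup


def attribute_in_raw_html(html, needle="headline-block"):
    """Return True if the needle occurs anywhere in the downloaded markup."""
    return needle in html


def find_headline(html):
    """Return the headline text, or None if no candidate tag is found."""
    soup = BeautifulSoup(html, "html.parser")
    # First try the attribute seen in the browser inspector.
    tag = soup.find(attrs={"data-component": "headline-block"})
    if tag is None:
        # Assumption: the headline is the first <h1> in the document.
        tag = soup.find("h1")
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical markup for illustration only, not the real BBC page.
sample_with_attr = '<div data-component="headline-block"><h1>Example headline</h1></div>'
sample_without_attr = "<main><h1>Example headline</h1></main>"

print(attribute_in_raw_html(sample_without_attr))
print(find_headline(sample_with_attr))
print(find_headline(sample_without_attr))
```

To try this on the real page, pass requests.get(url).text to both functions. If attribute_in_raw_html returns False, the attribute is simply not in the HTML that requests received, and the selector needs to be based on that HTML rather than on the inspector view.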
