Channel: Active questions tagged python - Stack Overflow

How do I scrape a headline from a BBC article using Python and Beautiful Soup?

I previously built a BBC scraper which, among other things, scrapes the headline from a given article such as this one. However, the BBC has recently changed its website, so I need to modify my scraper, and that has proven difficult. For example, say I want to scrape the headline from the article mentioned above. Inspecting the HTML in Firefox, I find that the headline element carries the attribute data-component="headline-block" (see the blue marked line in the image).

[Screenshot: Firefox inspector, with the line containing data-component="headline-block" marked in blue]

To extract the corresponding tag, I do this:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.bbc.com/news/world-africa-68504329'

    # extract html
    html = requests.get(url).text

    # parse html
    soup = BeautifulSoup(html, 'html.parser')

    # extract headline from soup
    head = soup.find(attrs={'data-component': 'headline-block'})

But when I print head, I get None, which means that Beautiful Soup can't find a tag with that attribute in the HTML it parsed. What am I missing? How do I solve this problem?
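
One possible cause, offered here as an assumption and not something confirmed for this page, is that the HTML returned to requests differs from the DOM shown in the browser's inspector (for instance, if the attribute is added by JavaScript after the page loads). A minimal sketch of how that could be checked and worked around follows. The helper name extract_headline and the two sample HTML strings are hypothetical, and the fallback assumes the headline is the first h1 on the page.

```python
from bs4 import BeautifulSoup


def extract_headline(html):
    """Return the headline text from an HTML string, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    # First try the attribute seen in the browser's inspector.
    tag = soup.find(attrs={"data-component": "headline-block"})
    # Fall back to the first <h1>, assuming the headline is the page's main heading.
    if tag is None:
        tag = soup.find("h1")
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical snippets standing in for what requests.get(url).text might return.
with_attribute = '<div data-component="headline-block"><h1>Example headline</h1></div>'
without_attribute = '<article><h1 id="main-heading">Example headline</h1></article>'

# Quick diagnostic: is the attribute present in the raw HTML at all?
print('data-component="headline-block"' in without_attribute)

print(extract_headline(with_attribute))
print(extract_headline(without_attribute))
```

Running the same substring check on the real response text would show whether the attribute is missing from the downloaded HTML. If it is missing, a selector based on what the server actually sends (such as the h1 fallback above) may be a more reliable target than the attribute seen in the inspector.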

