I use lxml to retrieve the attributes of tags from an HTML page. The HTML page is formatted like this:
<div class="my_div"><a href="/foobar"><img src="my_img.png"></a></div>
The Python script I use to retrieve the URL inside the <a> tag and the src value of the <img> tag inside the same <div> is this:
from lxml import html
...
tree = html.fromstring(page.text)
for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
    href = element.xpath('/@href')
    src = element.xpath('//img/@src')
Why don't I get the href and src values as strings?
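A likely explanation, offered as a sketch rather than a definitive fix: both inner expressions are absolute paths. '/@href' asks for the attributes of the document root, which has none, so it returns an empty list. '//img/@src' searches the whole document instead of only the current <a>. Making them relative ('@href' and './/img/@src') scopes them to each matched element. Note also that xpath() always returns a list, so you still have to pick out the first item. The function name extract_links and the PAGE sample string below are hypothetical, used only for illustration; in the original script the markup would come from page.text.

```python
from lxml import html

# Hypothetical sample markup copied from the question; normally this is page.text.
PAGE = '<div class="my_div"><a href="/foobar"><img src="my_img.png"></a></div>'


def extract_links(markup):
    """Return (href, src) pairs for each <a> inside a div whose class contains "my_div"."""
    tree = html.fromstring(markup)
    results = []
    for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
        # "@href" (no leading slash) is evaluated relative to the current <a>.
        # A leading "/" would restart from the document root, which has no attributes.
        hrefs = element.xpath('@href')
        # ".//img" looks only at descendants of this <a>;
        # "//img" would match every <img> in the whole document.
        srcs = element.xpath('.//img/@src')
        # xpath() returns a list of matches, so take the first one if present.
        href = hrefs[0] if hrefs else None
        src = srcs[0] if srcs else None
        results.append((href, src))
    return results


print(extract_links(PAGE))
```

With several matching divs on the page, the relative './/img/@src' keeps each src paired with its own link, whereas '//img/@src' would return every image's src for every link.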