
Extracting attributes from HTML with lxml

I use lxml to retrieve tag attributes from an HTML page. The page is structured like this:

<div class="my_div"><a href="/foobar"><img src="my_img.png"></a></div>

The Python script I use to retrieve the URL in the <a> tag and the src value of the <img> tag inside the same <div> is this:

from lxml import html
...
tree = html.fromstring(page.text)
for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
    href = element.xpath('/@href')
    src = element.xpath('//img/@src')

Why don't I get the attribute values as strings?
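
A likely cause, offered as a sketch rather than a confirmed diagnosis: in XPath, an expression that starts with / or // is evaluated from the document root, not from the element that xpath() is called on. So '/@href' asks for an attribute of the root node, which has none, and '//img/@src' matches every <img> in the page rather than only the one inside the current <a>. The example below assumes the sample markup from the question as inline input (page_text is a stand-in for page.text) and uses relative expressions instead.

```python
from lxml import html

# Stand-in for page.text, using the sample markup from the question.
page_text = '<div class="my_div"><a href="/foobar"><img src="my_img.png"></a></div>'
tree = html.fromstring(page_text)

results = []
for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
    # Read the attribute directly from the current <a> element.
    href = element.get('href')
    # The leading "." makes the search relative to the current element.
    src = element.xpath('.//img/@src')
    results.append((href, src))

print(results)
```

With the sample markup this should print [('/foobar', ['my_img.png'])]. Note that xpath() returns a list even for a single match, so index it (for example src[0]) when one string is wanted; element.xpath('@href') is the XPath equivalent of element.get('href') and also returns a list.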
