Channel: Active questions tagged python - Stack Overflow

Accessing a generator created using asyncio


I am trying to follow this [tutorial][1], but using asyncio instead. I am stuck on the part of the main function where I put the tasks in a list and then gather them into the results variable. I imagine that each entry in results is one of the pages. I want to use the parse function below to extract all the item information from each page:

    from typing import Any
    import asyncio
    import httpx
    from selectolax.parser import HTMLParser
    from urllib.parse import urljoin


    async def fetch_html(client: httpx.AsyncClient, url: str, **kwargs) -> Any:
        headers = {
            "User-Agent": "Add yours or you will get a 403"
        }
        if kwargs.get("page"):
            resp = await client.get(
                url=url + str(kwargs.get("page")), follow_redirects=True, headers=headers
            )
        else:
            resp = await client.get(url=url, follow_redirects=True, headers=headers)
        try:
            resp.raise_for_status()
        except httpx.HTTPStatusError as exc:
            print(
                f"Error response {exc.response.status_code} while requesting {exc.request.url!r}. Page limit exceeded"
            )
            return False
        html = HTMLParser(html=resp.text)
        return html


    def extract_text(html: HTMLParser, sel: str):
        try:
            text = html.css_first(sel).text()
            return text
        except AttributeError:
            return None


    def parse_search_page(html: HTMLParser):
        products = html.css("li.VcGDfKKy_dvNbxUqm29K")
        for product in products:
            yield urljoin("https://www.rei.com/", product.css_first("a").attributes["href"])


    async def main():
        async with httpx.AsyncClient(http2=True) as client:
            tasks = []
            rei_url = "https://www.rei.com/c/camping-and-hiking/f/scd-deals?page="
            for i in range(1, 2):
                tasks.append(asyncio.create_task(fetch_html(client, rei_url, page=i)))
            # A print statement shows this as [<HTMLParser chars=1437412>], so I thought
            # this was an HTMLParser object that I can access in the following function
            results = await asyncio.gather(*tasks)
        # Argument of type "list[Unknown]" cannot be assigned to parameter "html"
        # of type "HTMLParser" in function "parse_search_page"
        links = parse_search_page(results)
        for link in links:
            print(link)


    if __name__ == "__main__":
        asyncio.run(main())

I am getting an AttributeError. Ideally, I would have the results and then be able to loop through them with the parse function (which is itself a generator that I can loop through). That is my current mental model. I have tried putting both in the create_task call and making the parse function async, but that didn't seem to make sense. I am not

[1]: https://youtu.be/DHvzCVLv_FA?si=9qJeqLmI02iYSCkA
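
To make the mental model above concrete, here is a minimal sketch with no network access. `asyncio.gather` returns a list with one entry per task, which matches the `[<HTMLParser chars=...>]` printout, so the list is looped over first and the generator is consumed once per page. `FakePage`, `fetch_page`, `parse_page`, and `collect_links` are hypothetical stand-ins for `HTMLParser`, `fetch_html`, and `parse_search_page`; they are assumptions for illustration, not part of httpx or selectolax.

```python
import asyncio
from typing import Iterator


class FakePage:
    """Hypothetical stand-in for selectolax's HTMLParser (not the real class)."""

    def __init__(self, hrefs: list[str]) -> None:
        self.hrefs = hrefs


async def fetch_page(page: int) -> FakePage:
    """Hypothetical stand-in for fetch_html; yields to the event loop, no network."""
    await asyncio.sleep(0)
    return FakePage([f"/product/{page}-{n}" for n in range(2)])


def parse_page(page: FakePage) -> Iterator[str]:
    """Plain (non-async) generator over ONE page, mirroring parse_search_page."""
    for href in page.hrefs:
        yield "https://www.rei.com" + href


async def collect_links(pages: range) -> list[str]:
    tasks = [asyncio.create_task(fetch_page(p)) for p in pages]
    # gather returns a list: one result per task, in the order the tasks were passed
    results = await asyncio.gather(*tasks)
    links: list[str] = []
    for page in results:  # loop over the list of pages first
        links.extend(parse_page(page))  # then consume the generator for each page
    return links


if __name__ == "__main__":
    for link in asyncio.run(collect_links(range(1, 3))):
        print(link)
```

Applied to the code in the question, this would presumably mean calling `parse_search_page` on each element of `results` rather than on the list itself, and skipping any element that is `False`, since `fetch_html` returns `False` on an HTTP error.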

