Channel: Active questions tagged python - Stack Overflow

How to efficiently test some HTTP proxies for accessing a specific domain?


I need an efficient way to test a batch of free online HTTP proxies and determine which ones can access a specific website.

Since proxy testing involves significant waiting time, I redesigned my code for asynchronous testing and explored the httpx and aiohttp packages. However, I encountered unexpected behavior, which makes me question whether my current code is the best fit for my purpose.

Below is the output of the code for three methods I used:

  • one using the requests package for synchronous testing,
  • and two using httpx and aiohttp for asynchronous testing.

As you can see, there are several errors, and the time taken to complete each request varies significantly. Interestingly, the requests method returned an HTTP 200 status for four links, the httpx method for five, and the aiohttp method for none, which is unexpected given that all three are supposed to perform the same task. This raises doubts about how I implemented them.

Additionally, in the httpx method, one proxy took inexplicably long even though I set the timeout to 60 seconds: it took 13,480.64 seconds. (I should mention that during this test I put my PC into sleep mode when I noticed it was taking too long; when I returned later, I found that the process hadn't stopped and was still running.)
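One way to guarantee a hard wall-clock cap per check, independent of any library-level timeout, is to wrap each coroutine in asyncio.wait_for, which cancels the awaited call once the budget elapses. A minimal sketch with a stand-in check() coroutine (the names and sleep durations are illustrative, not part of the original code):

```python
import asyncio

async def check(proxy: str) -> str:
    # stand-in for a real proxy check; sleeps to simulate a slow proxy
    await asyncio.sleep(0.01 if proxy.endswith(":80") else 5)
    return "ok"

async def check_capped(proxy: str, cap: float = 1.0) -> str:
    # wait_for cancels the task once `cap` seconds elapse,
    # regardless of what the underlying HTTP library's timeout does
    try:
        return await asyncio.wait_for(check(proxy), timeout=cap)
    except asyncio.TimeoutError:
        return "TimeoutError"

async def main() -> list[str]:
    return await asyncio.gather(check_capped("1.2.3.4:80"),
                                check_capped("5.6.7.8:9"))

results = asyncio.run(main())
print(results)  # ['ok', 'TimeoutError']
```

Whether this would also have stopped the 13,480-second request is unclear, since suspending the PC can freeze the event loop's clock along with everything else.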

Can anyone please tell me what I'm doing wrong here and how I could improve it?

 1) --> 185.XXX.XX.XX:80     --> ProxyError      (4.96s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)      (2.50s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)      (20.92s)
 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)      (0.61s)
 5) --> 31.XX.XX.XX:50687    --> ConnectionError (7.88s)
 6) --> 177.XX.XXX.XXX:80    --> ProxyError      (21.07s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)      (4.96s)
 8) --> 146.XX.XXX.XXX:12334 --> ProxyError      (21.05s)
 9) --> 67.XX.XXX.XXX:33081  --> ProxyError      (3.03s)
10) --> 37.XXX.XX.XX:80      --> ReadTimeout     (60.16s)

Testing 10 proxies with "requests" took 147.16 seconds.

 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)          (16.09s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)          (22.11s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)          (12.96s)
 1) --> 185.XXX.XX.XX:80     --> RemoteProtocolError (24.83s)
 9) --> 67.XX.XXX.XXX:33081  --> ConnectError        (6.02s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)          (22.48s)
 6) --> 177.XX.XXX.XXX:80    --> HTTP (200)          (26.96s)
 5) --> 31.XX.XX.XX:50687    --> ConnectError        (34.50s)
 8) --> 146.XX.XXX.XXX:12334 --> ConnectError        (27.01s)
10) --> 37.XXX.XX.XX:80      --> ReadError           (13480.64s)

Testing 10 proxies with "httpx" took 13507.80 seconds.

 1) --> 185.XXX.XX.XX:80     --> ClientProxyConnectionError  (1.30s)
 2) --> 38.XX.XXX.XXX:443    --> ClientProxyConnectionError  (0.67s)
 3) --> 162.XXX.XX.XXX:80    --> ClientProxyConnectionError  (0.77s)
 4) --> 18.XXX.XXX.XXX:8080  --> ClientProxyConnectionError  (0.83s)
 5) --> 31.XX.XX.XX:50687    --> ClientProxyConnectionError  (0.85s)
 6) --> 177.XX.XXX.XXX:80    --> ClientProxyConnectionError  (0.91s)
 7) --> 8.XXX.XXX.X:4153     --> ClientProxyConnectionError  (0.94s)
 8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError  (1.03s)
 9) --> 67.XX.XXX.XXX:33081  --> ClientProxyConnectionError  (1.05s)
10) --> 37.XXX.XX.XX:80      --> ClientProxyConnectionError  (0.62s)

Testing 10 proxies with "aiohttp" took 2.42 seconds.

Here's the code I used:

I started by downloading the proxies from this GitHub repository:

import random
import tempfile
import os
import requests
import time
import asyncio
import httpx
import aiohttp

TIMEOUT: int = 60
DEFAULT_DOMAIN: str = r"www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en,ar;q=0.9,fr;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "dnt": "1",
    "referer": "https://www.google.com/",
    "sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
    "Connection": "keep-alive",
}


def get_proxies() -> list[str]:
    proxies: list[str] = []
    if os.path.exists(PROXIES_PATH):
        # read the cached copy if we already downloaded the list
        with open(file=PROXIES_PATH, mode="r") as file:
            proxies = file.read().splitlines()
    else:
        response = requests.request(method="GET", url=PROXIES_URL)
        if response.status_code == 200:
            proxies = response.text
            with open(file=PROXIES_PATH, mode="w") as file:
                file.write(proxies)
            proxies = proxies.split("\n")
    return proxies
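One small inconsistency to note in get_proxies(): the download branch uses response.text.split("\n"), while the cached branch uses splitlines(). If the file ends with a newline, split("\n") leaves a trailing empty string that splitlines() does not, so the two branches can return lists of different lengths for the same file:

```python
# same text, two splitting strategies
text = "1.1.1.1:80\n2.2.2.2:8080\n"

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # ['1.1.1.1:80', '2.2.2.2:8080', '']
print(by_splitlines)  # ['1.1.1.1:80', '2.2.2.2:8080']
```

A trailing "" entry later becomes the proxy URL "http://", which would fail with a confusing error rather than being skipped.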

Below is the method I used to test those proxies sequentially:

def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")

The following are the two methods I used to test whether a proxy works with the desired website, using httpx and aiohttp respectively:

async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    proxy_mounts = {
        "http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}"),
    }
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,
        headers=HEADERS,
        follow_redirects=True
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            response = await session.send(httpx.Request(method="GET", url=f"http://{domain}"))
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")


async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    async with aiohttp.ClientSession(
        timeout=aiohttp.ClientTimeout(total=TIMEOUT),
        headers=HEADERS
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            response = await session.get(url=f"http://{domain}", proxy=f"http://{proxy}")
            status = f"HTTP ({response.status})"
            if response.status == 200:
                color = "\033[92m"
        except Exception as exception:  # aiohttp.ClientError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:26}\t({time.perf_counter()-start:.2f}s)")
    await asyncio.sleep(0.25)

Below is the remainder of the code. You can run it directly by copying it into your environment (just make sure the required packages are installed):

async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        await asyncio.gather(*[func(i, proxy) for i, proxy in enumerate(proxies_list, 1)])


def main():
    proxies = random.sample(get_proxies(), 10)  # or: get_proxies()[:10]

    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter()-start:.2f} seconds.\n')


if __name__ == "__main__":
    main()
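One thing I considered: test_proxies launches every check at once via gather, which is fine for 10 proxies but could open hundreds of sockets on a full list. A semaphore can bound how many checks run concurrently while keeping gather's result ordering. A minimal sketch (bounded_gather and fake_check are illustrative names, not part of the code above):

```python
import asyncio

async def bounded_gather(items, worker, limit: int = 5):
    # cap the number of worker coroutines running at once
    sem = asyncio.Semaphore(limit)

    async def run(item):
        async with sem:  # at most `limit` workers hold the semaphore
            return await worker(item)

    # gather still returns results in input order
    return await asyncio.gather(*(run(i) for i in items))

async def fake_check(proxy: str) -> str:
    # stand-in for a real proxy check
    await asyncio.sleep(0.01)
    return proxy

proxies = [f"10.0.0.{i}:80" for i in range(10)]
results = asyncio.run(bounded_gather(proxies, fake_check, limit=3))
print(results)
```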
