Channel: Active questions tagged python - Stack Overflow

Scraping with Selenium and undetected_chromedriver crashes in a Docker container on Mac

I was attempting to run undetected-chromedriver in a Docker container on my M1 Mac and received the following stack trace:

2024-02-11 13:20:04
2024-02-11 13:20:04 Scraping Pararius...
2024-02-11 13:20:05 Traceback (most recent call last):
2024-02-11 13:20:05   File "/app/main.py", line 42, in <module>
2024-02-11 13:20:05     scrape_pararius(logger)
2024-02-11 13:20:05   File "/app/scrape_pararius.py", line 25, in scrape_pararius
2024-02-11 13:20:05     driver = uc.Chrome(options=set_undetected_chrome_options())
2024-02-11 13:20:05   File "/usr/local/lib/python3.10/site-packages/undetected_chromedriver/__init__.py", line 466, in __init__
2024-02-11 13:20:05     super(Chrome, self).__init__(
2024-02-11 13:20:05   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
2024-02-11 13:20:05     super().__init__(
2024-02-11 13:20:05   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 53, in __init__
2024-02-11 13:20:05     self.service.start()
2024-02-11 13:20:05   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/service.py", line 102, in start
2024-02-11 13:20:05     self.assert_process_still_running()
2024-02-11 13:20:05   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/service.py", line 115, in assert_process_still_running
2024-02-11 13:20:05     raise WebDriverException(f"Service {self._path} unexpectedly exited. Status code was: {return_code}")
2024-02-11 13:20:05 selenium.common.exceptions.WebDriverException: Message: Service /root/.local/share/undetected_chromedriver/undetected_chromedriver unexpectedly exited. Status code was: 255
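One way to narrow down an "unexpectedly exited" status like this is to check whether the driver binary named in the trace was built for the CPU the container actually runs on. The sketch below is only a diagnostic under that assumption: it reads the ELF header of a file and compares its target architecture with `platform.machine()`. The helper names `elf_machine` and `host_machine` are made up for illustration, and a matching architecture would not rule out other causes (for example a binary linked against glibc running on Alpine's musl).

```python
import platform
import struct
from pathlib import Path

# ELF e_machine values for the two architectures relevant here.
ELF_MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(path):
    """Return the CPU architecture an ELF binary targets, or None if it is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 of the header gives endianness (1 = little, 2 = big);
    # the 16-bit e_machine field sits at offset 18.
    fmt = "<H" if header[5] == 1 else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)
    return ELF_MACHINES.get(machine, hex(machine))


def host_machine():
    """Normalise platform.machine() so 'arm64' and 'aarch64' compare equal."""
    machine = platform.machine().lower()
    return {"arm64": "aarch64", "amd64": "x86_64"}.get(machine, machine)


if __name__ == "__main__":
    # Path copied from the stack trace; it only exists inside the container.
    driver = Path("/root/.local/share/undetected_chromedriver/undetected_chromedriver")
    if driver.exists():
        print("driver built for: ", elf_machine(driver))
        print("container runs on:", host_machine())
    else:
        print("driver binary not found at", driver)
```

Run inside the container, a mismatch between the two printed values would suggest that the downloaded driver cannot execute there at all.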

I am trying to scrape the website using these options:

from time import sleep

import undetected_chromedriver as uc
from bs4 import BeautifulSoup


def set_undetected_chrome_options() -> uc.ChromeOptions:
    chrome_options = uc.ChromeOptions()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    return chrome_options


def scrape_pararius(logger):
    # SEARCH_URL is defined elsewhere in the module.
    driver = uc.Chrome(options=set_undetected_chrome_options())
    driver.get(SEARCH_URL)
    sleep(7)  # wait for the page to finish rendering
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, "html.parser")
    properties = soup.find_all("li", class_="search-list__item search-list__item--listing")

This is how I build my Docker image:

# Use Python 3.10 Alpine image
FROM python:3.10-alpine

# Update apk repositories
RUN echo "http://dl-4.alpinelinux.org/alpine/v3.14/main" >> /etc/apk/repositories && \
    echo "http://dl-4.alpinelinux.org/alpine/v3.14/community" >> /etc/apk/repositories

# Install Chrome and ChromeDriver
# install chromedriver
RUN apk update
RUN apk add chromium chromium-chromedriver

# Upgrade pip
RUN pip install --upgrade pip

# Install Selenium
RUN pip install selenium undetected-chromedriver

# Set the working directory in the container to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Run main.py when the container launches
CMD ["python3", "-u", "main.py"]

How can I prevent it from crashing in the Docker container? I can run it with a normal webdriver.Chrome(), but I want to use the undetectable one.
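Since the plain webdriver.Chrome() works with the apk-installed binaries, one thing that might be worth trying is pointing uc.Chrome at those same binaries instead of the driver it downloads. The sketch below is an untested assumption, not a confirmed fix: it relies on the `driver_executable_path` and `browser_executable_path` keyword arguments of `uc.Chrome`, and the binary names it looks up (`chromedriver`, `chromium-browser`, `chromium`) are guesses about what the Alpine packages put on PATH. `system_chrome_kwargs` and `make_driver` are hypothetical helper names.

```python
import shutil


def system_chrome_kwargs(which=shutil.which):
    """Build keyword arguments pointing uc.Chrome at system-installed binaries.

    The candidate binary names are assumptions about the Alpine packages.
    """
    driver = which("chromedriver")
    browser = which("chromium-browser") or which("chromium")
    if driver is None or browser is None:
        raise FileNotFoundError("chromium or chromedriver not found on PATH")
    return {
        "driver_executable_path": driver,
        "browser_executable_path": browser,
    }


def make_driver(options):
    # Imported lazily so the helper above can be checked without the package.
    import undetected_chromedriver as uc

    return uc.Chrome(options=options, **system_chrome_kwargs())
```

In the question's code this would replace the direct call, for example `driver = make_driver(set_undetected_chrome_options())`. Whether the patched system driver then starts correctly inside the container is something only a test run can confirm.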

