I’m working on a web scraping project using Python, BeautifulSoup, and Selenium in Google Colab. I’m trying to download multiple PDFs, but I’m encountering an error that I don’t understand:
AttributeError                            Traceback (most recent call last)
in <cell line: 19>()
     17
     18 # Setup webdriver
---> 19 s=Service(ChromeDriverManager().install())
     20 driver = webdriver.Chrome('chromedriver', options=chrome_options)
     21 driver.get('https://www.offre-emploi.tn/consulter-cv/')

4 frames
/usr/local/lib/python3.10/dist-packages/webdriver_manager/chrome.py in install(self)
     38
     39     def install(self) -> str:
---> 40         driver_path = self._get_driver_binary_path(self.driver)
     41         os.chmod(driver_path, 0o755)
     42         return driver_path

/usr/local/lib/python3.10/dist-packages/webdriver_manager/core/manager.py in _get_driver_binary_path(self, driver)
     38
     39         os_type = self.get_os_type()
---> 40         file = self._download_manager.download_file(driver.get_driver_download_url(os_type))
     41         binary_path = self._cache_manager.save_file_to_cache(driver, file)
     42         return binary_path

/usr/local/lib/python3.10/dist-packages/webdriver_manager/drivers/chrome.py in get_driver_download_url(self, os_type)
     30
     31     def get_driver_download_url(self, os_type):
---> 32         driver_version_to_download = self.get_driver_version_to_download()
     33         # For Mac ARM CPUs after version 106.0.5249.61 the format of OS type changed
     34         # to more unified "mac_arm64". For newer versions, it'll be "mac_arm64"

/usr/local/lib/python3.10/dist-packages/webdriver_manager/core/driver.py in get_driver_version_to_download(self)
     46             return self._driver_version_to_download
     47
---> 48         return self.get_latest_release_version()
     49
     50     def get_latest_release_version(self):

/usr/local/lib/python3.10/dist-packages/webdriver_manager/drivers/chrome.py in get_latest_release_version(self)
     62             return determined_browser_version
     63         # Remove the build version (the last segment) from determined_browser_version for version < 113
---> 64         determined_browser_version = ".".join(determined_browser_version.split(".")[:3])
     65         latest_release_url = (
     66             self._latest_release_url

AttributeError: 'NoneType' object has no attribute 'split'
In Google Colab, the underlined line is this one:
s=Service(ChromeDriverManager().install())
But I think the traceback indicates that the actual problem is somewhere else, doesn't it?
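For what it's worth, I can reproduce the same message in isolation from the last frame of the traceback. This is only a minimal sketch: the variable name and the expression are copied from line 64 of the traceback, but the `None` value is my assumption about what the browser version detection returned, not something I have confirmed.

```python
# Assumption: webdriver_manager could not determine the installed
# browser version and ended up with None instead of a version string.
determined_browser_version = None

try:
    # Same expression as line 64 of webdriver_manager/drivers/chrome.py
    # shown in the traceback above.
    determined_browser_version = ".".join(determined_browser_version.split(".")[:3])
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

If that assumption holds, the failure would come from inside `ChromeDriverManager().install()` rather than from the `Service(...)` call itself.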
Here’s my code:
!apt-get update
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
!pip install webdriver_manager
!apt install -y chromium-chromedriver

import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import time
import os
import requests
from bs4 import BeautifulSoup

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

s=Service(ChromeDriverManager().install())
driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.get('https://www.offre-emploi.tn/consulter-cv/')

driver.find_element(By.NAME, 'jb_email').send_keys('my_email')
driver.find_element(By.ID, 'jb_password').send_keys('my_password')
driver.find_element(By.NAME, 'jb_submit_login').click()
time.sleep(5)

html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, 'html.parser')
foldername = "resumes"
os.makedirs(foldername, exist_ok=True)
base = 'https://www.offre-emploi.tn{}'

for pdf in soup.select("a[target='_blank']"):
    print("pdf")
    print(pdf)
    filename = pdf['href'][-5:]
    print(filename)
    pdf_link = base.format(pdf['href']) +".pdf"
    with open(f"{foldername}/{filename}.pdf", 'wb') as f:
        f.write(requests.get(pdf_link).content)