Problem Description:
I'm attempting to scrape the website at https://www.cbit.ac.in/current_students/acedamic-calendar/ using the requests library along with BeautifulSoup. However, upon making a request to the website, I encounter the following SSL certificate verification error:
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.cbit.ac.in', port=443): Max retries exceeded with url: /current_students/acedamic-calendar/ (Caused by SSLError(SSLCertVerificationError(1,'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
Approach:
To address the SSL verification issue, I've attempted to specify the path to the CA certificate using the verify parameter in the requests.get() function call. The CA certificate path is /Users/rishilboddula/Downloads/cbit.ac.in.cer. Despite this, the SSL verification error persists.
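One possible cause, stated here as an assumption rather than a confirmed diagnosis, is the certificate file itself. The verify parameter expects a PEM-encoded CA bundle, while a .cer file exported from a browser may be DER-encoded, and it may hold only the site's own certificate rather than the issuing CA chain that "unable to get local issuer certificate" refers to. The sketch below covers only the encoding part, using the standard library's ssl module; the helper names to_pem and write_pem_copy, and the output path in the usage note, are hypothetical.

```python
import ssl
from pathlib import Path

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def to_pem(cert_bytes: bytes) -> str:
    """Return PEM text for a certificate supplied as PEM or DER bytes."""
    if PEM_HEADER in cert_bytes:
        # Already PEM: just decode the text.
        return cert_bytes.decode("ascii")
    # Otherwise assume DER and wrap it in PEM armor.
    return ssl.DER_cert_to_PEM_cert(cert_bytes)


def write_pem_copy(cer_path: str, pem_path: str) -> None:
    """Write a PEM copy of the certificate stored at cer_path (hypothetical helper)."""
    Path(pem_path).write_text(to_pem(Path(cer_path).read_bytes()))
```

Calling write_pem_copy('/Users/rishilboddula/Downloads/cbit.ac.in.cer', 'cbit.pem') and then passing verify='cbit.pem' would rule out an encoding problem. If the error persists after that, the file probably lacks the intermediate or root certificate that issued the site's certificate, and that certificate would need to be added to the bundle.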
After successfully scraping the website, I intend to store the extracted URLs in a MongoDB collection named ull using the pymongo library. However, due to the SSL verification error, I'm unable to proceed with the scraping and data insertion process.
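For the storage step, here is a minimal sketch of how the extracted href values might be cleaned before insertion. It assumes that some a tags have no href (so link.get('href') returns None) and that some links are relative; neither has been checked against the actual page. The function name clean_urls is hypothetical.

```python
from urllib.parse import urljoin


def clean_urls(hrefs, base_url):
    """Drop empty or non-navigational hrefs, resolve relative links, remove duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        # Skip missing hrefs, in-page anchors, and non-HTTP pseudo-links.
        if not href or href.startswith(("#", "javascript:", "mailto:")):
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

The cleaned list could then be written in one call with ull.insert_many([{"url": u} for u in urls]) instead of one insert_one per URL, provided the list is not empty.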
Request for Assistance:
I'm seeking guidance on resolving the SSL certificate verification error to successfully scrape the website and insert the data into MongoDB. Additionally, if there are any best practices or alternative approaches for handling SSL certificate verification in Python, I would greatly appreciate any insights.
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pymongo

# Specify the path to the CA certificate
ca_cert_path = '/Users/rishilboddula/Downloads/cbit.ac.in.cer'

# Make a request to the website with SSL verification
req = requests.get('https://www.cbit.ac.in/current_students/acedamic-calendar/', verify=ca_cert_path)

# Parse the HTML content
soup = BeautifulSoup(req.content, 'html.parser')

# Extract all URLs from the webpage
links = soup.find_all('a')
urls = [link.get('href') for link in links]

# Connect to MongoDB
client = pymongo.MongoClient('mongodb://localhost:27017')
db = client["data"]
ull = db["ull"]

# Insert each URL into the MongoDB collection
for url in urls:
    ull.insert_one({"url": url})