
SSL Certificate Verification Error When Scraping Website and Inserting Data into MongoDB


Problem Description:

I'm attempting to scrape https://www.cbit.ac.in/current_students/acedamic-calendar/ using the requests library together with BeautifulSoup. However, the request fails with the following SSL certificate verification error:

requests.exceptions.SSLError: HTTPSConnectionPool(host='www.cbit.ac.in', port=443): Max retries exceeded with url: /current_students/acedamic-calendar/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))

Approach:

To address the SSL verification issue, I've attempted to specify the path to the CA certificate using the verify parameter in the requests.get() function call. The CA certificate path is /Users/rishilboddula/Downloads/cbit.ac.in.cer. Despite this, the SSL verification error persists.
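
That error usually means OpenSSL cannot build a chain to a trusted issuer: pointing verify at the site's leaf certificate alone is not enough, because requests expects a PEM-encoded CA bundle that includes the issuing chain. One possible workaround, sketched below under two assumptions (that cbit.ac.in.cer is PEM-encoded, since a DER-encoded .cer would need converting first, and that the combined-bundle path is only an illustration), is to append the certificate to certifi's default bundle and verify against the combined file:

import shutil
import certifi
import requests

# Sketch, not a definitive fix: copy certifi's default CA bundle and
# append the downloaded certificate to it. Assumes the .cer file is
# PEM-encoded (starts with "-----BEGIN CERTIFICATE-----").
combined_bundle = '/tmp/combined-ca-bundle.pem'  # illustrative path
shutil.copyfile(certifi.where(), combined_bundle)

with open('/Users/rishilboddula/Downloads/cbit.ac.in.cer', 'rb') as site_cert:
    with open(combined_bundle, 'ab') as bundle:
        bundle.write(b'\n')
        bundle.write(site_cert.read())

# Verify against the combined bundle instead of the bare certificate
req = requests.get('https://www.cbit.ac.in/current_students/acedamic-calendar/',
                   verify=combined_bundle)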

Once the page is scraped, I intend to store the extracted URLs in a MongoDB collection named ull using the pymongo library. Because of the SSL verification error, however, I'm unable to proceed with either the scraping or the data insertion.
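
For the insertion step itself, here is a minimal sketch; the update_one/upsert pattern is an assumption about the desired behaviour (it keeps repeated runs from inserting duplicate documents), and the plain insert_one used in the full script below works if duplicates are acceptable:

import pymongo

urls = ['https://example.com/page']  # placeholder; in practice, the list scraped above

client = pymongo.MongoClient('mongodb://localhost:27017')
ull = client['data']['ull']

for url in urls:
    # Upsert keyed on the URL so re-running the scraper stays idempotent
    ull.update_one({'url': url}, {'$set': {'url': url}}, upsert=True)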

Request for Assistance:

I'm seeking guidance on resolving the SSL certificate verification error to successfully scrape the website and insert the data into MongoDB. Additionally, if there are any best practices or alternative approaches for handling SSL certificate verification in Python, I would greatly appreciate any insights.
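
As a diagnostic only (emphatically not a fix, and unsafe in production because it removes protection against man-in-the-middle attacks), verification can be disabled temporarily to confirm the rest of the pipeline works:

import requests
import urllib3

# Diagnostic only: verify=False disables certificate validation entirely.
# Silence the InsecureRequestWarning that requests would otherwise emit.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
req = requests.get('https://www.cbit.ac.in/current_students/acedamic-calendar/',
                   verify=False)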

# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pymongo

# Specify the path to the CA certificate
ca_cert_path = '/Users/rishilboddula/Downloads/cbit.ac.in.cer'

# Make a request to the website with SSL verification
req = requests.get('https://www.cbit.ac.in/current_students/acedamic-calendar/', verify=ca_cert_path)

# Parse the HTML content
soup = BeautifulSoup(req.content, 'html.parser')

# Extract all URLs from the webpage, skipping anchors without an href
links = soup.find_all('a')
urls = [link.get('href') for link in links if link.get('href')]

# Connect to MongoDB
client = pymongo.MongoClient('mongodb://localhost:27017')
db = client["data"]
ull = db["ull"]

# Insert each URL into the MongoDB collection
for url in urls:
    ull.insert_one({"url": url})
