I am trying to scrape and repurpose the news images and titles from a newsfeed page so I can reuse them in a signage display (Xibo). Basically, I just want the first three rows of the main content of the URL below, without any header/footer info or extra code/scripting: just the medium-sized picture and the title under it. I would like to scrape the images/titles, then render a simple HTML page with Flask once a day for the CMS to read.
URL: https://news.clemson.edu/tag/extension/
I gathered that I need Selenium to obtain the rendered page in this case? In the code below, I am having difficulty finding the image URLs properly. It reads in the page and scrolls, but finds no images. I tried some of the nested divs, but no luck either. Can someone point me in the right direction to obtain the image URLs (and ultimately the titles)?
#News feed test for Xibo Signage
#from flask import Flask, render_template
from markupsafe import Markup
#app=Flask(__name__)
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import requests
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#installed chrome driver in scripts so don't need next lines?
#chromedriver_path = '...'
driver = webdriver.Chrome()
url = "https://news.clemson.edu/tag/extension/"
driver.get(url)

# wait (up to 20 seconds) until the images are visible on page
images = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "site-main")))

# scroll to the last image, so that all images get rendered correctly
driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', images[-1])
time.sleep(2)

# PRINT URLS USING SELENIUM -for test (will pass to Flask)
print('Selenium')
for img in images:
    print(img.get_attribute('src'))

#@app.route('/')
#def home():
#    return render_template('home.html',thumbnailmk=thumbnailmk)

#if __name__ == '__main__':
#    app.run(host='0.0.0.0')
#    app.run(debug=True)
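
To make the target concrete, here is a minimal sketch of the extraction and rendering steps. One thing it illustrates: the code above locates the "site-main" container, which has no `src` attribute, so `get_attribute('src')` has nothing to return; the `img` elements inside the container are what carry the URL. The markup in `SAMPLE_HTML` is made up for illustration (the real page's tags and classes may differ), and `CardParser`, `extract_cards`, `render_page` and the `limit` parameter are hypothetical names. It uses only the standard library so it runs without a browser.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical markup, NOT the real page structure: each story is assumed
# to be an <article> holding one <img> and a heading with the title.
SAMPLE_HTML = """
<main class="site-main">
  <article><a href="/a"><img src="https://example.com/a-medium.jpg" alt=""></a>
    <h2><a href="/a">First story</a></h2></article>
  <article><a href="/b"><img data-src="https://example.com/b-medium.jpg" alt=""></a>
    <h2><a href="/b">Bees &amp; blooms</a></h2></article>
  <article><a href="/c"><img src="https://example.com/c-medium.jpg" alt=""></a>
    <h2><a href="/c">Third story</a></h2></article>
</main>
"""


class CardParser(HTMLParser):
    """Collect one {"src", "title"} dict per <article> element."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._current = None      # card being filled while inside <article>
        self._in_heading = False  # True while inside <h2>/<h3>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._current = {"src": None, "title": ""}
        elif self._current is not None:
            if tag == "img" and self._current["src"] is None:
                # Assumption: lazy-loaded images may keep the URL in data-src.
                self._current["src"] = attrs.get("src") or attrs.get("data-src")
            elif tag in ("h2", "h3"):
                self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self._in_heading = False
        elif tag == "article" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.cards.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._in_heading:
            self._current["title"] += data


def extract_cards(html, limit=None):
    """Return up to `limit` cards that have an image URL."""
    parser = CardParser()
    parser.feed(html)
    parser.close()
    cards = [c for c in parser.cards if c["src"]]
    return cards[:limit]


def render_page(cards):
    """Build a bare HTML page: one picture with its title underneath."""
    items = "\n".join(
        f'<figure><img src="{escape(c["src"], quote=True)}" alt="">'
        f'<figcaption>{escape(c["title"])}</figcaption></figure>'
        for c in cards
    )
    return f"<!DOCTYPE html>\n<html><body>\n{items}\n</body></html>"


if __name__ == "__main__":
    print(render_page(extract_cards(SAMPLE_HTML, limit=2)))
```

With Selenium, `driver.page_source` could be passed to `extract_cards` in place of `SAMPLE_HTML` once the page has loaded, and the string returned by `render_page` could be what the Flask route returns. How many items make up "three rows" depends on the page layout, so `limit` would need to be set accordingly.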