I have been struggling with web scraping using Selenium. The target website contains a responsive table from which I need to gather data. The HTML looks like this (I have altered the parts that contain identifying information, but the structure is the same as the real page):
    <table class="items">
      ...
      <tbody>
        <tr class="odd">
          <td class="centered">1</td>
          <td class="centered no-border-right">
            <a title="company 1" name="" href="/company 1/year_id/1970"><img src="https://company_1.com/logo.png">
            </a>
          </td>
          <td class="mainlink no-border-links">
            <a title="company 1" name="" href="/company 1/year_id/1970">company 1
            </a>
          </td>
          <td class="rights mainlink redtext">$270k</td>
          <td class="centered">
            <a href="/company 1/purchase/year_id/1970">5
            </a>
          </td>
          <td class="rights mainlink greentext">-
          </td>
          <td class="centered">
            <a href="/company 1/purchase/year_id/1970">4
            </a>
          </td>
          <td class="rights mainlink"><span class="redtext">$-270k</span>
          </td>
        </tr>
        <tr class="even"> <!-- these blocks repeat with different data 24 times... -->
      </tbody>
    </table>
    ...
With the help of Gemini, I wrote the following Python code:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from fake_useragent import UserAgent
    from pandas import DataFrame

    option = webdriver.ChromeOptions()
    option.add_argument("--headless")
    ua = UserAgent()
    option.add_argument(f"user-agent={ua.chrome}")
    driver = webdriver.Chrome(options=option)

    table_class = 'items'
    url_expenditure = 'https://target_website.com'
    driver.get(url_expenditure)
    driver.implicitly_wait(5)

    table_element = driver.find_element(By.CLASS_NAME, table_class)
    table_data = table_element.find_element(By.TAG_NAME, "tr")

    table_data = []
    for row in table_element.find_elements(By.TAG_NAME, "tr"):
        row_data = [cell.text.strip() for cell in row.find_elements(By.TAG_NAME, "td")]
        table_data.append(row_data)

    driver.quit()
    print(table_data)
The result is a list with the correct number of rows and columns, but every cell is an empty string:

    [[], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '']]
whereas I expect to see something like:

    [['1', '', 'company 1', '$270k', '5', '-', '4', '$-270k'], [# next row of data], ...]
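To make the expected shape concrete, here is a minimal sketch that parses only the sample row from the HTML above, using Python's built-in `html.parser` instead of Selenium. The names `SAMPLE_HTML` and `RowParser` are made up for illustration, and this runs against a static string, so it shows the target output format rather than a fix for the live page.

```python
from html.parser import HTMLParser

# Static copy of the single sample row shown above (assumption: the real
# page has the same structure, with more rows).
SAMPLE_HTML = """
<table class="items">
  <tbody>
    <tr class="odd">
      <td class="centered">1</td>
      <td class="centered no-border-right">
        <a title="company 1" name="" href="/company 1/year_id/1970"><img src="https://company_1.com/logo.png">
        </a>
      </td>
      <td class="mainlink no-border-links">
        <a title="company 1" name="" href="/company 1/year_id/1970">company 1
        </a>
      </td>
      <td class="rights mainlink redtext">$270k</td>
      <td class="centered">
        <a href="/company 1/purchase/year_id/1970">5
        </a>
      </td>
      <td class="rights mainlink greentext">-
      </td>
      <td class="centered">
        <a href="/company 1/purchase/year_id/1970">4
        </a>
      </td>
      <td class="rights mainlink"><span class="redtext">$-270k</span>
      </td>
    </tr>
  </tbody>
</table>
"""


class RowParser(HTMLParser):
    """Hypothetical helper: collects the stripped text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested <a>/<span> tags is still part of the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = RowParser()
parser.feed(SAMPLE_HTML)
parser.close()
rows = parser.rows
print(rows)
# [['1', '', 'company 1', '$270k', '5', '-', '4', '$-270k']]
```

The logo cell comes out as an empty string because it contains only an image, which matches the second column in the expected output.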
Could someone explain what I need to change in the code? Thank you!