On this page, the team names carry a sub-label (with ppg data), and the performance column (last5) packs several values (W W L D D) into separately tagged elements inside a single cell. This makes the number of columns and the per-cell data inconsistent when the table is parsed. On top of that, hovering the mouse over a team name opens a second table. (That part is definitely unwanted — I only need the team names in that column.) Ultimately, my goal is to download this table as it appears on the site. I use requests, pandas and bs4, but I'm open to any suggestion, including Selenium etc. How can I download this table into a specified sheet of an Excel workbook, or create a sheet dynamically and download it into that?
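To make the two cell problems concrete, here is a minimal sketch using bs4 on *hypothetical* markup that mimics what I described (the class names `team`, `sub` and `last5` are placeholders, not the site's real ones): taking only the cell's direct text node drops the ppg sub-label, and reading the inner tags one by one splits the form string into five separate values.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the two problem cells described above:
# a team cell with a ppg sub-label, and a last5 cell with one tag per result.
html = """
<tr>
  <td class="team">Arsenal<span class="sub">2.45 ppg</span></td>
  <td class="last5"><span>W</span><span>W</span><span>L</span><span>D</span><span>D</span></td>
</tr>
"""
row = BeautifulSoup(html, "html.parser")

team_cell = row.find("td", class_="team")
# .text would glue everything into "Arsenal2.45 ppg";
# the first direct text node is just the team name
team = team_cell.find(string=True, recursive=False).strip()

last5_cell = row.find("td", class_="last5")
# one result per tag -> five separate values instead of one merged string
last5 = [span.get_text(strip=True) for span in last5_cell.find_all("span")]

print(team)   # Arsenal
print(last5)  # ['W', 'W', 'L', 'D', 'D']
```

With the team name isolated and last5 split into its own list, every row yields the same number of fields, so the DataFrame columns stay consistent.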
Here is my code, which doesn't work:
```python
import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup
import lxml
import xlsxwriter
import openpyxl
import re
from selenium import webdriver
from time import sleep
from selenium.webdriver.common.keys import Keys

url = 'https://footystats.org/england/premier-league/form-table'
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
response = requests.get(url, headers=header)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the table
table = soup.find('table', {'class': 'full-league-table'})

# Process the table
data = []
headers = []
for row in table.find_all('tr'):
    row_data = []
    # pull the cells contained in th and td tags
    for cell in row.find_all(['th', 'td']):
        row_data.append(cell.text.strip())
    if not headers:
        headers = row_data
    else:
        data.append(row_data)

# Convert data to pandas DataFrame
df = pd.DataFrame(data, columns=headers)

# Set first row as header
df.columns = df.iloc[0]
df = df[1:]

# Writing to Excel workbook
with pd.ExcelWriter('PremierLeagueTable.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False)
```
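For the "specified page of the Excel workbook" part of the question, a minimal sketch of the writing step: pandas' `ExcelWriter` can append to an existing workbook (`mode="a"`, `if_sheet_exists="replace"`, available with the openpyxl engine), which replaces just the target sheet and leaves the rest of the workbook alone. The file name, sheet name and sample frame below are placeholders.

```python
import os
import pandas as pd

def write_to_sheet(df, path, sheet_name):
    """Write df to a named sheet; if the workbook already exists,
    keep its other sheets and replace only the target sheet."""
    if os.path.exists(path):
        with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                            if_sheet_exists="replace") as writer:
            df.to_excel(writer, sheet_name=sheet_name, index=False)
    else:
        with pd.ExcelWriter(path, engine="openpyxl") as writer:
            df.to_excel(writer, sheet_name=sheet_name, index=False)

# hypothetical frame standing in for the scraped league table
df = pd.DataFrame({"Team": ["Arsenal", "Liverpool"], "PPG": [2.45, 2.31]})
write_to_sheet(df, "PremierLeagueTable.xlsx", "FormTable")
```

Calling `write_to_sheet` again with a different `sheet_name` adds a new sheet to the same workbook, which covers the "create a dynamic page" case.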
I was using VBA with Selenium and decided to move up a level — partly to gain some speed via pandas — but I never expected things to be this difficult in Python, or that I could slow down this much. To get this table with VBA Selenium,
```vba
Set Table = driver.FindElementByXPath("............").AsTable
Table.ToExcel .Range("A1")
```
two lines of code are enough. Why are things so difficult in Python?
My goal is to get this table as an Excel file.