Channel: Active questions tagged python - Stack Overflow

Getting and processing the table


The team names sit inside a nested tag, and in the performance column (the last5 column) several values (W, W, L, D, D) are each held in a separate tag yet all belong to a single column. This makes the number of columns and the data inconsistent when I process the table. The team name and ppg data are also inside nested tags. On top of that, hovering over a team name with the mouse opens a second table. (I definitely do not want this part; I only need the team names from that column.) Ultimately, my goal is to download this table as it appears on the site. I use pandas, requests and bs4, but I am open to any suggestion, such as selenium. How can I write this table to a specific sheet of an Excel workbook, or create a new sheet dynamically and write it there?

Here is my code, which doesn't work:

import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup
import lxml
import xlsxwriter
import openpyxl
import re
from selenium import webdriver
from time import sleep
from selenium.webdriver.common.keys import Keys

url = 'https://footystats.org/england/premier-league/form-table'
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
response = requests.get(url, headers=header)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the table
table = soup.find('table', {'class': 'full-league-table'})

# Process the table
data = []
headers = []
for row in table.find_all('tr'):
    row_data = []
    # Extract the cells contained in th and td tags
    for cell in row.find_all(['th', 'td']):
        row_data.append(cell.text.strip())
    if not headers:
        headers = row_data
    else:
        data.append(row_data)

# Convert data to pandas DataFrame
df = pd.DataFrame(data, columns=headers)

# Set first column as header
df.columns = df.iloc[0]
df = df[1:]

# Writing to Excel workbook
with pd.ExcelWriter('PremierLeagueTable.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False)
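One possible cause of the inconsistent column counts: `find_all` searches recursively, so the rows and cells of the hover table nested inside a team cell get collected as well. Below is a minimal sketch of one way around that, run against a made-up HTML snippet. The `hover-popup` and `team-name` class names and the sample row are assumptions for illustration only; the real markup on the site may differ and would need to be inspected first.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; class names other than
# "full-league-table" are assumptions, not the site's real ones.
SAMPLE_HTML = """
<table class="full-league-table">
<tr><th>#</th><th>Team</th><th>Last 5</th><th>PPG</th></tr>
<tr>
<td>1</td>
<td><a class="team-name">Team A</a><div class="hover-popup"><table><tr><td>extra</td><td>info</td></tr></table></div></td>
<td><span>W</span><span>W</span><span>L</span><span>D</span><span>D</span></td>
<td><span class="ppg">1.60</span></td>
</tr>
</table>
"""


def parse_table(html):
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "full-league-table"})

    # Remove the hover popups first, so their nested rows and cells
    # are never picked up by the loops below.
    for popup in table.find_all(class_="hover-popup"):
        popup.decompose()

    rows = []
    for tr in table.find_all("tr"):
        cells = []
        # recursive=False keeps only the direct th/td children of this row.
        for cell in tr.find_all(["th", "td"], recursive=False):
            # get_text joins the text of all nested tags, so the five
            # separate result tags collapse into one string per cell.
            cells.append(cell.get_text(strip=True))
        rows.append(cells)
    return rows


rows = parse_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

If the table is filled in by JavaScript after the page loads, `requests` would not see it at all; in that case the same parsing could be applied to `driver.page_source` from selenium instead of `response.text`.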

While I was using VBA Selenium, I decided to go to the next level and speed things up a little with pandas. But I never expected things to be this difficult in Python, or that I would slow down this much. To get this table with VBA Selenium,

Set Table = driver.FindElementByXPath("............").AsTable
Table.ToExcel .Range("A1")

two lines of code are enough. Why are things so difficult in Python?

My goal is to get this table as an excel file.
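For the part about writing to a specific sheet, here is a rough sketch using `pd.ExcelWriter` with the openpyxl engine. It assumes pandas 1.3 or newer (for the `if_sheet_exists` parameter). The `write_sheet` helper, the sheet names and the sample row are hypothetical and only there to make the example runnable.

```python
import os
import tempfile

import pandas as pd


def write_sheet(df, path, sheet_name):
    """Write df to the named sheet, creating the workbook if it is missing."""
    if os.path.exists(path):
        # Append to the existing workbook; replace the sheet if it already exists.
        writer_kwargs = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        # if_sheet_exists is only accepted in append mode, so leave it out here.
        writer_kwargs = {"mode": "w"}
    with pd.ExcelWriter(path, engine="openpyxl", **writer_kwargs) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


# Made-up sample row, standing in for the scraped table.
df = pd.DataFrame(
    [["1", "Team A", "WWLDD", "1.60"]],
    columns=["#", "Team", "Last 5", "PPG"],
)

# A temporary folder is used here so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "PremierLeagueTable.xlsx")
write_sheet(df, path, "FormTable")      # creates the workbook
write_sheet(df, path, "FormTableCopy")  # adds a second sheet to it

with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```

Checking for the file first means the same call works both for a fresh workbook and for one that already holds other sheets, which are left untouched in append mode.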

