I have a large Excel spreadsheet from which I need to read data in certain rows, columns, and cells and then output it in a different DataFrame format. How can I capture the data in specific cells while ensuring it can still be captured when the spreadsheet changes? That is, more columns or rows could be added, but I need to keep capturing this data. Could you provide code using Python and pandas, with loops, to capture this data dynamically? Again, not all cells will be used; only certain rows and columns are needed. Here is an example.
Logic
For a given quarter and ID, display the column name as many times as its count. In this case the quarter is q1.22. I created new columns called date and TYPE.
Here is the Excel spreadsheet:
Data
            q1.22
    ID      type1  OFFICE  nontype1  Customer
    NY      1      3       1         2
    CA      1      33      1         0
    TOTALS  2      36      2         1

    data = {
        '0': ['id', 'NY', 'CA', 'TOTALS'],
        'q1.22': ['type1', '1', '1', '2'],
        '0_2': ['OFFICE', '3', '33', '36'],
        '0_3': ['nontype1', '1', '1', '2'],
        '0_4': ['Customer', '2', '0', '1']
    }

Desired
    ID  date   TYPE
    NY  q1.22  type1
    NY  q1.22  nontype1
    NY  q1.22  Customer
    NY  q1.22  Customer
    CA  q1.22  type1
    CA  q1.22  nontype1

Doing
    # Define the row indices for both ranges
    start_row, end_row = 0, 3  # Rows 1 to 4 (0-based index)

    # Define the column indices for the first range (A to C)
    start_col_range1, end_col_range1 = 0, 2  # Columns A to C (0-based index)

    # Define the column indices for the second range (E to F)
    start_col_range2, end_col_range2 = 4, 5  # Columns E to F (0-based index)

    # Create an empty list to store the captured data
    captured_data = []

    # Loop through rows and columns within the first range (A to C)
    for row in range(start_row, end_row + 1):
        row_label = df.iloc[row, 0]  # Assuming the ID column is in the first column
        for col in range(start_col_range1, end_col_range1 + 1):
            col_label = df.columns[col]
            value = df.iloc[row, col]
            captured_data.append({'ID': row_label, 'date': df.iloc[0, 0], 'TYPE': col_label})

    # Loop through rows and columns within the second range (E to F)
    for row in range(start_row, end_row + 1):
        row_label = df.iloc[row, 0]  # Assuming the ID column is in the first column
        for col in range(start_col_range2, end_col_range2 + 1):
            col_label = df.columns[col]
            value = df.iloc[row, col]
            captured_data.append({'ID': row_label, 'date': df.iloc[0, 0], 'TYPE': col_label})

    # Convert the captured data into a DataFrame
    output_df = pd.DataFrame(captured_data)

However, this is the output:
            ID date        TYPE
    0       id   id  Unnamed: 0
    1       id   id       q1.22
    2       NY   id  Unnamed: 0
    3       NY   id       q1.22
    4       CA   id  Unnamed: 0
    5       CA   id       q1.22
    6   TOTALS   id  Unnamed: 0
    7   TOTALS   id       q1.22
    8       id   id  Unnamed: 3
    9       id   id  Unnamed: 4
    10      NY   id  Unnamed: 3
    11      NY   id  Unnamed: 4
    12      CA   id  Unnamed: 3
    13      CA   id  Unnamed: 4
    14  TOTALS   id  Unnamed: 3
    15  TOTALS   id  Unnamed: 4

Any suggestion is appreciated.
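For reference, here is a minimal sketch of the logic described above, run against the sample `data` dict rather than the real workbook. It rests on several assumptions that are not confirmed by the spreadsheet itself: the quarter label is the only column name matching the pattern `q<digit>.<two digits>`, the first data row holds the real column names, and the names `wanted_types` and `skip_ids` are hypothetical settings introduced here to say which columns and rows to use. It selects columns by name instead of by position, so added rows or columns should not shift the result.

```python
import re

import pandas as pd

# Sample data copied from the question. The first item of each list is the
# second header row of the spreadsheet (the real column names).
data = {
    '0': ['id', 'NY', 'CA', 'TOTALS'],
    'q1.22': ['type1', '1', '1', '2'],
    '0_2': ['OFFICE', '3', '33', '36'],
    '0_3': ['nontype1', '1', '1', '2'],
    '0_4': ['Customer', '2', '0', '1'],
}
raw = pd.DataFrame(data)

# Hypothetical settings (assumptions): which columns to count, which IDs to skip.
wanted_types = ['type1', 'nontype1', 'Customer']
skip_ids = {'TOTALS'}

# Assumption: exactly one top-level column name looks like a quarter, e.g. 'q1.22'.
quarter = next(c for c in raw.columns if re.fullmatch(r'q\d\.\d{2}', str(c)))

# Promote the first data row to the header so columns can be found by name.
body = raw.iloc[1:].copy()
body.columns = raw.iloc[0].tolist()
id_col = body.columns[0]  # 'id' in the sample

rows = []
for _, rec in body.iterrows():
    if rec[id_col] in skip_ids:
        continue  # ignore summary rows such as TOTALS
    for type_name in wanted_types:
        if type_name not in body.columns:
            continue  # tolerate a wanted column being absent
        # Emit one output row per unit of the count stored in the cell.
        for _repeat in range(int(rec[type_name])):
            rows.append({'ID': rec[id_col], 'date': quarter, 'TYPE': type_name})

output_df = pd.DataFrame(rows, columns=['ID', 'date', 'TYPE'])
print(output_df)
```

With the sample data this yields six rows: four for NY (Customer twice) and two for CA. A sheet with several quarter blocks, or with blank cells in the count columns, would need extra handling that this sketch does not attempt.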