Channel: Active questions tagged python - Stack Overflow

Dynamically mapping an Excel spreadsheet using Python and pandas [duplicate]


I have a large Excel spreadsheet, and I need to read data from certain rows, columns, and cells and output it in a different DataFrame layout. How can I capture the data in specific cells while making sure it is still captured correctly when the spreadsheet changes? More rows or columns may be added over time, and I need to keep capturing this data. Could you provide code, using Python and pandas with loops, that captures this data dynamically? Again, not all cells are used; only certain rows and columns are needed. Here is an example.
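Hard-coded ranges break because every position moves when a row or column is inserted. A minimal sketch of the alternative, assuming the sheet is first read without a header (for example `pd.read_excel(path, header=None)`, where `path` is a placeholder), is to search the raw grid for a known label and derive positions from where it is found. `locate` is a hypothetical helper, and the grid below is the example sheet from the question typed in by hand.

```python
import pandas as pd


def locate(raw: pd.DataFrame, label: str) -> tuple:
    """Return (row, col) of the first cell whose text equals label, ignoring case."""
    target = label.strip().lower()
    for r in range(raw.shape[0]):        # scan rows top to bottom
        for c in range(raw.shape[1]):    # and columns left to right
            cell = raw.iat[r, c]
            if isinstance(cell, str) and cell.strip().lower() == target:
                return r, c
    raise KeyError(f"{label!r} not found in sheet")


# The example sheet, laid out the way read_excel(..., header=None) returns it:
# integer column labels, one grid row per sheet row. Blank cells are written as
# None here (read_excel gives NaN); the isinstance check skips both.
raw = pd.DataFrame([
    [None,     'q1.22', None,     None,       None],
    ['ID',     'type1', 'OFFICE', 'nontype1', 'Customer'],
    ['NY',     1,       3,        1,          2],
    ['CA',     1,       33,       1,          0],
    ['TOTALS', 2,       36,       2,          1],
])

header_row, id_col = locate(raw, 'ID')     # (1, 0) for this sheet
quarter_pos = locate(raw, 'q1.22')         # (0, 1)

# Inserting a column shifts positions, but the lookup follows the label.
shifted = raw.copy()
shifted.insert(2, 'extra', '')
customer_pos = locate(shifted, 'Customer')  # column 5 now, 4 in the original grid
```

Matching is case-insensitive because the sheet says `ID` while the dictionary version of the data says `id`.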

Logic

For a given quarter and ID, display each column name as many times as its count. In this case the quarter is q1.22. I created two new columns: date and TYPE.
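Judging from the desired output in the question, a count appears to mean "emit the type name that many times": NY has Customer = 2, so Customer is listed twice for NY, and CA has Customer = 0, so it is not listed for CA. A plain-Python sketch of that reading, with `expand_counts` as a hypothetical helper and OFFICE left out as in the desired output:

```python
def expand_counts(id_, quarter, counts):
    """Return one {'ID', 'date', 'TYPE'} record per unit of each count."""
    rows = []
    for type_name, n in counts.items():
        for _ in range(n):  # a count of 2 yields two identical records, 0 yields none
            rows.append({'ID': id_, 'date': quarter, 'TYPE': type_name})
    return rows


# Counts for the kept types, copied from the example sheet.
ny = expand_counts('NY', 'q1.22', {'type1': 1, 'nontype1': 1, 'Customer': 2})
ca = expand_counts('CA', 'q1.22', {'type1': 1, 'nontype1': 1, 'Customer': 0})

for record in ny + ca:
    print(record['ID'], record['date'], record['TYPE'])
```

Under this reading the output has 1 + 1 + 2 = 4 rows for NY and 1 + 1 + 0 = 2 for CA, six in total, which matches the number of rows in the desired table.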

Here is the Excel spreadsheet:

Data

            q1.22
    ID      type1   OFFICE  nontype1    Customer
    NY      1       3       1           2
    CA      1       33      1           0
    TOTALS  2       36      2           1

As a dictionary:

    data = {'0': ['id', 'NY', 'CA', 'TOTALS'],
            'q1.22': ['type1', '1', '1', '2'],
            '0_2': ['OFFICE', '3', '33', '36'],
            '0_3': ['nontype1', '1', '1', '2'],
            '0_4': ['Customer', '2', '0', '1']}
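Before any reshaping, the `data` dictionary needs its first row promoted to a header, its numeric strings converted to integers, and the `TOTALS` row dropped. A sketch under two assumptions: the quarter is the only column label shaped like `q1.22` (the `q[1-4]\.\d{2}` pattern is a guess based on this single example), and the first column always holds the IDs. `tidy_sheet` is a hypothetical name.

```python
import re

import pandas as pd

# Assumption: quarter labels look like "q1.22" (quarter 1-4, two-digit year).
QUARTER_RE = re.compile(r"q[1-4]\.\d{2}")


def tidy_sheet(df):
    """Return (quarter, numeric table) for a frame shaped like `data`."""
    quarter = next(str(c) for c in df.columns if QUARTER_RE.fullmatch(str(c)))
    header = [str(h).strip() for h in df.iloc[0]]  # row 0 holds the real column names
    header[0] = 'ID'                               # first column holds the IDs ('id' in the data)
    body = df.iloc[1:].copy()
    body.columns = header
    body = body[body['ID'] != 'TOTALS']            # drop the summary row
    count_cols = [c for c in header if c != 'ID']
    body = body.astype({c: int for c in count_cols})  # '1' -> 1, '33' -> 33, ...
    return quarter, body.reset_index(drop=True)


data = {'0': ['id', 'NY', 'CA', 'TOTALS'],
        'q1.22': ['type1', '1', '1', '2'],
        '0_2': ['OFFICE', '3', '33', '36'],
        '0_3': ['nontype1', '1', '1', '2'],
        '0_4': ['Customer', '2', '0', '1']}

quarter, table = tidy_sheet(pd.DataFrame(data))
print(quarter)
print(table)
```

Because the names come from row 0 rather than from fixed positions, an extra column added to the sheet simply becomes another column of `table`.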

Desired

    ID  date    TYPE
    NY  q1.22   type1
    NY  q1.22   nontype1
    NY  q1.22   Customer
    NY  q1.22   Customer
    CA  q1.22   type1
    CA  q1.22   nontype1
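Starting from a clean wide table, the desired layout can be reached with a melt (one row per ID and type) followed by repeating each row by its count. This is a vectorized sketch rather than a loop; `wide` is the example table with TOTALS removed, and `wanted`, which leaves out OFFICE as the desired output does, is an assumption about which columns matter. `to_desired` is a hypothetical name.

```python
import pandas as pd


def to_desired(wide, quarter, wanted):
    """Turn a wide count table into one (ID, date, TYPE) row per counted item."""
    long = wide.melt(id_vars='ID', value_vars=wanted, var_name='TYPE',
                     value_name='count', ignore_index=False)
    # melt stacks column by column; a stable sort on the original row index
    # regroups the rows by ID while keeping the column order within each ID.
    long = long.sort_index(kind='mergesort').reset_index(drop=True)
    # Repeat each row `count` times; a count of 0 drops the row entirely.
    out = long.loc[long.index.repeat(long['count'].to_numpy())]
    return out.assign(date=quarter)[['ID', 'date', 'TYPE']].reset_index(drop=True)


wide = pd.DataFrame({'ID': ['NY', 'CA'], 'type1': [1, 1], 'OFFICE': [3, 33],
                     'nontype1': [1, 1], 'Customer': [2, 0]})
result = to_desired(wide, 'q1.22', ['type1', 'nontype1', 'Customer'])
print(result)
```

melt emits all `type1` rows first, then `nontype1`, and so on; the stable sort is what restores the NY-then-CA grouping of the desired table. Since the kept types are listed by name, new columns in the sheet are ignored unless they are added to `wanted`.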

Doing

    # Define the row indices for both ranges
    start_row, end_row = 0, 3  # Rows 1 to 4 (0-based index)

    # Define the column indices for the first range (A to C)
    start_col_range1, end_col_range1 = 0, 2  # Columns A to C (0-based index)

    # Define the column indices for the second range (E to F)
    start_col_range2, end_col_range2 = 4, 5  # Columns E to F (0-based index)

    # Create an empty list to store the captured data
    captured_data = []

    # Loop through rows and columns within the first range (A to C)
    for row in range(start_row, end_row + 1):
        row_label = df.iloc[row, 0]  # Assuming the ID column is in the first column
        for col in range(start_col_range1, end_col_range1 + 1):
            col_label = df.columns[col]
            value = df.iloc[row, col]
            captured_data.append({'ID': row_label, 'date': df.iloc[0, 0], 'TYPE': col_label})

    # Loop through rows and columns within the second range (E to F)
    for row in range(start_row, end_row + 1):
        row_label = df.iloc[row, 0]  # Assuming the ID column is in the first column
        for col in range(start_col_range2, end_col_range2 + 1):
            col_label = df.columns[col]
            value = df.iloc[row, col]
            captured_data.append({'ID': row_label, 'date': df.iloc[0, 0], 'TYPE': col_label})

    # Convert the captured data into a DataFrame
    output_df = pd.DataFrame(captured_data)

However, this is the output:

        ID      date  TYPE
    0   id      id    Unnamed: 0
    1   id      id    q1.22
    2   NY      id    Unnamed: 0
    3   NY      id    q1.22
    4   CA      id    Unnamed: 0
    5   CA      id    q1.22
    6   TOTALS  id    Unnamed: 0
    7   TOTALS  id    q1.22
    8   id      id    Unnamed: 3
    9   id      id    Unnamed: 4
    10  NY      id    Unnamed: 3
    11  NY      id    Unnamed: 4
    12  CA      id    Unnamed: 3
    13  CA      id    Unnamed: 4
    14  TOTALS  id    Unnamed: 3
    15  TOTALS  id    Unnamed: 4
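Comparing the loop with its output points to four issues: `TYPE` receives `df.columns[col]`, the header label (`Unnamed: N` is the placeholder pandas gives blank header cells), while the `value` read from the cell is never used; `date` is always `df.iloc[0, 0]`, which is the `id` cell rather than the quarter; both ranges start at row 0, so the header row (`id`) and the `TOTALS` row end up in the output; and nothing repeats a type according to its count. A loop-based sketch that addresses each point, run on the question's `data` dictionary; `WANTED_TYPES`, `SKIP_IDS`, and `capture` are hypothetical names, and the quarter is assumed to be the only column label shaped like `q1.22`.

```python
import re

import pandas as pd

data = {'0': ['id', 'NY', 'CA', 'TOTALS'],
        'q1.22': ['type1', '1', '1', '2'],
        '0_2': ['OFFICE', '3', '33', '36'],
        '0_3': ['nontype1', '1', '1', '2'],
        '0_4': ['Customer', '2', '0', '1']}
df = pd.DataFrame(data)

WANTED_TYPES = ['type1', 'nontype1', 'Customer']  # assumption: OFFICE is not wanted
SKIP_IDS = {'TOTALS'}                             # summary rows to ignore


def capture(df, wanted_types, skip_ids):
    # The quarter comes from the column labels, not from cell [0, 0].
    quarter = next(str(c) for c in df.columns if re.fullmatch(r"q[1-4]\.\d{2}", str(c)))
    header = [str(h).strip() for h in df.iloc[0]]  # row 0 holds the type names
    # Look up each wanted type's current position, so added or reordered
    # columns are picked up without editing hard-coded ranges.
    positions = {name: header.index(name) for name in wanted_types}
    captured = []
    for row in range(1, len(df)):                  # start below the header row
        row_id = str(df.iloc[row, 0]).strip()
        if row_id in skip_ids:
            continue
        for name in wanted_types:
            count = int(df.iloc[row, positions[name]])  # use the cell value, not the label
            for _ in range(count):                      # one output row per counted item
                captured.append({'ID': row_id, 'date': quarter, 'TYPE': name})
    return pd.DataFrame(captured, columns=['ID', 'date', 'TYPE'])


output_df = capture(df, WANTED_TYPES, SKIP_IDS)
print(output_df)
```

The same function should also apply to the `df` in the question, assuming its row 0 holds the type names as the output above suggests; a wanted type missing from the sheet raises `ValueError` from `header.index`, which makes a renamed column easy to spot.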

Any suggestions are appreciated.

