I am trying to build a Python script that, given a list of table names (retrieved from information_schema), loops through an entire database, collects the MAX(DATE) value for each table, and combines the results into a master table. The tables that need to be looped through are often in different schemas.
EXAMPLE:
SCHEMA_1.TABLE_1
| data | date | other_cols |
|---|---|---|
| yes | 26/02/2024 | sample |
SCHEMA_2.TABLE_2
| data | date | other_cols |
|---|---|---|
| no | 25/02/2024 | sample |
MASTER TABLE BUILT WITH SCRIPT:
| table_name | data | date |
|---|---|---|
| TABLE_1 | yes | 26/02/2024 |
| TABLE_2 | no | 25/02/2024 |
EDIT: What I have so far:

    import pandas as pd

    # cs is an already-open cursor on the database connection
    database = "DATABASE"
    schema = "SCHEMA"

    # to get list of tables
    cs.execute(f"""
        select TABLE_NAME
        from {database}.information_schema.columns
        where table_schema = '{schema}'
          AND COLUMN_NAME IN ('FILE_DATE', 'MODIFIED_FILE_DATE')
          AND (TABLE_NAME NOT LIKE '%TEST%' AND TABLE_NAME NOT LIKE '%TEMP%');
    """)
    tables = [x[0] for x in cs.fetchall()]

    # loop through each table and save result to df
    df1 = pd.DataFrame()
    for table in tables:
        sql = f"SELECT '{table}' as TABLE_NAME, '{schema}' as TABLE_SCHEMA, MAX(FILE_DATE) as FILE_DATE FROM {database}.{schema}.{table}"
        cs.execute(sql)
        df = cs.fetch_pandas_all()
        df1 = pd.concat([df1, df])

This gives me the desired result for ONE schema. I need to adapt it so that I can pass a list of schemas (returned from information_schema, just like the table names).
Another issue is that the FILE_DATE column is called MODIFIED_FILE_DATE in SOME of the tables.
So really, I am looking to pass a separate SCHEMA_NAME, TABLE_NAME and COLUMN_NAME each time this is run.
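
One possible shape for this, as a minimal sketch rather than a tested solution: ask information_schema for TABLE_SCHEMA, TABLE_NAME and COLUMN_NAME together, then build one MAX query per returned row. The helper names (`catalog_query`, `max_date_query`, `collect_max_dates`) and the `DATE_COLUMNS` constant are hypothetical, and `cs` is assumed to be a DB-API style cursor with `execute`, `fetchall` and `fetchone`. Identifiers are interpolated directly into the SQL, which assumes they come only from the catalog and not from user input.

```python
from typing import Any, List, Tuple

# Hypothetical constant: the two date column names mentioned in the question.
DATE_COLUMNS = ("FILE_DATE", "MODIFIED_FILE_DATE")


def catalog_query(database: str) -> str:
    """Build SQL that returns one (schema, table, column) row per date column."""
    cols = ", ".join(f"'{c}'" for c in DATE_COLUMNS)
    return (
        "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME "
        f"FROM {database}.information_schema.columns "
        f"WHERE COLUMN_NAME IN ({cols}) "
        "AND TABLE_NAME NOT LIKE '%TEST%' "
        "AND TABLE_NAME NOT LIKE '%TEMP%'"
    )


def max_date_query(database: str, schema: str, table: str, column: str) -> str:
    """Build the per-table MAX query, aliasing either column to FILE_DATE."""
    return (
        f"SELECT '{table}' AS TABLE_NAME, '{schema}' AS TABLE_SCHEMA, "
        f"MAX({column}) AS FILE_DATE FROM {database}.{schema}.{table}"
    )


def collect_max_dates(cs: Any, database: str) -> List[Tuple]:
    """Run one MAX query per catalog row and return the result rows."""
    cs.execute(catalog_query(database))
    # Fetch the full target list before the cursor is reused in the loop.
    targets = cs.fetchall()
    results = []
    for schema, table, column in targets:
        cs.execute(max_date_query(database, schema, table, column))
        results.append(cs.fetchone())
    return results
```

The returned rows can then be loaded with `pd.DataFrame(rows, columns=["TABLE_NAME", "TABLE_SCHEMA", "FILE_DATE"])`. Note that a table containing both FILE_DATE and MODIFIED_FILE_DATE would appear twice in this sketch, once per column.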