Channel: Active questions tagged python - Stack Overflow

Python for looping through Snowflake Tables and creating Master Table


I am trying to build a Python script that, given a list of table names (retrieved from information_schema), loops through an entire database, collects the MAX(DATE) value per table, and combines the results into a master table. The tables to loop through are often in different schemas.

EXAMPLE:

SCHEMA_1.TABLE_1

| data | date       | other_cols |
|------|------------|------------|
| yes  | 26/02/2024 | sample     |

SCHEMA_2.TABLE_2

| data | date       | other_cols |
|------|------------|------------|
| no   | 25/02/2024 | sample     |

MASTER TABLE BUILT WITH SCRIPT:

| table_name | data | date       |
|------------|------|------------|
| TABLE_1    | yes  | 26/02/2024 |
| TABLE_2    | no   | 25/02/2024 |

EDIT: What I have so far

    import pandas as pd

    # cs is an already-open Snowflake cursor
    database = "DATABASE"
    schema = "SCHEMA"

    # Get the list of tables that have one of the date columns.
    # The f prefix is needed so that {database} and {schema} are substituted.
    cs.execute(
        f"""
        SELECT TABLE_NAME
        FROM {database}.information_schema.columns
        WHERE table_schema = '{schema}'
          AND COLUMN_NAME IN ('FILE_DATE', 'MODIFIED_FILE_DATE')
          AND (TABLE_NAME NOT LIKE '%TEST%' AND TABLE_NAME NOT LIKE '%TEMP%');
        """
    )
    tables = [x[0] for x in cs.fetchall()]  # avoid shadowing the built-in list

    # Loop through each table and append its result to one DataFrame
    df1 = pd.DataFrame()
    for table in tables:
        sql = (
            f"SELECT '{table}' AS TABLE_NAME, '{schema}' AS TABLE_SCHEMA, "
            f"MAX(FILE_DATE) AS FILE_DATE FROM {database}.{schema}.{table}"
        )
        cs.execute(sql)
        df = cs.fetch_pandas_all()
        df1 = pd.concat([df1, df], ignore_index=True)

This gives me the desired result for ONE schema. I need to adapt it so that I can pass a list of schemas (returned from information_schema, just like the table names).
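One possible way to cover several schemas is to ask information_schema for the schema, table, and column name in a single query. The sketch below is only an illustration: `build_metadata_query` is a hypothetical helper, not part of the Snowflake connector, and it assumes the database, schema, and column names are plain unquoted identifiers stored in upper case.

```python
import re

# Hypothetical helper (not part of any library): builds one metadata query
# that covers a list of schemas instead of a single one.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_metadata_query(database, schemas,
                         date_columns=("FILE_DATE", "MODIFIED_FILE_DATE")):
    if not schemas:
        raise ValueError("at least one schema is required")
    # Assumption: names are simple identifiers; anything else is rejected
    # because the values are placed directly into the SQL text.
    for name in (database, *schemas, *date_columns):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")

    schema_list = ", ".join(f"'{s}'" for s in schemas)
    column_list = ", ".join(f"'{c}'" for c in date_columns)
    return (
        "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME "
        f"FROM {database}.information_schema.columns "
        f"WHERE TABLE_SCHEMA IN ({schema_list}) "
        f"AND COLUMN_NAME IN ({column_list}) "
        "AND TABLE_NAME NOT LIKE '%TEST%' "
        "AND TABLE_NAME NOT LIKE '%TEMP%'"
    )
```

With an open cursor `cs`, something like `cs.execute(build_metadata_query("DATABASE", ["SCHEMA_1", "SCHEMA_2"]))` followed by `cs.fetchall()` should then return one (schema, table, column) row per matching date column.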

Another issue is that the FILE_DATE column is called MODIFIED_FILE_DATE in some of the tables.

So really, I am looking to pass a separate SCHEMA_NAME, TABLE_NAME and COLUMN_NAME for each query every time this is run.
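A minimal sketch of that idea, assuming the (schema, table, column) triples have already been fetched from information_schema. The function names `pick_date_columns` and `collect_max_dates` are hypothetical, and the code relies only on the standard DB-API `execute` and `fetchone` cursor methods. If a table has both columns, FILE_DATE is preferred here, which is an arbitrary choice.

```python
def pick_date_columns(rows, preferred=("FILE_DATE", "MODIFIED_FILE_DATE")):
    """Map (schema, table) to a single date column.

    rows: iterable of (schema, table, column) tuples whose column is one
    of `preferred`. When a table has both columns, the one that comes
    first in `preferred` wins.
    """
    chosen = {}
    for schema, table, column in rows:
        key = (schema, table)
        current = chosen.get(key)
        if current is None or preferred.index(column) < preferred.index(current):
            chosen[key] = column
    return chosen


def collect_max_dates(cs, database, rows):
    """Run one MAX() query per table and return a list of records."""
    records = []
    for (schema, table), column in pick_date_columns(rows).items():
        # Assumption: names are plain identifiers that need no quoting.
        cs.execute(f"SELECT MAX({column}) FROM {database}.{schema}.{table}")
        max_date = cs.fetchone()[0]
        records.append(
            {"TABLE_SCHEMA": schema, "TABLE_NAME": table, "FILE_DATE": max_date}
        )
    return records
```

The returned list can be turned into the master table with `pd.DataFrame(records)`. Building the DataFrame once at the end also avoids appending to it inside the loop.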

