Main function:

```python
def main_fn(url, database, table, user, password, df):
    # Create the table (and enable compression on it) only on the first run
    if not table_exists(spark, database, table):
        create_empty_table(url, database, table, user, password, df)
        alter_cmd(database, table)
    write_Overwrite_DF_To_SQL_Table(url, database, table, user, password, df)


def write_Overwrite_DF_To_SQL_Table(url, database, table, user, password, df):
    df.write \
        .format('jdbc') \
        .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
        .option('url', url) \
        .option('database', database) \
        .option('dbtable', table) \
        .option('user', user) \
        .option('password', password) \
        .mode('overwrite') \
        .save()
    print(f'Data copied successfully for {table}')
```

I am working on a project where I write tables from Azure Databricks (ADB) to Azure SQL, and as part of the copy I need to enable compression on the table in Azure SQL. My current approach is to first create a blank table and then enable compression on it with an ALTER command. The problem is that writing the data with `df.write.mode('overwrite')` drops and recreates the table, which also overwrites that metadata, so compression does not stay enabled.
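For context, here is roughly what the two helpers referenced above do. This is a simplified sketch, not my full code: the use of pyodbc for the DDL, the `<server>`/credential placeholders, and the choice of PAGE compression are illustrative.

```python
import pyodbc  # used only for the ALTER DDL; the bulk load stays on the JDBC writer


def create_empty_table(url, database, table, user, password, df):
    # Writing zero rows makes Spark create the table with df's schema
    df.limit(0).write \
        .format('jdbc') \
        .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
        .option('url', url) \
        .option('database', database) \
        .option('dbtable', table) \
        .option('user', user) \
        .option('password', password) \
        .mode('overwrite') \
        .save()


def alter_cmd(database, table):
    # Rebuild the (still empty) table with PAGE compression enabled
    conn = pyodbc.connect(
        'DRIVER={ODBC Driver 17 for SQL Server};'
        f'SERVER=<server>.database.windows.net;DATABASE={database};'
        'UID=<user>;PWD=<password>'
    )
    conn.cursor().execute(
        f'ALTER TABLE {table} REBUILD WITH (DATA_COMPRESSION = PAGE);'
    )
    conn.commit()
    conn.close()
```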
What I want is to create a blank table, enable compression on that table, and then copy the data into it while keeping compression enabled.
I am new to PySpark. Is there a write mode or `.option()` that keeps the existing table metadata (so the compression setting survives) and then loads the DataFrame? This process runs through a pipeline, so it also has to work cleanly on every run, not just the first one.
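For illustration, the closest-looking thing I have found in the Spark JDBC data source docs is the `truncate` writer option: with overwrite mode it is documented to truncate the existing table instead of dropping and recreating it. Would something like this preserve the compression setting?

```python
# Overwrite the rows but keep the table DDL (truncate instead of drop/create)
df.write \
    .format('jdbc') \
    .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
    .option('url', url) \
    .option('database', database) \
    .option('dbtable', table) \
    .option('user', user) \
    .option('password', password) \
    .option('truncate', 'true') \
    .mode('overwrite') \
    .save()
```

If that works the way I hope, the pipeline would be idempotent: the first run creates the table and enables compression, and every later run just truncates and reloads it.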