
DuckDB: Hash a Record

I have a use case where I want to check for conflicts in slowly changing dimensions. In theory, I should be able to do this by hashing each record of the original table and comparing it to a hash of the incoming record whenever the IDs match.

In SQL Server, I can do this with BINARY_CHECKSUM and just pass it a list of column names.

In DuckDB, the hash functions all appear to take a single column. Is there a convenient way to do this that doesn't involve concatenating all of the columns together or comparing each column individually?

Alternatively, I could hash the rows outside of DuckDB in Python, but since the data is stored in Parquet, it is much more convenient to keep all of this in DuckDB.
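If hashing does end up happening in Python, a stdlib-only sketch of the same comparison might look like the following. The helpers row_hash and changed_ids and the list-of-dicts input are hypothetical; serializing the values with json.dumps in a fixed column order avoids the ambiguity of plain string concatenation, where ("ab", "c") and ("a", "bc") would produce the same text.

```python
import hashlib
import json


def row_hash(row: dict, columns: list) -> str:
    """Hash the selected columns of one record into a stable hex digest."""
    # A JSON array keeps column boundaries explicit; default=str handles
    # values such as dates that json cannot serialize natively.
    payload = json.dumps([row[c] for c in columns], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_ids(current: list, incoming: list, key: str, columns: list) -> list:
    """Return keys present in both inputs whose hashed columns differ."""
    current_hashes = {r[key]: row_hash(r, columns) for r in current}
    return [
        r[key]
        for r in incoming
        if r[key] in current_hashes and current_hashes[r[key]] != row_hash(r, columns)
    ]


current = [{"id": 1, "name": "Ann", "city": "Oslo"}, {"id": 2, "name": "Bob", "city": "Rome"}]
incoming = [{"id": 1, "name": "Ann", "city": "Oslo"}, {"id": 2, "name": "Bob", "city": "Paris"}]
print(changed_ids(current, incoming, "id", ["name", "city"]))  # → [2]
```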

