Incremental Batch Processing
I am using Spark on Databricks and have a Delta table in my bronze layer that ingests JSON data with Auto Loader, which works great on a schedule. For my silver layer, however, I only want to read the new rows from that bronze table, and I'm not sure what the best method is.
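For context, the bronze ingestion looks roughly like this (paths are placeholders for my actual setup; `spark` is the session Databricks provides):

```python
raw_path = "/mnt/raw/events/"                        # landing zone for JSON files
bronze_path = "/mnt/bronze/events/"
bronze_checkpoint = "/mnt/checkpoints/bronze_events/"

(spark.readStream
    .format("cloudFiles")                            # Auto Loader
    .option("cloudFiles.format", "json")             # incoming files are JSON
    .option("cloudFiles.schemaLocation", bronze_checkpoint)
    .load(raw_path)
    .writeStream
    .format("delta")
    .option("checkpointLocation", bronze_checkpoint)
    .trigger(availableNow=True)                      # run as a scheduled batch, then stop
    .start(bronze_path))
```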
Currently I am just doing a clunky anti join on the unique identifier, which works, but I'm not sure it's best practice. The documentation mostly points to Delta Live Tables, which I'm not too keen on.
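Roughly what I'm doing now (paths and the `event_id` key are placeholders):

```python
bronze = spark.read.format("delta").load("/mnt/bronze/events/")
silver_path = "/mnt/silver/events/"
silver = spark.read.format("delta").load(silver_path)

# Keep only bronze rows whose key isn't already in silver, then append them.
new_rows = bronze.join(silver, on="event_id", how="left_anti")

(new_rows.write
    .format("delta")
    .mode("append")
    .save(silver_path))
```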
Is there a way to use Auto Loader (or something like it) for Delta-to-Delta reads, so the silver job picks up only new rows?
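From skimming the docs I'm wondering if reading the bronze Delta table as a streaming source is what's intended here, i.e. something like the sketch below, but I'm not sure whether that's appropriate for a scheduled batch job:

```python
# Read bronze as a stream; the checkpoint tracks which rows have been processed,
# and availableNow consumes everything new since the last run, then stops.
(spark.readStream
    .format("delta")
    .load("/mnt/bronze/events/")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/silver_events/")
    .trigger(availableNow=True)
    .start("/mnt/silver/events/"))
```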
Or can I pass in the last time my task ran as a parameter and use that to pull all rows added since that point?
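If the parameter route is saner, I imagine it would look something like this, assuming I stamp each bronze row with a hypothetical `_ingested_at` column at ingest time and pass the last run time in from the job:

```python
from pyspark.sql import functions as F

# dbutils is available in Databricks notebooks; "last_run_ts" would be set by the job.
last_run_ts = dbutils.widgets.get("last_run_ts")     # e.g. "2024-01-01T00:00:00"

bronze = spark.read.format("delta").load("/mnt/bronze/events/")

# Only rows ingested after the previous run.
new_rows = bronze.filter(F.col("_ingested_at") > F.lit(last_run_ts).cast("timestamp"))

(new_rows.write
    .format("delta")
    .mode("append")
    .save("/mnt/silver/events/"))
```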