How to handle S3 file overwrites

It seems that if I have just one S3 file and it gets overwritten, the collection in Rockset appends the file's contents as new data each time. Is there a setting to handle full loads, or do I need to handle duplicates with a SQL transform step in Rockset?

Hi @jonathan,

Yes, this would need to be handled via a SQL transform step. You can output an _id field in the ingest transformation so that re-ingested records with the same _id are treated as updates to the existing document rather than appended as duplicates: Special Fields. Note that this merges the new record with the old one; in other words, fields in the old record are preserved unless the new record overwrites them.
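As a rough sketch, an ingest transformation along these lines would dedupe on a stable business key (the `order_id` field here is a hypothetical example; substitute whatever field uniquely identifies a record in your files):

```sql
-- Sketch of an ingest transformation that sets _id from a stable key,
-- so re-ingesting an overwritten S3 file updates existing documents
-- instead of appending duplicates. `order_id` is an assumed field name.
SELECT
  i.*,
  CAST(i.order_id AS string) AS _id
FROM
  _input i
```

Keep in mind _id must be a string, and any record missing the key field would get a NULL _id, so it's worth validating the key is always present in your source files.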

As a side note, if a file is deleted from the bucket, the corresponding data is not deleted from Rockset: Amazon S3.
