DynamoDB Ingress with Single-Table Design

Scenario

  1. I use a single DynamoDB table with the single-table design pattern for 3 completely different types of data.
  2. I need my ingress to be clustered for 2 of these types of data on the same data field.

Restrictions

  1. Per Rockset’s documentation, clustering requires that the clustered field be specifically named in the SELECT clause.
  2. Per Rockset’s documentation, there should be only 1 collection doing the ingress transformation per DynamoDB table.

Questions

  1. What is downside of having 2 collections doing ingress transformation on the same DynamoDB table? I would use a WHERE clause on the sk to prevent the collections from pulling the same data.
  2. How does Rockset handle the fact that the fields are mostly different for different rows of the DynamoDB table? Does it flatten them into a single row with a lot of null values for fields that don’t exist for that sk?
    a. Corollary is how would Rockset handle this if the fields were the same name but had different data types in them?
  3. If I just want to pull all fields from DynamoDB, but I need the clustering field named, what is the recommended way to handle this?
  4. How does a List of Map types get imported into Rockset?

Hi there! At first glance, I think you will find the following blog very useful for your project: https://rockset.com/blog/using-dynamodb-single-table-design-with-rockset/

Regarding your questions, here are my personal answers:

  1. Separating your DynamoDB single-table into multiple collections (using a WHERE clause in the ingest transformation) will actually give better performance in the long-run. Its faster to query smaller partitions of data. The only downside is that you will have to do a separate export and stream from DynamoDB for each collection.

  2. I recommend checking out the blog I linked above for this. Data can be stored as nested JSONs. Generally, if the field names are the same (ex: “color”), then all of the data will be stored in the same column regardless of their data type (columns can have mixed data types in Rockset). If the field names are the same but they are nested (ex: “fruit.color” and “house.color”) then they will not be stored in the same field. They will be stored as a nested field in the corresponding JSON.

  3. If you want to pull all fields from DynamoDB and need the clustering field named, you can use an ingest transformation like: “SELECT _input.cluster_column, * FROM _input CLUSTER BY cluster_column”

  4. As an object or as an array of objects.