I have ten Rockset collections connecting to two DynamoDB instances. Overall there are no more than 2,000 records between the two DBs. My DDBs are on On-Demand provisioning. Last month I used 81 million Streams RCUs beyond the DDB free tier, even though only the odd record is inserted into DDB a couple of times a day. I thought it might have to do with Allocated RCUs in Collections → Sources, which is set to 10, but I cannot see how to change this, even if I create a new collection.
I wouldn’t think this volume of reads is normal, especially after reading the following post:
Any ideas? Cheers.
Hello! I believe the limit you are referring to is RRU. RCU is slightly different.
RCU (read capacity units):
This limit is only for the initial scan. Each DynamoDB table is configured with RCUs, which represent an upper bound on the read requests allowed. This limit can be configured at collection creation time if using the table scan method for the initial load. Check out our docs on optimizing RCU, for more information.
RRU (read request unit):
After the initial scan has finished, Rockset will continue to fetch new records during the streaming phase (CDC). The RRU is the number of requests made by Rockset during this phase. By default, Rockset makes 1 request per second per shard to maintain low data latency. Even if there aren’t frequent updates, Rockset will still keep polling at the same rate. A single DynamoDB table can have multiple shards, so multiple tables with multiple shards consistently being polled at 1 req/sec/shard can quickly add up.
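To see how quickly this adds up, here is a rough back-of-the-envelope sketch in Python. It only counts polling requests from the 1 req/sec/shard default described above; the function name, the shard counts, and the 30-day month are assumptions for illustration, and how those requests map to billed units on your AWS invoice is not something this estimate covers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_stream_requests(collections, shards_per_table,
                            poll_interval_seconds=1.0, days=30):
    """Estimate polling requests per month (hypothetical helper).

    Assumes every collection polls every shard of its table once per
    poll_interval_seconds, regardless of how many records change.
    """
    total_seconds = days * SECONDS_PER_DAY
    polls = collections * shards_per_table * total_seconds / poll_interval_seconds
    return round(polls)


# Ten collections, assuming one shard per table, at the default rate.
print(monthly_stream_requests(collections=10, shards_per_table=1))
```

With ten collections and a single shard each, that is already about 26 million requests in a 30-day month, even if nothing in the tables changes. The real shard count depends on the table, so treat the numbers as an order-of-magnitude check only.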
Fortunately, Rockset is in the process of rolling out the capability to make RRU user-configurable this week! There are some tradeoffs to understand. Polling too infrequently can result in high latency and potential data loss. DynamoDB streams only have 24-hour retention, so polling too rarely risks not capturing those change events.
I’ll post in this thread again when RRU configurability is live. Thanks for your inquiry!
Excited to try RRU configurability.
Follow-up question: does increasing the number of Rockset collections increase the number of RRUs on a single DynamoDB table? For example, if I had five collections all connected to the same DDB table, would there be five streams polling at 1 req/sec/shard (each collection handling its own syncing), or just one stream polling at 1 req/sec/shard (Rockset retrieves all changed data on the DDB table and then distributes the changes to the individual collections)?
Thanks in advance. Cheers.
Increasing the number of collections in Rockset does increase the number of RRUs on a single DynamoDB table. Each collection handles its own syncing. Soon, the RRU will be user-configurable per source, which will give users more control over managing this sync. Note that a single collection can have multiple sources.
Thanks for the feedback. So I’m going to delete the Rockset collections I use for development when they’re not being used. That should save me money. Cheers.
Hello again! As of today, polling frequency for DynamoDB collections is now user-configurable.
The config name is dynamodb_stream_poll_frequency and the full description of this config can be found in our source config docs for DynamoDB.
If you would like to update this frequency for an existing collection, you can do so using the Update Collection Source API endpoint. More information on how to update source settings can be found here.
Let us know if you have any questions!