No Rollups for Mongo Atlas source is very limiting

Hi (-:

The Mongo Atlas source is very useful and, IMO, one of the “flagship” features that show the power of Rockset. But it is still missing the last mile needed to make it truly usable.

You cannot use Rollups (or change/transform anything) when using Mongo Atlas as a source. You can drop a field (the UI for which is very buggy, but that is another topic), and that is about it.

I was told that adding Rollups/Transforms is not possible since Rockset uses the Change Stream API and gets only partial updates and not the whole document each time.

As a workaround, I’ve created a Write API Rockset collection, which allows full Rollups/Transformations, and added a Realm Trigger in Atlas that calls Rockset’s REST API. The Realm Trigger emits the whole document each time, not only the updated fields:
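For illustration only, here is a minimal Python sketch of the kind of request such a trigger would send to the Write API (a real Realm Trigger is written in JavaScript, so this is not the trigger itself). The API host, the `commons` workspace, the `orders` collection and the key are placeholders, and the endpoint path is assumed from the Rockset REST API as I understand it; check it against the docs for your region.

```python
import json
from urllib import request

# Assumption: placeholder regional API host; use the one for your Rockset org.
API_HOST = "https://api.usw2a1.rockset.com"


def build_write_request(full_document: dict, workspace: str,
                        collection: str, api_key: str) -> request.Request:
    """Build (but do not send) a Write API request carrying the whole document."""
    # Assumed endpoint shape for adding documents to a collection.
    url = f"{API_HOST}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    # The full document is sent every time, so a rollup or field mapping
    # always sees every field it needs.
    body = json.dumps({"data": [full_document]}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical document; sending it would be request.urlopen(req).
req = build_write_request(
    {"_id": "order-1", "quantity": 2, "unit_price": 9.5},
    workspace="commons", collection="orders", api_key="YOUR_API_KEY",
)
```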

Then we use this Rollup:

This is a very nice use case for Realm triggers, but can’t we avoid it to begin with (:?
It’s another moving piece, and at large scale it could be a significant expense.

For now, the Mongo Atlas source is very limited.


Hey @yaron,

Mongo is an interesting one. On an update, the CDC stream gives Rockset a minimum of two attributes: the doc ID and the attribute that was changed. This means that an attribute used by a rollup, or even by a field mapping, might not be present in the update, which would break the process.
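To make that concrete, here is a small Python sketch. The event below follows the general shape of a MongoDB change stream update event (trimmed); the `total_price` mapping and the field names are hypothetical, and this is not how Rockset processes events internally.

```python
# Trimmed MongoDB change stream "update" event: it carries the document key
# and the changed fields, not the whole document.
update_event = {
    "operationType": "update",
    "documentKey": {"_id": "order-1"},
    "updateDescription": {
        "updatedFields": {"status": "shipped"},
        "removedFields": [],
    },
}


def fields_from_update(event: dict) -> dict:
    """Collect the only fields an update event makes available."""
    fields = dict(event["documentKey"])
    fields.update(event["updateDescription"]["updatedFields"])
    return fields


def total_price(doc: dict) -> float:
    """Hypothetical field mapping that needs two fields of the document."""
    return doc["quantity"] * doc["unit_price"]


full_doc = {"_id": "order-1", "quantity": 2, "unit_price": 9.5, "status": "new"}
partial_doc = fields_from_update(update_event)

print(total_price(full_doc))  # works: every input field is present

try:
    total_price(partial_doc)
except KeyError as missing:
    # The update only carried _id and status, so the mapping cannot run.
    print(f"cannot apply mapping, missing field: {missing}")
```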

There are currently two workarounds you could use to get there:

  1. The Write API based collection, as you mentioned before

  2. A tool like dbt or even a Query Lambda that runs an INSERT INTO command into another collection

Both would allow field mappings and rollups as needed.

Can you please elaborate more on the Query Lambda option?

Also, you’ve mentioned the “CDC stream”. Does Rockset listen to the Change Stream API in Atlas?

Yep, basically you create a Rollup Collection via the API, which defines the desired Rollup Field Mapping. Then you create a Query Lambda that runs an INSERT INTO query and inserts the query output into the Rollup Collection. Then you can trigger that Query Lambda once every 15 or 30 minutes as needed.
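As a rough sketch of the triggering part, this Python snippet builds the HTTP request that executes a saved Query Lambda by tag. The host, workspace, lambda name (`sync_rollup`) and tag are placeholders, and the endpoint path is assumed from the Rockset REST API as I recall it, so verify it before relying on it. The INSERT INTO SQL itself lives inside the Query Lambda and is not shown.

```python
import json
from urllib import request

# Assumption: placeholder regional API host; use the one for your Rockset org.
API_HOST = "https://api.usw2a1.rockset.com"


def build_lambda_request(workspace: str, query_lambda: str,
                         tag: str, api_key: str) -> request.Request:
    """Build (but do not send) a request that executes a Query Lambda by tag."""
    # Assumed endpoint shape for executing a Query Lambda version by its tag.
    url = (f"{API_HOST}/v1/orgs/self/ws/{workspace}"
           f"/lambdas/{query_lambda}/tags/{tag}")
    # No query parameters are passed in this sketch, so the body is empty JSON.
    body = json.dumps({}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical names; sending it would be request.urlopen(req).
req = build_lambda_request("commons", "sync_rollup", "latest", "YOUR_API_KEY")
```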

I think the Lambda won’t be enough. Updating every couple of minutes is not the same as syncing in real time.

You can run it once every 2-3 seconds if you want. It doesn’t have to be every couple of minutes.

Do you mean inside Rockset? Does Rockset have a mechanism to run a Lambda every 2-3 seconds automatically?

Not yet. But running the cURL command on a cron schedule wouldn’t be too difficult to configure. An AWS Lambda function running it should stay within the free tier.
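One practical detail: standard cron fires at most once per minute, so a 2-3 second cadence needs a small loop inside each scheduled run. A minimal Python sketch of that loop follows; `run_every` is a hypothetical helper, and `task` stands in for whatever call executes the Query Lambda.

```python
import time
from typing import Callable


def run_every(task: Callable[[], None], interval_seconds: float, runs: int,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `task` `runs` times, pausing `interval_seconds` between calls."""
    completed = 0
    for i in range(runs):
        task()
        completed += 1
        # No pause after the final call, so the job can exit promptly.
        if i < runs - 1:
            sleep(interval_seconds)
    return completed
```

Started once a minute by the scheduler, `run_every(task, 3, 20)` would call `task` about every 3 seconds for that minute. The `sleep` parameter is injectable only so the loop can be exercised without waiting.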