Weekly discussion thread: Dec 9th - 16th

How mutable databases make it easy to do real-time updates :scroll:

If you’re thinking about how you can do real-time analytics, it’s probably crossed your mind that you may want to do real-time updates. Real-time updates is being able to insert, delete, and update data in place. Below are 3 main reasons why you would want a mutable database for real-time updates:

1| Late arriving data in time-based window rollups: Let’s say you have a rollup that’s counting events for each hour. All of a sudden, an event comes in the next day that was supposed to be processed yesterday. Now, you need to update all the aggregates and rollups that was done yesterday. A mutable database allows you to recompute the results with the late-arriving data.

In some systems, once the time window has passed, the rollups become read-only. You can’t update the rollups because it’s stored in a non-mutable store. Other systems create new tables for late-arriving data. In those cases, your app will have to know which tables have the late-arriving data so you can do the aggregation at query time. In these circumstances, there’s more manual work involved and it’s time-consuming.

2| Data enrichment: Analytics involves a lot of enrichment: If you have a data record that you want to tag as i.e. spam it’ll be easy to just insert a field in your data record with that tag. A mutable database allows you to do this.

If your events are read-only, you have to write all the enriched-tags in a different place. Now, your app has to look at 2 different places to correlate the tags with the right events at query time. It can increase the complexity of the application code.

3| Being in sync with changes from a transactional database: Let’s say user1 wanted to update her last name stored in your transactional database. Your analytical database needs to be in sync with these changes. If your analytical database is mutable, you can easily update the changes in place.

If you’re not using a mutable analytical database, you’ll probably batch download the whole transactional database into your analytical database once a day. This has a higher operational overhead, and you also lose the real-timeliness.

Using a mutable database makes developers’ life a lot easier by simplifying the workflow, allowing you to iterate faster. You can easily do rollups on late-arriving data, add fields to your document to enrich the dataset and be in real-time sync with your transactional database.

You might be thinking, well OLTP databases like MongoDB and PostgreSQL are mutable. They are! However, they may get hung up on analytical queries where you need to scale out to large data sizes.

If you’re considering a real-time analytical database to handle complex analytical queries and scale you have a couple of options. Druid and Clickhouse are immutable databases that work for append-only. If you need a mutable real-time analytical database where data needs to be corrected, enriched or is sent out of order, Rockset is a great choice to handle these situations.

:question: Are real-time updates needed for your product? If so, how are you handling real-time updates?

update GIF