Creating a "live leaderboard" with a Rollup collection

Hi!

As of today, we are still using a Redis Sorted Set to serve our live daily leaderboard screen. Back then, we tried serving it from Rockset.
So every document would look like this:

We queried the collection for the top 10 users with the most steps on a certain day. When querying the collection to build the leaderboard we would use the RANK() operator. That method was not practical and was too slow/demanded high resources.

Now with Rollups, there might be a better way using this rollup definition:

SELECT
  userId,
  day,
  SUM(steps) AS steps,
  RANK() OVER(PARTITION BY day ORDER BY SUM(steps) DESC) AS position,
FROM
  _input
GROUP BY
  userId,
  day

This Rollups work and seem to sustain correct results as data is coming in. If you want to get the top 10 users you just do a simple “leaf” query for rows with rank 1,2,3… No sorting is needed so querying this collection is an order of magnitude faster than the old way.

My question is, will that Rollup scale under high traffic? Is it some kind of a “hack” I’ve managed to create or a legitimate Rollup definition?

Any help would be much appreciated.

Hi @yaron,

You are doing the aggregations correctly:
SUM(steps) *AS* steps

This will be computed at ingestion time and not at query time.

Now, for the window part of the query:
RANK() OVER(PARTITION BY day ORDER BY SUM(steps) DESC) AS position

this will be actually run at query time, NOT, at ingestion time. Yes, this is scalable because the aggregation part of the query is now computed at ingest time and does not take up compute resources at query time.

Please let me know if this answers your questions, and I hope this helps.
Nadine

I didn’t understand (-:

So in query time nothing will run?
Everything will be done in ingest time?

Hi @yaron!

Sure, no problem. So, it’s a mix of both. SUM(steps) AS steps will run at ingestion time. And, RANK() OVER(PARTITION BY day ORDER BY SUM(steps) DESC) AS position will run at query time. Does this help explain it? Let me know which part I should elaborate further.

Best,
n

Mm… are you sure?
Because when I look at the collection in the dashboard, the position (rank) property is there, always ready and updated. I don’t need to query the collection.

Hi @yaron! This is a very great question. I think this blog will help you:

Particularly this part of the blog:

In short, SUM() is applied at data ingestion time. Then, we apply RANK() on top of the pre-calculated result of SUM() and this is ran at query time.

When you look at the preview and collection, you’ll see the full rollup query being run, including the RANK() part, and not just the part that happens at data ingestion. The part that’s important, is you’re saving on compute at query time when you apply the SUM() during rollups. But, compute is being used at query time for RANK().

Does this help a little more with your question? Please let me know.

Best,
n

Thanks! Got it…

1 Like