How the Converged Index and querying work on Rockset

Converged Index: real-time indexing
Rockset’s Converged Index is the combination of a row index, a columnar index and a search (inverted) index built on top of a key-value store abstraction, RocksDB. Each document stored in the Converged Index maps to many key-value pairs in the key-value store.

  • Columnar index : Each column is stored separately. Columnar storage is often used in analytical databases and data warehouses like Snowflake and Amazon Redshift. It delivers two key advantages:
    • Data compression because data that looks similar is stored closer together
    • Efficient vectorized processing for fast queries because Rockset can scan and operate on large batches of columnar data
  • Row index : The row index refers to storing data in row orientation, which is fairly standard in databases. It optimizes for row lookups and is how Postgres and MySQL are organized.
  • Inverted index : Similar to search engines like Elasticsearch, Rockset stores the map between a value and the list of document IDs that contain that value. For queries, this means quick retrieval of a list of document IDs that match a particular predicate.
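
To make the "one document, many key-value pairs" idea concrete, here is a minimal sketch in Python. The key layout (the `R`, `C` and `S` prefixes), the `index_document` and `prefix_scan` helpers, and the use of a plain dictionary in place of RocksDB are all assumptions made for illustration; they are not Rockset's actual encoding or API.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, ...]


def index_document(store: Dict[Key, Any], doc_id: str, doc: Dict[str, Any]) -> None:
    """Write one document into three hypothetical index layouts."""
    for field, value in doc.items():
        # Row index: keyed by document first, so all fields of a document are adjacent.
        store[("R", doc_id, field)] = value
        # Columnar index: keyed by field first, so all values of a column are adjacent.
        store[("C", field, doc_id)] = value
        # Inverted index: keyed by field and value, so matching document IDs are adjacent.
        store[("S", field, str(value), doc_id)] = None


def prefix_scan(store: Dict[Key, Any], prefix: Key) -> List[Tuple[Key, Any]]:
    """Emulate a sorted key-value store: return entries whose key starts with prefix."""
    return sorted((k, v) for k, v in store.items() if k[: len(prefix)] == prefix)


store: Dict[Key, Any] = {}
index_document(store, "doc1", {"name": "Ada", "city": "Paris"})
index_document(store, "doc2", {"name": "Lin", "city": "Paris"})

# Row lookup: every field of one document.
row = prefix_scan(store, ("R", "doc1"))
# Column scan: every value of one field.
column = prefix_scan(store, ("C", "city"))
# Inverted lookup: IDs of documents where city == "Paris".
matches = [key[-1] for key, _ in prefix_scan(store, ("S", "city", "Paris"))]
```

In this sketch each of the three access patterns is a scan over one contiguous key prefix, which is what a sorted key-value store is good at. Two documents with two fields each produce twelve key-value pairs.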

Querying data on Rockset
SQL queries on Rockset are optimized to execute in real time with millisecond latency. Because the Converged Index has already indexed the data at ingest time, Rockset can execute queries using very little compute, which supports low-latency, high-concurrency workloads.

How data freshness and real-time indexing are correlated on Rockset
Real-time indexing is all about data freshness. As you ingest data into Rockset, even at large volumes (e.g. terabytes), Rockset automatically indexes it in real time, so the data is available to query within seconds. Your queries therefore run on the freshest data set.

Slicing and dicing data on Rockset
Since the data is automatically indexed on Rockset, you can slice and dice it however you want. Say you need to find a particular customer who performed a specific action: you can do that within milliseconds on Rockset. Real-time indexing saves you from scanning the data of thousands of different customers just to pull a single event or action for one customer.
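
The difference between scanning every record and using an index can be sketched in a few lines of Python. The sample `events`, the field names and the helper functions below are hypothetical and only illustrate the general inverted-index technique; they do not reflect how Rockset plans or executes queries internally.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical event data: one record per customer action.
events = [
    {"id": 1, "customer": "alice", "action": "login"},
    {"id": 2, "customer": "bob", "action": "purchase"},
    {"id": 3, "customer": "alice", "action": "purchase"},
    {"id": 4, "customer": "carol", "action": "login"},
]


def full_scan(records: List[dict], customer: str, action: str) -> List[int]:
    """Without an index: inspect every record, whoever it belongs to."""
    return [
        r["id"]
        for r in records
        if r["customer"] == customer and r["action"] == action
    ]


def build_inverted_index(records: List[dict]) -> Dict[Tuple[str, str], Set[int]]:
    """Map each (field, value) pair to the set of record IDs containing it."""
    index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for r in records:
        for field in ("customer", "action"):
            index[(field, r[field])].add(r["id"])
    return index


def indexed_lookup(
    index: Dict[Tuple[str, str], Set[int]], customer: str, action: str
) -> List[int]:
    """With an index: intersect two posting lists instead of scanning all records."""
    by_customer = index.get(("customer", customer), set())
    by_action = index.get(("action", action), set())
    return sorted(by_customer & by_action)


index = build_inverted_index(events)
scan_result = full_scan(events, "alice", "purchase")
index_result = indexed_lookup(index, "alice", "purchase")
```

Both approaches return the same record IDs, but the indexed lookup only touches the entries for `alice` and for `purchase`, while the full scan reads every customer's events.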

This is why Rockset is well suited to powering real-time applications such as user-facing analytics, customer 360, personalization and more.