Why UNNEST is still required since the nested data has already been flatten during ingestion?

The converged indexing contains columnar storage format which has stored the nested data in a flatten format, then why UNNEST is still used in the SQL language?

Hi @yingfeng ,

UNNEST is still needed because arrays of objects are not indexed in the search index and stored as an array in both the column index and row index. To filter or project the attributes within those objects requires UNNESTing.

If the array only contains scalars (ints or strings), those are automatically indexed in the search index and can be used for search queries.

I hope that helps explain things a bit.

Patrick

Hi, Patrick

So you mean that the array/object is still stored into columns without unnesting? It seems different from what I’ve got from the slides: for instance, the following example is always used to illustrate the converged columnar indexing:

<doc 0>
{
“name”:“Igor”,
“interests”:[“databases”,“snowboarding”],
“last_active”:2019/3/15
}
<doc 1>
{
“name”:“Dhruba”,
“interests”:[“cars”,“databases”],
“last_active”:2019/3/22
}

The column index has the field:

“interests”:
0.0 databases
0.1 snowboarding
1.0 cars
1.1 databases

According to my understanding, the above columnar storage has stored the flattened array of “interests”(to break the array into several key-value pairs), as a result, it seems that neither array nor object exist within the columnar store?

Additionally, why search index does not take into effect on the array/objects? It seems they can also benefit from the serving performance brought by the search index.

@yingfeng,

In your example, you have an array of strings. In this case those strings are indexed in Rockset’s search index and if you qualified on “interests” the fastest path would likely be the search index.

I was talking about arrays of objects like this:
{
“name”: “Igor”,
“interests”: [{“name”: “databases”}, {“name”: “snowboarding”}]
}

In this case the array is in both indexes (column and search) as an array but the individual objects are not indexed separately. The unnest is necessary to evaluate expressions against the attributes of all objects in the array. You could do queries that take advantage of the search index if you know the array index or if you just want to do simple test like “WHERE interests[1] IS NOT NULL”. This will leverage the search index to qualify just those documents that have that array.

I hope this helps. Feel free to start a trial and see how the different indexes are used with your own data.

Regards,
Patrick

Hi, Patrick

So it means the array of native types will be unnested in both columnar and search index, while the array of objects, or some other deeper nested structs will not be unnested until query, which is actually some kinds of “schema on read” ?

Hi @yingfeng,

One correction to my answer, arrays of objects are indexed in the search index. You use UNNEST to access those objects at query time and can qualify on attributes within those objects. If you find the optimizer is not using the search index to access documents in this case, try using the search index hint.

Take care,
Patrick

Hi, Patrick
So does it mean that in the column index, the array will not be flatten if it contains nested structure, and only array of native types would be flatten actually?

@yingfeng,

I am not sure I totally understand your question. My suggestion at this point would be to consider a trial, upload a sample of your data and see which indexes are being used for your query in the query profiler. It’s best to think of Rockset as a system where you query indexes made from your data. Flattening or not flattening isn’t really what is happening in Rockset, we are just looking at the document attributes and indexing them in our converged index.

Regrads,
Patrick