How to export data FROM Rockset TO S3

I have seen similar questions, but no real answer on how to EXPORT data FROM Rockset collections TO external TARGETS.

I understand that almost all the documentation is about PULLING data FROM external SOURCES into Rockset, but that is not what we want. What is the best way to do so? If there isn’t an easy way, I unfortunately don’t see Rockset being that useful for our use case.

Hey Michaele, thanks for posting your question. I’m on the Product team here at Rockset, and I’d like to understand your use case better. What is your external target (or targets)? I might recommend different approaches depending on what you are trying to do, so any color you can add about what you are trying to achieve would be great.

Hey Rafael,

As I mentioned in the title, we are trying to export data from Rockset to S3 (AWS S3). I would like to do this without having to write a Query Lambda, manually export the results to a JSON file, and then have some other process upload it.

I have been working with your SDK and am able to describe the collection, etc. But I’m assuming that if I want to actually get the data itself, I would have to execute a Query Lambda programmatically, potentially load the results into a DataFrame, and ultimately write them to a file.
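For the approach described above, a DataFrame isn't strictly required: if the query returns its rows as JSON objects, they can be written straight out as JSON Lines (one object per line), a format that Spark's JSON reader, for example, accepts by default. A minimal sketch, assuming the rows have already been fetched through the SDK as a list of dicts (the fetch itself is not shown); `write_jsonl` is a hypothetical helper, not part of the Rockset SDK.

```python
import io
import json
from typing import Any, Iterable, Mapping, TextIO


def write_jsonl(rows: Iterable[Mapping[str, Any]], out: TextIO) -> int:
    """Write each result row as one JSON object per line; return the row count."""
    count = 0
    for row in rows:
        # default=str stringifies values json can't encode natively (e.g. datetime).
        out.write(json.dumps(row, default=str))
        out.write("\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical rows standing in for the result set of a Query Lambda call.
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    buf = io.StringIO()
    n = write_jsonl(rows, buf)
    print(n)
    print(buf.getvalue(), end="")
```

Passing an open file (e.g. `open("export.jsonl", "w", encoding="utf-8")`) instead of a `StringIO` writes the same content to disk.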

But if there’s a built-in way to do so, I’d be happy to hear more. Thanks!

Got it. I’d love to learn why you need the results of a QL in S3. Are you trying to transform data using Rockset and consume it with another service from there?

In general, our QLs are built to be consumed directly by applications, because those applications need low-latency responses. For transformation jobs, some customers also use an INSERT INTO statement to write the results into another collection, provided, of course, that the transformed results are meant to be consumed by an app directly from Rockset.
We are thinking through a use case for exporting to S3 as part of a backup mechanism, but I don’t think that’s what you are trying to do.

The best option I can think of for what you are trying to do is to use our async queries or pagination mechanisms to stream the results of a QL into one or more files that you then drop into an S3 bucket. One of our solution architects wrote a ‘rockexport’ utility in Python (leveraging the above) that I can share with you if you message me your email.
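The pagination approach can be sketched as a loop that follows a cursor page by page and flushes fixed-size chunks of rows to S3 as separate JSON Lines objects, so memory stays bounded to one file's worth of rows. This is a minimal sketch under stated assumptions, not the ‘rockexport’ utility: `fetch_page` is a hypothetical wrapper around whatever pagination call your Rockset SDK version exposes, returning `(rows, next_cursor)` with `next_cursor` set to `None` on the last page; `upload` is a hypothetical callable taking an object key and its bytes; and the `part-NNNNN.jsonl` key layout is an arbitrary choice.

```python
import json
from typing import Any, Callable, Iterator, List, Mapping, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and returns
# (rows, next_cursor); next_cursor is None once the results are exhausted.
PageFetcher = Callable[[Optional[str]], Tuple[List[Mapping[str, Any]], Optional[str]]]
# Hypothetical uploader: takes (object_key, body_bytes).
Uploader = Callable[[str, bytes], None]


def iter_rows(fetch_page: PageFetcher) -> Iterator[Mapping[str, Any]]:
    """Follow cursors page by page, yielding rows without holding all pages in memory."""
    cursor: Optional[str] = None
    while True:
        rows, cursor = fetch_page(cursor)
        yield from rows
        if cursor is None:
            return


def export_to_s3(fetch_page: PageFetcher, upload: Uploader, prefix: str,
                 rows_per_file: int = 10_000) -> List[str]:
    """Stream paginated rows into JSON Lines objects of at most rows_per_file rows."""
    keys: List[str] = []
    buf: List[str] = []

    def flush() -> None:
        # Upload the buffered rows as one object, then start a new chunk.
        key = f"{prefix}/part-{len(keys):05d}.jsonl"
        upload(key, "".join(buf).encode("utf-8"))
        keys.append(key)
        buf.clear()

    for row in iter_rows(fetch_page):
        buf.append(json.dumps(row, default=str) + "\n")
        if len(buf) >= rows_per_file:
            flush()
    if buf:
        flush()
    return keys


if __name__ == "__main__":
    # Fake pages keyed by cursor, standing in for a real paginated query.
    pages = {
        None: ([{"id": 1}, {"id": 2}], "c1"),
        "c1": ([{"id": 3}], "c2"),
        "c2": ([{"id": 4}, {"id": 5}], None),
    }
    store: dict = {}
    keys = export_to_s3(lambda c: pages[c], lambda k, b: store.__setitem__(k, b),
                        "exports/demo", rows_per_file=2)
    print(keys)
```

To upload for real, `upload` could wrap boto3, e.g. `s3 = boto3.client("s3")` and `lambda key, body: s3.put_object(Bucket="my-bucket", Key=key, Body=body)` (the bucket name is a placeholder). Splitting the output into many smaller objects rather than one large file also makes it easier to retry a failed chunk.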

cc @rafael @michaele
Maybe we can make this open source? Let me see what I can do. I will follow up soon.

–n