How to export data FROM Rockset TO S3

michaele · September 15, 2022, 4:20pm

I have seen similar questions, with no real answer on how to EXPORT data FROM Rockset collections TO external TARGETS.

I understand almost all the documentation is about PULLING data FROM external SOURCES into Rockset, but that is not what we want. What is the best way to export data out of Rockset? If there isn’t an easy way, I don’t see Rockset being that useful for our use case, unfortunately.

rafael · September 15, 2022, 5:40pm

Hey Michaele, thanks for posting your question. I’m from the Product team here at Rockset, and I’d like to understand your use case better. Can you tell me what your external target or targets are? There might be different approaches I recommend depending on what you are trying to do, so if you can add any color to what you are trying to achieve that would be great.

michaele · September 15, 2022, 6:03pm

Hey Rafael,

As I mentioned in the title, we are trying to export data from Rockset to S3 (AWS S3). I would like to achieve this without having to write a query lambda, manually export the results to a JSON file, and then have some other process upload it.

I have been working with your SDK and am able to describe the collection, etc. But I’m assuming that if I want to actually get the data itself, I would have to execute a query lambda programmatically, then potentially place that data into a dataframe, and ultimately output it to a file.

But if there’s a built-in way to do so, I’d be happy to hear more. Thanks!

rafael · September 15, 2022, 8:00pm

Got it. I’d love to learn why you need the results of a Query Lambda (QL) in S3. Are you trying to transform data using Rockset and consume it with another service from there?

In general, our QLs are built to be consumed directly from applications, because those applications need low-latency responses. For transformation jobs, we also have customers use the INSERT INTO statement to write results into another collection; that works if the transformation results are to be consumed by an app directly from Rockset, of course.
We are thinking through a use case to export to S3 as part of a backup mechanism, but I don’t think that’s what you are trying to do.

The best thing I can think of for what you are trying to do is to use our Async queries or pagination mechanisms to stream the results of a QL into one or more files that you will drop into an S3 bucket. One of our solution architects wrote a ‘rockexport’ utility in Python (leveraging the above) that I can share with you if you message me with your email.
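
As a rough illustration of that approach (not the ‘rockexport’ utility itself, which isn’t shown in this thread), here is a minimal Python sketch. It assumes you already have some way to iterate over pages of query results, for example a generator wrapped around the SDK’s paginated query calls. The names `export_pages`, `make_s3_uploader`, the `part-NNNNN.ndjson` key layout, and the `docs_per_file` default are all made up for this example; the only real library call is boto3’s `put_object`.

```python
import json
from typing import Callable, Dict, Iterable, List


def export_pages(
    pages: Iterable[List[Dict]],
    put_object: Callable[[str, bytes], None],
    prefix: str,
    docs_per_file: int = 10_000,
) -> List[str]:
    """Write documents from paged query results as NDJSON files.

    `pages` is any iterable of result pages (each a list of dicts).
    `put_object(key, body)` stores one file; it is injected so the
    storage backend can be S3, local disk, or a test double.
    Returns the list of keys that were written.
    """
    keys: List[str] = []
    buffer: List[str] = []

    def flush() -> None:
        # Upload the buffered lines as one object, then reset the buffer.
        if not buffer:
            return
        key = f"{prefix}/part-{len(keys):05d}.ndjson"
        put_object(key, ("\n".join(buffer) + "\n").encode("utf-8"))
        keys.append(key)
        buffer.clear()

    for page in pages:
        for doc in page:
            # default=str is an assumption: it stringifies values that
            # are not JSON-serializable (e.g. datetimes).
            buffer.append(json.dumps(doc, default=str))
            if len(buffer) >= docs_per_file:
                flush()
    flush()  # write whatever is left over
    return keys


def make_s3_uploader(bucket: str) -> Callable[[str, bytes], None]:
    """Return a put_object(key, body) callable backed by boto3."""
    import boto3  # imported lazily; needs AWS credentials configured

    s3 = boto3.client("s3")

    def put_object(key: str, body: bytes) -> None:
        s3.put_object(Bucket=bucket, Key=key, Body=body)

    return put_object
```

You would call it along the lines of `export_pages(fetch_pages(), make_s3_uploader("my-bucket"), "exports/my_collection")`, where `fetch_pages` is a hypothetical generator you write around the paginated or async query API and the bucket and prefix are placeholders. Flushing every `docs_per_file` documents keeps memory bounded, so the full result set never has to sit in a dataframe.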

nadine · September 16, 2022, 9:36pm

cc @rafael @michaele
Maybe we can make this open source? Let me see what I can do. Will follow up soon.

–n
