Historical load for collection with Kafka source

yaron · September 17, 2021, 9:46am

Hi

I’ve created a collection with Managed Confluent Kafka as its source. Is there a way to also load data into it through the Write API, an S3 bulk load, or any other method?

I’d prefer not to pollute the Kafka topic with historical data, and to pump only fresh data into it with a low TTL.

nadine · September 17, 2021, 5:14pm

Hey @yaron,

You can use the Write API: you would read the messages from Kafka yourself and send them to a Rockset collection.

For the latter half of your question, can you elaborate more? Are you trying to use both Apache Kafka and the Write API to send data to one Rockset collection? Or are you trying to avoid the Apache Kafka connector altogether by using the Write API, or by writing to S3 and then setting up a Rockset integration with S3?

Please let me know more. I look forward to your response!
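As a rough sketch of the Write API route (not an official client), the snippet below batches documents and builds one POST request per batch. The endpoint path, the `{"data": [...]}` body shape, and the `ApiKey` authorization header are assumptions based on the public Write API documentation of the time; `BASE_URL`, the workspace and collection names, and the helpers `batched` and `build_request` are hypothetical. Nothing is sent over the network here.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List

# Hypothetical region endpoint; substitute the API server for your own region.
BASE_URL = "https://api.usw2a1.rockset.com"


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents, preserving order."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_request(workspace: str, collection: str, api_key: str,
                  batch: List[dict]) -> urllib.request.Request:
    """Build (but do not send) an add-documents request for one batch.

    Assumed endpoint: POST /v1/orgs/self/ws/{workspace}/collections/{collection}/docs
    Assumed body:     {"data": [ ...documents... ]}
    """
    url = f"{BASE_URL}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    body = json.dumps({"data": batch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Example: five historical documents split into batches of two.
historical_docs = [{"_id": str(n), "value": n} for n in range(5)]
requests_to_send = [
    build_request("commons", "my_collection", "YOUR_API_KEY", batch)
    for batch in batched(historical_docs, 2)
]
# To actually send one: urllib.request.urlopen(requests_to_send[0])
```

Whether such writes can go into a collection that already has a Kafka source is a separate matter, which the rest of this thread addresses.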

Best,
n

yaron · September 17, 2021, 5:18pm

@nadine I want a collection that has Kafka as its source, but I also want to perform a one-time historical bulk load from S3.

nadine · September 17, 2021, 5:33pm

Hi @yaron !

Awesome! Would this work for you: write the Kafka data to S3, and then set up a Rockset integration with S3? In that case, you’d get a one-time bulk load plus continuous sync. Would that not satisfy your requirements? If not, can you describe what you’re looking for?

yaron · September 17, 2021, 5:36pm

Mm… I didn’t quite understand.
When creating the collection, which source should I choose?

nadine · September 17, 2021, 5:38pm

@yaron,

I think I understand better what you want now: you want to create one Rockset collection that uses the Apache Kafka data connector and also takes a one-time historical bulk load from S3. Is this correct? If so, it’s recommended that you don’t mix different sources in one Rockset collection. It’s best to create two collections: one for the Apache Kafka data connector and one for S3. Does this help answer your question?

yaron · September 17, 2021, 5:42pm

Yes…
So basically it’s not possible.
This would be a great feature.
For example, say you want to create a Rollup that updates live from a Kafka source, but you also want a one-time bulk load from S3 to fill the Rollup with historical data.

nadine · September 17, 2021, 5:45pm

Hi @yaron,

Thanks so much for the feedback. I’ll share this with product! If you can post it in #product-feedback, I’ll relay it to them.

Thanks @yaron for helping me understand your use case.
