Historical load for collection with Kafka source

Hi :slightly_smiling_face:

I’ve created a collection with a Managed Confluent Kafka source. Is there a way to also load data into it from the Write API, an S3 bulk load, or any other method?

I’d prefer not to pollute the Kafka topic with historical data, and to pump only fresh data into it with a low TTL.

Hey @yaron,

You can use the Write API to send documents to a Rockset collection. You would have to read the messages from Kafka yourself and send them to the collection.

On the latter half of your question, can you elaborate? Are you trying to use both the Apache Kafka connector and the Write API to send data to one Rockset collection? Or are you trying to avoid the Apache Kafka connector altogether, either by using the Write API or by writing to S3 and then setting up a Rockset integration with S3?

Please let me know more. I look forward to your response!
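For what it’s worth, here is a minimal sketch of the Write API route in Python. It only builds the HTTP requests and does not send them. The API server URL, API key, workspace, and collection names are placeholders, and the batch size is an arbitrary choice for illustration. The `/docs` path and the `ApiKey` header follow the Add Documents endpoint as I understand it, so please check them against the API reference for your region.

```python
import json
from urllib.request import Request


def chunked(docs, size):
    """Yield consecutive slices of `docs`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_add_docs_request(api_server, api_key, workspace, collection, docs):
    """Build (but do not send) a POST request that adds `docs` to a collection.

    `api_server` is a placeholder for your region's API server URL.
    """
    url = f"{api_server}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    body = json.dumps({"data": docs}).encode("utf-8")
    headers = {
        "Authorization": f"ApiKey {api_key}",
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical historical records; in practice these would be read from
    # wherever the historical data lives.
    history = [{"_id": str(i), "value": i} for i in range(5)]
    for batch in chunked(history, 2):
        req = build_add_docs_request(
            "https://api.example.invalid", "YOUR_API_KEY", "commons", "events", batch
        )
        print(req.get_method(), req.full_url, len(batch))
```

To actually send a batch you would pass each request to `urllib.request.urlopen` (or use the official client) and handle errors and retries yourself.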


@nadine I want a collection that has a Kafka source. But I also want to perform a one-time historical bulk load from S3.

Hi @yaron !

Awesome! Would this answer your question: write the Kafka data to S3, and then set up a Rockset integration with S3? In that case you’d get a one-time bulk load plus continuous sync. Would that satisfy your requirements? If not, can you describe what you’re looking for?

Mm… I didn’t quite understand :slightly_smiling_face:
When creating the collection, what source should I choose?


I think I understand better what you want: one Rockset collection that uses the Apache Kafka data connector and also gets a one-time historical bulk load from S3. Is this correct? If so, it’s recommended that you don’t mix different sources in one Rockset collection. It’s best to create two collections: one for the Apache Kafka data connector and one for S3. Does this help answer your question?
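If you go with two collections, one way to see live and historical data together is to combine them at query time. Here is a rough sketch in Python that builds such a query. The workspace, collection, and column names are made up for illustration, and it assumes both collections expose the same columns and don’t contain the same events twice.

```python
import re

# Plain identifiers only, so the names can be interpolated without quoting.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def union_query(workspace, live, history, columns):
    """Return SQL that reads the same columns from two collections via UNION ALL."""
    for name in (workspace, live, history, *columns):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {workspace}.{live}\n"
        f"UNION ALL\n"
        f"SELECT {cols} FROM {workspace}.{history}"
    )


if __name__ == "__main__":
    # Hypothetical names: one Kafka-backed collection, one S3-backed collection.
    print(union_query("commons", "events_live", "events_history", ["user_id", "amount"]))
```

This keeps each collection on a single source, at the cost of doing the merge in every query.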

So basically it’s not possible :slightly_smiling_face:
This would be a great feature.
For example, say you want to create a Rollup that updates live from a Kafka source, but you also want a one-time bulk load from S3 to fill the Rollup with historical data.

Hi @yaron,

Thanks so much for the feedback. I’ll share this with product! If you can post it in #product-feedback, I’ll relay it to them.

Thanks @yaron for helping me understand your use case.