I’ve created a collection with a source of Managed Confluent Kafka. Is there a way to also load data into it from the Write API or an S3 bulk load or any other method?
I prefer not to pollute the Kafka topic with historical data and only pump fresh data into it with a low TTL.
You can use the Write API to get the messages into a Rockset collection: you would consume the messages from Kafka yourself and send them to the collection via the Write API.
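As a minimal sketch of that approach: consume from Kafka and POST batches to Rockset’s Documents Add endpoint (the Write API). The workspace, collection, topic, broker address, and API key below are all placeholders, and the Kafka client shown (`kafka-python`) is just one possible choice:

```python
import json
import urllib.request

# Placeholder values -- substitute your own workspace, collection, and key.
ROCKSET_API_KEY = "YOUR_API_KEY"
WRITE_API_URL = (
    "https://api.rs2.usw2.rockset.com"
    "/v1/orgs/self/ws/commons/collections/my_collection/docs"
)

def build_docs_payload(messages):
    """Wrap a batch of already-decoded Kafka messages for the Write API.

    The Documents Add endpoint expects a body of {"data": [doc, doc, ...]}.
    """
    return {"data": list(messages)}

def write_batch(messages):
    """POST a batch of documents to the Rockset Write API."""
    body = json.dumps(build_docs_payload(messages)).encode("utf-8")
    req = urllib.request.Request(
        WRITE_API_URL,
        data=body,
        headers={
            "Authorization": f"ApiKey {ROCKSET_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires the third-party kafka-python package and a reachable broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "my_topic",                        # placeholder topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    batch = []
    for msg in consumer:
        batch.append(msg.value)
        if len(batch) >= 100:              # flush in batches of 100
            write_batch(batch)
            batch.clear()
```

Note this puts the ingestion loop in your hands: you own batching, retries, and offset commits, which the managed connector would otherwise handle for you.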
For the latter half of your question, can you elaborate? Are you trying to use both the Apache Kafka connector and the Write API to feed one Rockset collection? Or are you trying to avoid the Apache Kafka connector altogether by using the Write API or writing to S3, and then setting up a Rockset integration with S3?
Please let me know more! Looking forward to your response!
Awesome! Does this help answer your question: would writing the Kafka data to S3, and then setting up a Rockset integration with S3, work? In that case you’d get a one-time bulk load plus continuous sync. Would that satisfy your requirements? If not, can you describe the requirements you’re looking for?
I think I understand better what you want: you want to create one Rockset collection that uses the Apache Kafka data connector and also does a historical bulk load from S3. Is this correct? If so, it’s recommended that you don’t mix different sources in one Rockset collection. It’s best to create two collections: one for the Apache Kafka data connector, and one for S3. Does this help answer your question?
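With the two-collection setup, queries can still see the live and historical data together by combining the collections with a `UNION ALL`. A sketch, assuming the Rockset Query API and hypothetical collection names (`events_kafka` fed by the connector, `events_s3_backfill` bulk-loaded from S3):

```python
import json
import urllib.request

# Placeholder collection names -- one fed by the Kafka connector,
# one bulk-loaded from S3.
KAFKA_COLLECTION = "commons.events_kafka"
S3_COLLECTION = "commons.events_s3_backfill"

# A UNION ALL makes the live and historical data queryable as one set.
UNIFIED_SQL = f"""
SELECT * FROM {KAFKA_COLLECTION}
UNION ALL
SELECT * FROM {S3_COLLECTION}
"""

def run_query(api_key, sql=UNIFIED_SQL):
    """Submit the SQL to Rockset's Query API (POST /v1/orgs/self/queries)."""
    req = urllib.request.Request(
        "https://api.rs2.usw2.rockset.com/v1/orgs/self/queries",
        data=json.dumps({"sql": {"query": sql}}).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

This keeps each collection tied to a single source while giving queries a unified view; the trade-off is that you write the `UNION ALL` yourself rather than getting one collection.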
Yes…
So basically it’s not possible
This would be a great feature.
For example, you want to create a rollup that updates live from a Kafka source, but also want a one-time bulk load from S3 to backfill the rollup with historical data.