Connecting Spark to Rockset

How do I connect Spark to Rockset?

Hi @kaiser!

I wrote a blog post on exactly this: Getting Started with Apache Spark, S3 and Rockset for Real-Time Analytics

Spark

  1. Format the data in a semi-structured format (JSON, CSV, etc.).
  2. Write the data from Spark to S3.
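Step 1 is about the shape of what lands in S3. Here is a minimal stdlib sketch of one common semi-structured layout, newline-delimited JSON, assuming a hypothetical record with `id` and `sentiment` fields. In Spark, `DataFrame.write.json(path)` writes the same one-object-per-line layout, split across part files. This snippet only illustrates the file contents and is not Spark code.

```python
import json

def to_json_lines(records):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    # sort_keys keeps field order stable, which makes the output easy to diff and test
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)

# Hypothetical sentiment output: integer scores keyed by a document id.
rows = [{"id": "doc-1", "sentiment": 3}, {"id": "doc-2", "sentiment": -1}]
print(to_json_lines(rows), end="")
```

JSON keeps the scores as numbers, while CSV carries every field as text. That is one reason to prefer JSON when the types matter downstream.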

Rockset

  1. Set up a Rockset integration with S3.
  2. Make sure you set up your IAM policy and role on AWS.
  3. Write your queries.
  4. Build a visualization with Tableau, Retool, Grafana, etc.
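For step 2, the policy attached to the role needs read access to the bucket Spark writes to. Below is a sketch of such a policy document, assuming a hypothetical bucket `my-spark-output` and prefix `sentiment/`. The `s3:ListBucket` plus `s3:GetObject` pair is the usual minimum for reading a bucket, but Rockset's own integration docs are the authority on the exact actions and trust settings required.

```python
import json

def s3_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting list + read access to one S3 bucket/prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARN itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to object ARNs under the prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }

# Hypothetical bucket and prefix, for illustration only
print(json.dumps(s3_read_policy("my-spark-output", "sentiment/"), indent=2))
```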

Some resources

Thanks @nadine, we’ve set up the IAM policy on S3 and created a user. We’re now wondering how to send the sentiment scores from Spark to Rockset as a CSV of integers. Any pointers on how to configure that in the Spark output?


Did you get a chance to watch the video at the bottom of the Spark blog post? I walk through the setup there. Please watch it and see how it applies to the setup you already have.

You need to send the Spark data to S3 first; from there, Rockset ingests it through the S3 integration. Let’s focus on sending the Spark data to S3 🙂.
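For the CSV-of-integers question, the Spark side is roughly `df.write.option("header", True).mode("overwrite").csv("s3a://<bucket>/<prefix>/")`. That assumes S3 credentials are configured for the `s3a` connector and that the score column is already an integer type (e.g. via `cast("int")`). Spark writes a directory of part files rather than a single file. The stdlib sketch below only shows what each file should contain, assuming hypothetical `id` and `sentiment` columns.

```python
import csv
import io

def sentiment_csv(rows):
    """Render (id, score) pairs as CSV text with a header row; scores are written as integers."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "sentiment"])  # header row names the columns explicitly
    for doc_id, score in rows:
        writer.writerow([doc_id, int(score)])  # int() drops any float formatting like "3.0"
    return buf.getvalue()

print(sentiment_csv([("doc-1", 3), ("doc-2", -1)]), end="")
```

Casting before writing matters because a float column would otherwise be written as `3.0` rather than `3`.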

Here’s the full code for my setup:

Also, take a look at this Medium blog post:
