Sending data to a Rockset Collection from a local file system or a cloud bucket

This week’s code sample shows you how to send data to a Rockset collection from your local file system or a cloud bucket.

The script prompts the user for the Rockset connection details (API server and API key), the path to the files, and the target collection. You point it at a file share of some kind (e.g. a local file share or a cloud bucket), and the script cycles through the matching files and sends their contents to Rockset through the Add Docs API.

This is useful when a data source is not natively supported by Rockset, because it lets you send data from anywhere the script can run.

To use it, run the client somewhere with access to both the data source and Rockset, install the required Python modules, and enter the Rockset connection info when prompted.

Requirements before running the script

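The script imports the rockset module, which does not ship with Python and has to be installed before you run it. The rest of the modules, such as getpass, sys, json, glob, and time, come with Python.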
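
Before running the script, it can help to confirm that every module it imports is available. The following is a minimal sketch of such a check, not part of the original sample; the helper name missing_modules and the REQUIRED_MODULES list are hypothetical and chosen only for illustration.

```python
import importlib.util

# Modules the upload script imports; only "rockset" is a third-party package.
REQUIRED_MODULES = ["rockset", "json", "getpass", "glob", "time", "sys"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: {}".format(", ".join(missing)))
    else:
        print("All required modules are available")
```

If rockset shows up as missing, install it into the same Python environment that will run the script.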

from rockset import Client
import json
import getpass
import glob
import sys
from time import sleep

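def main(server, key, path, coll):
    # Connect to Rockset with the supplied API server and key
    rs = Client(api_server=server, api_key=key)
    # Expand the path pattern (wildcards allowed) into a list of files
    filenames = glob.glob(path)

    for f in filenames:
        # Pause briefly between uploads
        print("sleeping for 50 millis")
        sleep(0.05)
        with open(f) as json_file:
            rs.Collection.add_docs(coll, json.load(json_file))
            print("uploaded file {}".format(f))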

if __name__ == "__main__":
    # Only the API key is secret, so only that prompt hides what is typed
    apiserver = input('API Server: ')
    apikey = getpass.getpass(prompt='API Key: ')
    path = input('Path to files: ')
    coll = input('Rockset Collection: ')
    sys.exit(main(apiserver, apikey, path, coll))
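
The script above sends each file's parsed contents in a single call. As a hedged sketch of one possible variation, the helpers below read documents from every matching file and group them into smaller batches. The function names load_documents and batches, and the default batch size of 100, are assumptions made for illustration; they are not part of the original sample or of the Rockset client.

```python
import glob
import json


def load_documents(path_pattern):
    """Yield one document at a time from every file matching the pattern."""
    for filename in sorted(glob.glob(path_pattern)):
        with open(filename) as json_file:
            data = json.load(json_file)
        # A file may hold a single JSON object or an array of objects.
        if isinstance(data, dict):
            yield data
        else:
            yield from data


def batches(docs, size=100):
    """Group an iterable of documents into lists of at most `size` items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    # Emit whatever is left over after the last full batch.
    if batch:
        yield batch
```

Each batch could then be passed to the same add_docs call used in the script, in place of the whole file's contents.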
