Generating sample data for Rockset with Mockaroo

Seeing is believing, and this is why the ability to see your data in whatever tool or data platform you use (or may want to use) is so important. Sample data, sometimes insensitively referred to as “dummy data”, provides a safe way to see data that looks like production but isn’t. There are a variety of ways to generate sample data, but my favorite has been Mockaroo.com, a website designed specifically for generating data, especially data that looks very real, with ways to add both variability and consistency.

  • Have a list of values and want one of them to show up 5 times more often? Mockaroo can do that.
  • Want to generate random IP addresses? Mockaroo can do that.
  • Want to create highly nested structures and some complex JSON? Mockaroo can do that too.
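To make the bullets above concrete, here is a small local Python illustration of that kind of data: a value picked from a list where one entry is weighted 5 times more than the others, a random IPv4 address, and a nested structure. This is not Mockaroo's API or how Mockaroo works internally. It is only a standard-library sketch of what the output looks like, and the field names (`tier`, `client`, `ports`) are hypothetical.

```python
import ipaddress
import random
from collections import Counter

# Hypothetical list of values: "gold" is weighted 5x relative to each other entry.
TIERS = ["gold", "silver", "bronze"]
WEIGHTS = [5, 1, 1]


def random_ipv4(rng: random.Random) -> str:
    """Return a uniformly random IPv4 address as a dotted-quad string."""
    return str(ipaddress.IPv4Address(rng.getrandbits(32)))


def make_record(rng: random.Random) -> dict:
    """Build one nested sample record (hypothetical field names)."""
    return {
        "tier": rng.choices(TIERS, weights=WEIGHTS, k=1)[0],
        "client": {
            "ip": random_ipv4(rng),
            "ports": [rng.randint(1024, 65535) for _ in range(2)],
        },
    }


if __name__ == "__main__":
    rng = random.Random(42)
    records = [make_record(rng) for _ in range(7000)]
    # "gold" should appear roughly 5/7 of the time.
    print(Counter(r["tier"] for r in records))
```

Seeding a `random.Random` instance keeps the output reproducible, which mirrors the "consistency" half of what the post describes.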

In addition, Mockaroo provides an API to generate data from predefined schemas or on the fly. As a Rockset user who enjoys APIs, it seemed obvious to me to build something that could publish sample data from Mockaroo to Rockset using only REST APIs. If this is something you are interested in doing, I published my example of doing so in Python, but the approach could easily be adapted to any HTTP library.
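The published example is linked rather than reproduced here, so the following is not that code. It is a minimal sketch of the same idea using only Python's standard library, assuming Mockaroo's `generate.json` endpoint with a saved schema (`key`, `schema`, `count` query parameters) and Rockset's add-documents endpoint (`POST /v1/orgs/self/ws/{workspace}/collections/{collection}/docs` with an `ApiKey` authorization header and a `{"data": [...]}` body). The host, workspace, collection, batch size, and function names are hypothetical placeholders; check both services' API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: substitute your own keys, region host, and names.
MOCKAROO_KEY = "YOUR_MOCKAROO_API_KEY"
ROCKSET_KEY = "YOUR_ROCKSET_API_KEY"
ROCKSET_HOST = "https://api.usw2a1.rockset.com"  # assumed region; yours may differ
WORKSPACE = "commons"
COLLECTION = "mockaroo_sample"


def mockaroo_url(schema: str, count: int, key: str) -> str:
    """Build a Mockaroo URL that generates `count` JSON rows from a saved schema."""
    query = urllib.parse.urlencode({"key": key, "schema": schema, "count": count})
    return f"https://api.mockaroo.com/api/generate.json?{query}"


def fetch_mockaroo(schema: str, count: int, key: str) -> list:
    """Download sample rows; wrap a single object in a list for uniform handling."""
    with urllib.request.urlopen(mockaroo_url(schema, count, key)) as resp:
        rows = json.load(resp)
    return rows if isinstance(rows, list) else [rows]


def chunked(docs: list, size: int):
    """Yield successive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def rockset_add_docs_request(docs, host, workspace, collection, api_key):
    """Build (but do not send) a POST that adds `docs` to a Rockset collection."""
    url = f"{host}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    body = json.dumps({"data": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


def publish(schema: str, count: int, batch_size: int = 500) -> None:
    """Fetch rows from Mockaroo and send them to Rockset in batches.

    batch_size is an arbitrary choice; check Rockset's documented request limits.
    """
    docs = fetch_mockaroo(schema, count, MOCKAROO_KEY)
    for batch in chunked(docs, batch_size):
        req = rockset_add_docs_request(batch, ROCKSET_HOST, WORKSPACE, COLLECTION, ROCKSET_KEY)
        # urlopen raises HTTPError on non-2xx responses, so reaching print means success.
        with urllib.request.urlopen(req) as resp:
            print(resp.status, len(batch), "documents sent")


# Example call (makes real network requests, so it is left commented out):
# publish("MySavedSchema", count=1000)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect before anything hits the network, and the same split carries over to any other HTTP library.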

Feedback can be provided on this thread, or if something is really messed up, you can always file a GitHub issue. While I am a Rockset employee, I have no affiliation with Mockaroo. It’s just a great service, even at the free tier. Now you can generate and analyze all of that sample data while you wait for the real thing.


this is awesome! thanks!