Rockset Langchain support (thanks + issue)

Great to see the Langchain support pushed out this week. Congrats! Hoping the product gets some well-deserved profile in that community.

I did come across an issue: the example / how-to-use code for Rockset in Langchain here refers to the class RocksetDB everywhere, but the actual class name (at least in langchain==0.0.210, which is what I have installed) is Rockset (without the ‘DB’), per the new wrapper class here.

This may already have been addressed and be queued for an upcoming build, but I thought I would mention it.

I was also having an issue when using my own copy of the State of the Union text with the example: the text splitting shown in the demo created one large embedding for the whole document (i.e. just one row in the db), so the search part of the example wasn’t working as expected.

It could have been something to do with my text; however, when I switched to the recursive text splitter provided by Langchain instead, it worked great. I chose to split every 100 characters to give more granular search results.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0, separators=[" ", ",", "\n"])

So if anyone else playing around with the sample code hits the same issue, you could try that.

Thanks for the feedback, Ben! You’re spot on: the example in the Python notebook uses the incorrect class name; it should be Rockset. This was a bad merge. I changed the class name, but it looks like the Python notebook wasn’t updated. I’ll push a fix to Langchain.

Regarding your question about text splitting, it’s hard for me to comment without seeing the text, since you were using your own source. The main difference between CharacterTextSplitter and RecursiveCharacterTextSplitter seems to be that the latter tries a list of separators in order (and re-splits any piece that is still larger than chunk_size), whereas the former splits on a single separator only, which is a blank line (\n\n) by default. Maybe your text has no blank lines, which would explain why your entire document ended up as one split when you used the former?
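To illustrate, here is a simplified pure-Python sketch of the two strategies. It is not LangChain’s implementation: split_single, split_recursive and _merge are hypothetical names, chunk_overlap, custom length functions and the empty-string separator are left out, and the "\n\n" default is my reading of CharacterTextSplitter. On text with single newlines but no blank lines, the single-separator version returns the whole document as one chunk, while the recursive one falls back to "\n" and produces several chunks.

```python
from typing import List


def _merge(pieces: List[str], separator: str, chunk_size: int) -> List[str]:
    """Greedily join consecutive pieces with `separator` while they fit in chunk_size."""
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # an oversized piece is kept whole
    if current:
        chunks.append(current)
    return chunks


def split_single(text: str, separator: str = "\n\n", chunk_size: int = 1000) -> List[str]:
    """Hypothetical CharacterTextSplitter-like splitter: one separator only."""
    pieces = [p for p in text.split(separator) if p]
    return _merge(pieces, separator, chunk_size)


def split_recursive(text: str, separators: List[str], chunk_size: int = 1000) -> List[str]:
    """Hypothetical RecursiveCharacterTextSplitter-like splitter.

    Uses the first separator that occurs in the text, then re-splits any piece
    still longer than chunk_size with the remaining separators.
    """
    separator = next((s for s in separators if s in text), None)
    if separator is None:
        return [text] if text else []  # nothing left to split on
    remaining = separators[separators.index(separator) + 1:]
    chunks: List[str] = []
    small: List[str] = []
    for piece in text.split(separator):
        if not piece:
            continue
        if len(piece) <= chunk_size:
            small.append(piece)
        else:
            # Flush the small pieces collected so far, then recurse on the big one.
            chunks.extend(_merge(small, separator, chunk_size))
            small = []
            chunks.extend(split_recursive(piece, remaining, chunk_size))
    chunks.extend(_merge(small, separator, chunk_size))
    return chunks


if __name__ == "__main__":
    # Stand-in document: single newlines between lines, but no blank lines.
    text = "\n".join(f"Sentence number {i} of the speech." for i in range(200))
    print(len(split_single(text)))  # 1: "\n\n" never occurs, so one giant chunk
    print(len(split_recursive(text, ["\n\n", "\n", " "])))  # several chunks
```

Note that separator order matters: the first separator found in the text wins, so with separators=[" ", ",", "\n"] as in your snippet, chunks are built from words rather than lines.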

Thanks for bringing this up, and feel free to share any other feedback/requests you have!


Thanks for the reply, Anubhav. I’ll send you a copy of the text file I used in case it’s of interest.