Force Async Query

We are primarily running async queries through the Rockset API by specifying async_options in the POST /v1/orgs/self/queries endpoint. Many queries will exceed the client_timeout_ms and return a query_id that we will later poll for query status, and some will just return a result set on the initial call.

This is a fantastic optimization for production, but it plays a bit of havoc with integration tests. We have two possible execution paths - the API call may return results, or it may return a query ID - and we need to handle and test both on our side. When a test expects to get back a query ID but sometimes doesn’t, or vice versa, it can create false negatives and flaky tests on one path or the other. We’re compensating by using unit tests and mocks, but we would love to be able to consistently test this with integration tests to have higher confidence before deploying.

Previously we were sending client_timeout_ms of 0 to try to force the response to always return a query_id for those tests, which seemed to work for a while, but recently we started getting back completed queries if they execute very quickly. Would it be possible to standardize the behavior to always return a query_id if we send a client_timeout_ms of 0?

That endpoint should always return a query_id regardless of whether the query timed out. The query_id (as shown in this example response) is listed after the result set. Can you send an example of the full response you are receiving?

Also, FYI, the lowest the client timeout can be set to is 100ms, which explains why you are still getting results in the response when setting the client timeout to 0ms. I can see how that’s confusing, so I will be sure to add the minimum timeout to the docs!

Hi Sofia,

I’m sorry, I misspoke. We do get a query_id back from the start query request; however, that query_id cannot be passed to the GET /v1/orgs/self/queries/{queryId} endpoint (the call returns “No query with id “xxxx” currently executing.”).

We can easily test the case where the query returns results immediately in the start response by specifying a very high client_timeout_ms.

However, in the case that a query runs longer than 100ms, our logic will start the query, then poll until it is completed (by calling that get query endpoint), then download the results.
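The start-then-poll flow described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `get_query` is a hypothetical callable standing in for the GET /v1/orgs/self/queries/{queryId} call, and the `status` field and the status names are assumptions, not something confirmed in this thread.

```python
import time
from typing import Any, Callable, Dict

# Assumed status names; verify them against the real API responses.
TERMINAL_STATUSES = {"COMPLETED", "ERROR", "CANCELLED"}


def poll_until_done(
    get_query: Callable[[str], Dict[str, Any]],
    query_id: str,
    interval_s: float = 0.5,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_query(query_id) until it reports a terminal status.

    get_query is injected (hypothetical helper) so the loop can be
    exercised in tests without any network access.
    """
    for attempt in range(max_polls):
        info = get_query(query_id)
        if info.get("status") in TERMINAL_STATUSES:
            return info
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"query {query_id} not finished after {max_polls} polls")
```

Injecting `get_query` and `sleep` keeps the loop itself unit-testable; the integration test still needs the API to reliably take the polling path.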

This is the logic that is hard to reliably test, because if the initial query executes in less than 100ms, the first poll to get query will always fail with that error. We could handle this case gracefully, match on that error string, and just assume that the query has been completed on the initial start request, but then it would be hard to know if we’re suppressing legitimate issues or there is an issue with our polling logic (since it wouldn’t be exercised in the test if the query took less than 100ms).

In addition to setting the client_timeout_ms to 0 to force the get query call to work, we tried building extremely complex queries in an attempt to always make them execute longer than 100ms. Those tests are very flaky because sometimes the query still takes less than 100ms, so the get query API call fails and fails the test.

Thanks!!
Jim

Hi Jim!

The /v1/orgs/self/queries/{queryId} endpoint will only return a query if it is running, queued, or has paginated results. Async queries are automatically paginated if any of the following is true:

  • the results exceed 100MB
  • the results exceed the specified max_initial_results
  • the time of the query exceeds the specified client timeout (min 100ms)

Otherwise the results are returned directly, just like any other query. So for your use case I recommend setting the max_initial_results field to 0 to force pagination. Let me know if this works for you!
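As a rough sketch of what that request body might look like when built in Python: `build_async_query_body` is a hypothetical helper, and the exact body shape (the SQL text under `sql.query`, with `max_initial_results` nested inside `async_options` next to `client_timeout_ms`) is an assumption to check against the API reference.

```python
import json
from typing import Any, Dict


def build_async_query_body(
    sql: str,
    client_timeout_ms: int = 100,
    max_initial_results: int = 0,
) -> Dict[str, Any]:
    """Build a body for POST /v1/orgs/self/queries (assumed shape).

    max_initial_results=0 is the setting suggested above to force
    pagination, so the returned query_id can be polled afterwards.
    """
    return {
        "sql": {"query": sql},  # assumed field layout
        "async_options": {
            "client_timeout_ms": client_timeout_ms,  # minimum is 100ms per this thread
            "max_initial_results": max_initial_results,
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent for a trivial query.
    print(json.dumps(build_async_query_body("SELECT 1"), indent=2))
```

An integration test could send this body and then always exercise the polling path, regardless of how fast the query runs.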

Sofia 🙂

That worked perfectly - thank you Sofia!!

Jim
