Does anyone know what the back pressure response looks like? I’ve seen ‘io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED’ a lot, but most of the time it is due to an oversized gRPC message.
Recently I saw something like this in the log, although it is only logged at INFO level:
03:52:41.167 INFO c.d.p.a.s.ApiSubmissionService - Submission has failed due to backpressure
Is there any sample code showing how to capture the back pressure response? Some clients have been asking about this for quite a while. Also, is there a way to trigger a back pressure error on Sandbox? It would help us a lot if we could simulate it.
Sandbox provides three options (in the 1.3.0 RC; 1.2.0 only has two, iirc) to configure the limits that determine when you get a backpressure response:
--max-commands-in-flight <value>
Maximum number of submitted commands waiting for completion for each party (only applied when using the CommandService). Overflowing this threshold will cause back-pressure, signaled by a RESOURCE_EXHAUSTED error code. Default is 256.
--max-parallel-submissions <value>
Maximum number of successfully interpreted commands waiting to be sequenced (applied only when running sandbox-classic). The threshold is shared across all parties. Overflowing it will cause back-pressure, signaled by a RESOURCE_EXHAUSTED error code. Default is 512.
--input-buffer-size <value>
The maximum number of commands waiting to be submitted for each party. Overflowing this threshold will cause back-pressure, signaled by a RESOURCE_EXHAUSTED error code. Default is 512.
If you lower these limits, it should be much easier to get a backpressure response from Sandbox.
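For example, a minimal sketch (assuming your SDK version forwards these flags to the sandbox binary, and with my-project.dar as a placeholder for your own DAR):

daml sandbox --max-commands-in-flight 2 --input-buffer-size 2 my-project.dar

With limits that low, submitting a handful of commands in parallel for the same party should be enough to see the RESOURCE_EXHAUSTED backpressure response.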
As for capturing the response, ideally you would just match on RESOURCE_EXHAUSTED, but unfortunately that code also covers non-backpressure failures. That behavior comes from the underlying gRPC library, so we cannot change it. Maybe someone else has a good idea for matching only on backpressure responses.
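In the meantime, the best you can do on the client side is match on the status code. A minimal sketch using the plain gRPC Java API (the helper class and method names are just illustrative, not part of any bindings):

import io.grpc.Status;
import io.grpc.StatusRuntimeException;

final class BackpressureHeuristics {
    // True when the failure carries the RESOURCE_EXHAUSTED status code.
    // Note: this also matches non-backpressure causes such as oversized messages,
    // so treat it as "possibly backpressure", not a definitive signal.
    static boolean isResourceExhausted(Throwable t) {
        Status status = Status.fromThrowable(t);
        return status.getCode() == Status.Code.RESOURCE_EXHAUSTED;
    }
}

You could additionally inspect Status#getDescription() and look for the wording the ledger uses for backpressure, but that is fragile since the message text is not part of any API contract.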
You’re right. There’s currently no good solution for this.
We might consider signaling backpressure by using the UNAVAILABLE status code. Both descriptions for UNAVAILABLE and RESOURCE_EXHAUSTED could fit a backpressure scenario:
/**
* Some resource has been exhausted, perhaps a per-user quota, or
* perhaps the entire file system is out of space.
*/
RESOURCE_EXHAUSTED(8),
/**
* The service is currently unavailable. This is a most likely a
* transient condition and may be corrected by retrying with
* a backoff. Note that it is not always safe to retry
* non-idempotent operations.
*
* <p>See litmus test above for deciding between FAILED_PRECONDITION,
* ABORTED, and UNAVAILABLE.
*/
UNAVAILABLE(14),
As UNAVAILABLE is already a transient error that calls for some form of short-circuiting or backoff, it would be better to conflate back-pressure with that condition than with permanent errors such as an oversized gRPC message.
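Until then, the pragmatic client-side approach is to treat both codes as transient and retry with a backoff. A minimal sketch (the helper below is illustrative; the retry count and delays are arbitrary):

import io.grpc.Status;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.Callable;

final class SubmissionRetry {
    // Retries a submission with exponential backoff while the failure looks
    // transient: RESOURCE_EXHAUSTED (today's backpressure signal) or UNAVAILABLE
    // (the proposed one). Gives up after maxAttempts and rethrows.
    static <T> T withBackoff(Callable<T> submission, int maxAttempts) throws Exception {
        long delayMillis = 100;
        for (int attempt = 1; ; attempt++) {
            try {
                return submission.call();
            } catch (StatusRuntimeException e) {
                Status.Code code = e.getStatus().getCode();
                boolean maybeBackpressure =
                        code == Status.Code.RESOURCE_EXHAUSTED || code == Status.Code.UNAVAILABLE;
                if (!maybeBackpressure || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMillis);
                delayMillis = Math.min(delayMillis * 2, 5_000);
            }
        }
    }
}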
This is an ongoing issue for our project, and I believe this switch would resolve some of our issues with back-pressure.