BackPressure

Does anyone know what the backpressure response looks like? I’ve seen ‘io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED’ a lot, but most of the time it is to do with oversized gRPC messages.

Recently I saw something like this in the log; however, it is logged at the INFO level.

03:52:41.167 INFO c.d.p.a.s.ApiSubmissionService - Submission has failed due to backpressure

Is there any sample code showing how to capture the backpressure response? Some clients have been asking about this for quite a while. Also, is there a way to trigger a backpressure error on the sandbox? It would help us a lot if we could simulate it.

2 Likes

Sandbox provides three options (in the 1.3.0 RC; 1.2.0 only has two, iirc) to configure the limits that determine when you get a backpressure response:

  --max-commands-in-flight <value>
    Maximum number of submitted commands waiting for completion for each party (only applied when using the CommandService). Overflowing this threshold will cause back-pressure, signaled by a RESOURCE_EXHAUSTED error code. Default is 256.
  --max-parallel-submissions <value>
    Maximum number of successfully interpreted commands waiting to be sequenced (applied only when running sandbox-classic). The threshold is shared across all parties. Overflowing it will cause back-pressure, signaled by a RESOURCE_EXHAUSTED error code. Default is 512.
  --input-buffer-size <value>
    The maximum number of commands waiting to be submitted for each party. Overflowing this threshold will cause back-pressure, signaled by a RESOURCE_EXHAUSTED error code. Default is 512.

If you lower them, it should be much easier to get a backpressure response from the sandbox.
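
For example, starting the sandbox with deliberately tiny limits should let a short burst of submissions trip the threshold. This is only a sketch, assuming a 1.3.x SDK where the sandbox is started via daml sandbox; the port and the limit values are arbitrary:

    # Shrink the per-party buffers so that a small burst of commands is enough
    # to overflow them and produce a RESOURCE_EXHAUSTED (backpressure) response.
    daml sandbox --port 6865 --max-commands-in-flight 2 --input-buffer-size 2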

As for capturing the response, ideally you would just match on RESOURCE_EXHAUSTED, but unfortunately that status also covers non-backpressure responses. That behavior comes from the underlying gRPC library, so we cannot change it. Maybe someone else has a good idea for matching only on backpressure responses.
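
As a rough sketch of what that matching could look like in Java (not an official helper; it treats every RESOURCE_EXHAUSTED as possible backpressure, since the status code alone cannot distinguish it from, say, an oversized message):

    import io.grpc.Status;
    import io.grpc.StatusRuntimeException;

    final class BackpressureDetection {
        // True if the failure *might* be backpressure. RESOURCE_EXHAUSTED is also
        // used for unrelated problems (e.g. oversized gRPC messages), so treat a
        // true result as "retryable after a backoff", not as a certainty.
        static boolean isPossiblyBackpressure(Throwable t) {
            if (t instanceof StatusRuntimeException) {
                Status.Code code = ((StatusRuntimeException) t).getStatus().getCode();
                return code == Status.Code.RESOURCE_EXHAUSTED;
            }
            return false;
        }
    }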

3 Likes

You’re right. There’s currently no good solution for this.
We might consider signaling backpressure with the UNAVAILABLE status code. The descriptions of both UNAVAILABLE and RESOURCE_EXHAUSTED could fit a backpressure scenario:

    /**
     * Some resource has been exhausted, perhaps a per-user quota, or
     * perhaps the entire file system is out of space.
     */
    RESOURCE_EXHAUSTED(8),
    /**
     * The service is currently unavailable.  This is a most likely a
     * transient condition and may be corrected by retrying with
     * a backoff. Note that it is not always safe to retry
     * non-idempotent operations.
     *
     * <p>See litmus test above for deciding between FAILED_PRECONDITION,
     * ABORTED, and UNAVAILABLE.
     */
    UNAVAILABLE(14),
1 Like

That would be good to have. Backpressure requires re-submission, while other errors may not.
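
For instance, a minimal re-submission loop might look like the following. This is only a sketch: submit stands in for whatever CommandService call the application makes, and maxAttempts and the backoff values are arbitrary.

    import io.grpc.Status;
    import io.grpc.StatusRuntimeException;
    import java.util.concurrent.Callable;

    final class Resubmit {
        // Retries the submission with exponential backoff while the failure looks
        // like backpressure; any other error, or running out of attempts, is rethrown.
        static <T> T withBackoff(Callable<T> submit, int maxAttempts) throws Exception {
            long delayMillis = 100;
            for (int attempt = 1; ; attempt++) {
                try {
                    return submit.call();
                } catch (StatusRuntimeException e) {
                    boolean backpressure =
                        e.getStatus().getCode() == Status.Code.RESOURCE_EXHAUSTED;
                    if (!backpressure || attempt >= maxAttempts) {
                        throw e;
                    }
                    Thread.sleep(delayMillis); // back off before re-submitting
                    delayMillis *= 2;
                }
            }
        }
    }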

1 Like

As UNAVAILABLE is already a transient error that also calls for some form of short-circuiting/backoff, it would be better to conflate backpressure with that condition than with permanent errors such as an oversized gRPC message.

This is an ongoing issue for our project, and I believe this switch would resolve some of our backpressure problems.

1 Like