What happens when the limit set by `--max-commands-in-flight` in Sandbox is exceeded?

Since DAML SDK 1.1.0, the Sandbox has a `--max-commands-in-flight` flag. What happens when I submit more commands than that limit in parallel? Do they get queued? Do I get an error?


If commands are piling up on the Ledger API server, any command received via the CommandService or the CommandSubmissionService while the threshold is reached will be rejected with a RESOURCE_EXHAUSTED error until there is space in the queue again. This is backpressure from the Ledger API server, and I would recommend using some form of exponential back-off retry strategy when that code is returned.
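For example, here is a minimal sketch of such a retry loop using the gRPC Java API (`submitCommand` is a hypothetical stand-in for however you call the CommandSubmissionService; the delays and attempt count are arbitrary choices):

```java
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

public final class BackoffSubmitter {

    // Retries a submission with exponential backoff as long as the Ledger API
    // server answers with RESOURCE_EXHAUSTED (i.e. backpressure).
    public static void submitWithBackoff(Runnable submitCommand) throws InterruptedException {
        long delayMillis = 100;                  // initial backoff
        final long maxDelayMillis = 10_000;      // cap on the backoff
        final int maxAttempts = 8;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                submitCommand.run();             // synchronous submission for simplicity
                return;                          // accepted, stop retrying
            } catch (StatusRuntimeException e) {
                if (e.getStatus().getCode() != Status.Code.RESOURCE_EXHAUSTED) {
                    throw e;                     // not backpressure, don't retry blindly
                }
                Thread.sleep(delayMillis);
                delayMillis = Math.min(delayMillis * 2, maxDelayMillis);
            }
        }
        throw new IllegalStateException(
                "Submission still rejected with RESOURCE_EXHAUSTED after " + maxAttempts + " attempts");
    }
}
```

In production you would probably also add some jitter and bound the total elapsed time rather than the number of attempts.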


@stefanobaghino-da I noticed that RESOURCE_EXHAUSTED is also thrown when the gRPC message size limit is reached. Is there a more granular exception that represents this backpressure?

The gRPC message size limit exception:

6:47:58.490 [client-1] ERROR c.d.g.a.SingleThreadExecutionSequencer [ : - ] - Unhandled exception in SingleThreadExecutionSequencer.
io.reactivex.exceptions.OnErrorNotImplementedException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 7956683
at io.reactivex.internal.functions.Functions$OnErrorMissingConsumer.accept(Functions.java:704)
at io.reactivex.internal.functions.Functions$OnErrorMissingConsumer.accept(Functions.java:701)
at io.reactivex.internal.subscribers.LambdaSubscriber.onError(LambdaSubscriber.java:79)
at io.reactivex.internal.operators.flowable.FlowableFlattenIterable$FlattenIterableSubscriber.checkTerminated(FlowableFlattenIterable.java:396)
at io.reactivex.internal.operators.flowable.FlowableFlattenIterable$FlattenIterableSubscriber.drain(FlowableFlattenIterable.java:256)
at io.reactivex.internal.operators.flowable.FlowableFlattenIterable$FlattenIterableSubscriber.onError(FlowableFlattenIterable.java:182)
at io.reactivex.internal.subscribers.BasicFuseableSubscriber.onError(BasicFuseableSubscriber.java:101)
at com.digitalasset.grpc.adapter.client.rs.BufferingResponseObserver.lambda$onError$3(BufferingResponseObserver.java:81)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

This is a native gRPC error, not one generated by the Ledger API server. It usually happens when you have a large DAML package and uploading it in one go exceeds the default threshold. My first advice would be, if you can, to break your codebase up into smaller packages, which pays off in maintenance and in particular makes model upgrades easier. If your package is still too large despite this, you can use the Sandbox's `--max-inbound-message-size` flag to bump the threshold as needed.
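For example, starting the Sandbox from the SDK 1.x CLI might look like this (the DAR path and the 16 MiB value are assumptions, pick whatever fits your package; the value is in bytes):

```
daml sandbox --max-inbound-message-size 16777216 .daml/dist/my-project-1.0.0.dar
```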

@stefanobaghino-da In this case it was caused by a choice that generated a large number of contracts (i.e. 100,000). The choice lets the user decide how many contracts to create, hence the problem. Is the backpressure error also a native gRPC error? I'm just wondering how I should respond to each error in my integration layer (i.e. retry or raise an alert). In the oversized message example the command has already been executed and the contracts have been created, while in the backpressure case it has not.

Transactions are sent as a single message. If a transaction exceeds the gRPC message size threshold, the server will refuse to send it. This will not change on retries, as the transaction is always the same, so you need to bump the message size threshold.
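If it helps, here is a heuristic sketch for telling the two RESOURCE_EXHAUSTED cases apart in an integration layer. Since both share the same status code, the only handle is the status description, and matching on its text (as below, against the message from the stack trace above) is an assumption that may break across gRPC versions:

```java
import io.grpc.Status;

public final class ResourceExhaustedClassifier {

    public enum Kind { BACKPRESSURE, MESSAGE_TOO_LARGE, OTHER }

    // Best-effort classification of a failure coming out of a Ledger API call or stream.
    public static Kind classify(Throwable t) {
        Status status = Status.fromThrowable(t);
        if (status.getCode() != Status.Code.RESOURCE_EXHAUSTED) {
            return Kind.OTHER;
        }
        String description = status.getDescription();
        // Fragile string match against the message seen in the stack trace above;
        // treat it as a hint, not a stable contract.
        if (description != null && description.contains("gRPC message exceeds maximum size")) {
            return Kind.MESSAGE_TOO_LARGE;   // raise the limit, don't retry
        }
        return Kind.BACKPRESSURE;            // retry with exponential backoff
    }
}
```

With that split, backpressure can go through a retry loop like the one sketched earlier, while an oversized message should raise an alert, since retrying it will keep failing until the limit is raised.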