Why might commands take much longer to complete through gRPC than JSON API?

Hi there!

We’re swapping away from the JSON API to use the gRPC directly, so we can support our pagination needs ourselves. As a result, all of our read/exercise commands are now going through gRPC and we’re seeing a dramatic increase in the time it takes to respond.

Commands that used to take around half a second, are now taking 20 seconds or more, regularly resulting in TIMEOUT errors from gRPC. We initially thought it might be a network issue on our side, but it’s happening in different environments and the commands are actually reaching the ledger.

Can anyone suggest why this might be happening or what we can do to determine the source of the problem?

We get this issue with the sandbox, and against postgres. But when using postgres we’re seeing log messages such as:

2021-09-09 10:50:39,200 UTC level=INFO  [daml-on-sql-akka.actor.default-dispatcher-321] c.d.p.a.s.ApiSubmissionService: Submitting transaction (context: {readAs=[], submittedAt=2021-09-09T10:50:39.199647Z, applicationId=myAppId, deduplicateUntil=2021-09-09T10:50:59.199647Z, participantId=my-participant, actAs=[MyParty], commandId=2504872760613040, workflowId=my-workflow-1}) 
2021-09-09 10:50:59,553 UTC level=INFO  [daml-on-sql-akka.actor.default-dispatcher-321] c.d.l.c.s.c.t.CommandTracker: Command 2504872760613040 (command timeout 2021-09-09T10:50:59.199564Z) timed out at checkpoint 2021-09-09T10:50:59.553874Z. 

Thanks in advance.

The JSON API just calls the underlying command submission service. Looking at the log that’s also what you’re using here so if anything I’d expect the gRPC API to be faster here.

There are a few things that could be different:

  1. The JSON API creates a single LedgerClient and keeps that around. If you create a new one per command you will pay some overhead for establising the connection. I definitely don’t expect that to be 19.5 seconds though.
  2. Make sure your setup is really identical. test the commands against the same running ledger instead of starting separate ledgers to have the setup as close as possible.
  3. Make sure the commands are really the same and you’re not sending something different via the gRPC API.

Given how large the difference is my suspicion is that either your setup is wildly different or you’re sending different commands.

Yeah, I’ve verified that the same command, against the same sandbox ledger, using the JSON API instead of the gRPC takes much much less time.

Though as far as I can tell, all that is different is that the key and choice argument is converted to gRPC Value and sent along the gRPC connection. We’re setting one client at the app startup and using that same one for listeners and commands, so it shouldn’t be related to your first point.

It would probably be valuable to understand where the issue is if you would share a bit on how you issue commands. As mentioned the JSON API simply forwards requests to the Ledger API, albeit using the Scala bindings (that we use internally but are currently deprecated for external usage as part of the SDK), whereas I believe (correct me if I’m wrong) that you are using the Java bindings, but the differences between the two are so minimal (both being relatively thin layers on top of the gRPC generated code) that it’s hard to me to think of possible issues that may cause this.

Solution to this is that a colleague accidentally set a min_ledger_time_rel value of 20 seconds when they intended to set a deduplication_time instead.

Lesson for the day: Be careful with complex methods taking many values!