gRPC exceeds max size error and how to recover

I have an application that subscribes to the event stream of a ledger on DAML Hub and does different types of processing depending on the event. My code to subscribe to the stream is:

    Flowable<com.daml.ledger.javaapi.data.TransactionTree> transactionTree =
            damlLedgerClient.getTransactionsClient().getTransactionsTrees(
                    ledgerOffset,
                    new FiltersByParty(Collections.singletonMap(platformAdminParty.getIdentifier(), NoFilter.instance)),
                    true);

    transactionTree.blockingSubscribe(this::processTransactionTree, this::streamErrorHandler);

Currently our error handler just initiates a reconnect, as 99% of the time the error means the Hub ledger went down.

With some new changes made by the DAML devs we are now running into the error “RESOURCE_EXHAUSTED: gRPC message exceeds maximum size”. I have talked to our DA resources and know this value won’t be increased for Hub, and our DAML devs are looking into making smaller payloads. While they do that, though, I would like to come up with a solution that handles this error and continues processing the stream. So, all this to ask 2 questions -

  1. I believe the issue is that there are a lot of events in this one transaction tree, which makes it so large. Is there a way to limit the events in the tree? I’m assuming no and that this is just how DAML links them, so it is what it is.
  2. How can I go about skipping the offset with the error and going to the next offset to continue processing? I write every successfully processed offset to a database so that I can recover from restarts and know what the last one is, but I can’t figure out how to skip ahead an offset to get past the one throwing the error. Is it possible to know what the next offset would be? Then I could log the error, update my database with that offset, and when it reconnects it would start at the one after the one causing the error.

If you consume the transaction tree stream you can only filter by party ID (unlike the “flat” transaction stream, which only contains persistent creates and archival events and can additionally be filtered by template ID), but if a transaction is too large you will hit this error regardless. If you can’t control the message size at the Ledger API server level, you need to take steps in your application to make the transactions smaller.
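For reference, a rough sketch of what template filtering on the flat stream could look like with the Java bindings; the Identifier values are placeholders, and the InclusiveFilter constructor may differ between bindings versions, so treat this as an illustration rather than copy-paste code:

    import com.daml.ledger.javaapi.data.FiltersByParty;
    import com.daml.ledger.javaapi.data.Identifier;
    import com.daml.ledger.javaapi.data.InclusiveFilter;
    import com.daml.ledger.javaapi.data.Transaction;
    import io.reactivex.Flowable;
    import java.util.Collections;

    // Placeholder template identifier: package id, module name, entity name.
    Identifier myTemplateId = new Identifier("somePackageId", "MyModule", "MyTemplate");

    // Only creates/archives of that template are delivered on the flat stream,
    // which can keep individual messages smaller than the full tree stream.
    FiltersByParty templateFilter = new FiltersByParty(Collections.singletonMap(
            platformAdminParty.getIdentifier(),
            new InclusiveFilter(Collections.singleton(myTemplateId))));

    Flowable<Transaction> transactions =
            damlLedgerClient.getTransactionsClient().getTransactions(ledgerOffset, templateFilter, true);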

To the best of my understanding, RxJava’s default behavior is to terminate a stream when an error occurs. However, you can override this behavior using error handling operators; the one you might be interested in is, I believe, onErrorResumeNext.
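A minimal sketch of that operator applied to the snippet above, assuming RxJava 2 (as used by the Java bindings) and that the StatusRuntimeException surfaces directly from the stream; what to resume with is deliberately left open, since the offset of the failing transaction is not available from the error itself:

    import io.grpc.Status;
    import io.grpc.StatusRuntimeException;
    import io.reactivex.Flowable;

    transactionTree
            .onErrorResumeNext(error -> {
                Status.Code code = error instanceof StatusRuntimeException
                        ? ((StatusRuntimeException) error).getStatus().getCode()
                        : null;
                if (code == Status.Code.RESOURCE_EXHAUSTED) {
                    // Log / persist whatever you need here, then swap in a fallback
                    // publisher. Flowable.empty() simply completes the stream; resuming
                    // past the oversized transaction would require knowing its offset.
                    return Flowable.empty();
                }
                return Flowable.error(error); // propagate everything else (e.g. disconnects)
            })
            .blockingSubscribe(this::processTransactionTree, this::streamErrorHandler);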

Without seeing the full error it’s difficult to judge, but keep in mind that max message size errors like this one are raised by the gRPC client, not the server. So even without a change in Hub, you can bump your client-side max message size and see if that helps.

There is a chance that it is internal communication within the ledger that is failing with this error, but my guess is that this is not the case.
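A sketch of that client-side bump, assuming the Java bindings’ builder accepts a pre-configured NettyChannelBuilder (check your bindings version); host, port and the 16 MiB value are placeholders, and the SSL/token configuration needed for Hub is omitted:

    import com.daml.ledger.rxjava.DamlLedgerClient;
    import io.grpc.netty.NettyChannelBuilder;

    // Raise the client-side inbound limit (gRPC's default is 4 MiB).
    NettyChannelBuilder channelBuilder =
            NettyChannelBuilder.forAddress(host, port)
                    .maxInboundMessageSize(16 * 1024 * 1024);

    DamlLedgerClient damlLedgerClient = DamlLedgerClient.newBuilder(channelBuilder).build();
    damlLedgerClient.connect();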

One further note that I left out of my previous answer but that is probably worth adding for the sake of clarity.

Simply put, no. :slight_smile: You have no guarantees about offsets apart from being able to compare them.

As the Ledger API reference documentation mentions:

The format of absolute offsets is opaque to the client: no client-side transformation of an offset is guaranteed to return a meaningful offset.

Luckily, I believe you can leverage RxJava’s capabilities to avoid the problem, as mentioned in my previous answer.

Full error below. I will give bumping the size on the client a shot. Thanks.

io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 4907210
	at io.grpc.Status.asRuntimeException(Status.java:535)
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)

My hesitation with onErrorResumeNext is how it will handle the disconnects that happen all the time. In those cases I want to catch the error and reconnect. I don’t think I can have it both ways if I have to set it up when creating the stream so that it resumes on some errors and not on others. I’ll play around with it though and see.

Good point. I’m not sure whether, or up to which point, you can mix error handling operators. I would probably investigate whether it’s possible to use retryWhen to specify a retry strategy for dropped connections, followed by onErrorResumeNext to catch errors unhandled by retryWhen.
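A rough sketch of that combination, again assuming RxJava 2; isDisconnect is a hypothetical predicate for the errors you currently treat as “Hub went down”, and note that every re-subscription replays from the original ledgerOffset, so your offset bookkeeping still matters:

    import io.reactivex.Flowable;
    import java.util.concurrent.TimeUnit;

    transactionTree
            // Re-subscribe after a short delay when the error looks like a dropped
            // connection; anything else is re-raised and falls through to
            // onErrorResumeNext below.
            .retryWhen(errors -> errors.flatMap(error ->
                    isDisconnect(error)
                            ? Flowable.timer(5, TimeUnit.SECONDS)
                            : Flowable.<Long>error(error)))
            // Errors retryWhen did not handle (e.g. RESOURCE_EXHAUSTED) end up here.
            .onErrorResumeNext(error -> {
                // Log / record the failure, then complete instead of crashing the subscriber.
                return Flowable.empty();
            })
            .blockingSubscribe(this::processTransactionTree, this::streamErrorHandler);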
