Synchronous-commit = off optimization and transaction atomicity

Canton has a performance optimization in which transactions are written to the backing PostgreSQL database asynchronously (synchronous commit turned off). See end of the page here.

Does this still preserve transaction atomicity, and if so, how? Will the acceptance of the transaction and the completion of the command be sent over the Ledger API after another asynchronous response from PostgreSQL?

The setting only affects how data is written to the Ledger API database (aka IndexDB). If synchronous commit is turned off, it’s possible that you get a response from the API, for example a completion, before it has been written to the database. If your participant crashes just then, and you restart it, it will have forgotten that it had processed that completion and that offset. If you try to resubscribe from the same offset immediately after a restart, the Ledger API may reject your request because it doesn’t know that offset yet. You then have to keep retrying until it has reprocessed past that offset.
So using that setting adds a race condition to recovering after a participant node crash. It has no effect on transaction atomicity.
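
To make the "keep retrying" part concrete, here is a minimal Python sketch of client-side retry with exponential backoff. It is not tied to any real Ledger API binding: the subscribe callable and the OffsetNotYetIndexed exception are hypothetical stand-ins for whatever your client library does when the participant rejects an offset it has not re-indexed yet.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class OffsetNotYetIndexed(Exception):
    """Hypothetical error: the participant does not know the requested offset yet."""


def resubscribe_with_retry(
    subscribe: Callable[[str], T],
    offset: str,
    max_attempts: int = 10,
    initial_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call subscribe(offset) until it stops rejecting the offset.

    `subscribe` is an assumed wrapper around your real client call. It should
    raise OffsetNotYetIndexed while the participant is still catching up.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return subscribe(offset)
        except OffsetNotYetIndexed:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable only so the function can be exercised without real waiting. In a real client you would map the rejection you actually observe from the Ledger API onto the retry condition.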

How does the Ledger API recover that information on restart? From another part of the DB that is more Canton specific?

Could we guard against this race condition with specific code inside participant node startup to detect the above scenario and hold off on serving requests until the IndexDB is ready?

How does the Ledger API recover that information on restart? From another part of the DB that is more Canton specific?

Yes

Could we guard against this race condition with specific code inside participant node startup to detect the above scenario and hold off on serving requests until the IndexDB is ready?

Perhaps one could add a feature where you can pass in an offset and say “Don’t signal that you are available until you’ve indexed this offset”. But you’d then need to go around all your client applications, collect their offsets, and supply the maximum one to the participant node to restart it. That is, the orchestrator that detects that your participant is down and restarts it needs to be aware of all clients and needs to be able to get data from them.
That feels a lot more complex than saying “add retry logic to your clients on reconnecting” or “give up a few threads to do synchronous commits”.
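
For comparison, this is roughly what the orchestrator side of that hypothetical feature would have to do, sketched in Python. Nothing here exists in Canton: the per-client offset map, the current_indexed_offset callable, and the use of integer offsets are all assumptions made for illustration.

```python
import time
from typing import Callable, Mapping


def required_offset(client_offsets: Mapping[str, int]) -> int:
    """Return the highest offset any client has already seen.

    `client_offsets` maps a (hypothetical) client name to the last offset it
    processed. The orchestrator has to collect this from every client.
    """
    if not client_offsets:
        raise ValueError("no client offsets collected")
    return max(client_offsets.values())


def wait_until_indexed(
    current_indexed_offset: Callable[[], int],
    target: int,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the participant has indexed `target`, or the timeout expires.

    `current_indexed_offset` is an assumed probe into the participant. Returns
    True once the target is reached and False on timeout.
    """
    deadline = clock() + timeout
    while current_indexed_offset() < target:
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

Even in this toy form, the orchestrator needs a channel to every client and a probe into the participant, which is the extra coupling described above.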

In any case, as per my answer above, the reason this option exists at all is that the Ledger API can recover from a lower-level database in the participant that always uses synchronous commit. The much better feature would be to serve the Ledger API directly from that lower-level database and thus avoid the second write altogether. That is something we are planning to do, and it should remove this flag and its associated tradeoffs.

I meant that the participant node would resolve its own ambiguity between the two sources with different offsets. But since that ambiguity will go away, I guess this option is sufficient for now, and we should monitor whether a participant node crash occurs with this option enabled.

@Leonid_Rozenberg I would like to understand why you are considering disabling synchronous commits. Does it lead to higher throughput or lower resource usage? Could you please explain in some detail?