Canton node error handling / disconnection

jaypeeda · April 6, 2023, 4:37am

Hi team,

Question on error handling in Canton.

From my understanding of the transaction lifecycle in Canton:

1. Alice encrypts the payload of the transaction and passes it to the domain.
2. The domain passes the payload to the nodes involved in the transaction.
3. Each participant node validates the transaction using the Daml Execution Engine in that participant node.
4. The domain’s mediator aggregates the confirmations and broadcasts the transaction confirmation.
5. All participant nodes record the transaction on their own ledgers.

Now my question is: what happens if one of the nodes in the network doesn’t receive the transaction confirmation from step 4 because of a crash or a disconnection?

Does the domain know which node didn’t receive the transaction confirmation?
What should the node that went down do in that case? Reconnect to the domain to get the latest transaction confirmation?

Thanks and regards,

JP

nycnewman · April 6, 2023, 12:56pm

Minor amendment: Alice’s participant node creates a transaction and then individual sub-views of the transaction, one for each participant hosting the other stakeholders. These are sent to the domain (sequencer and mediator), which forwards each sub-view to the relevant participant. The mediator understands which participants are required to respond to the transaction, so it knows what answers to expect. Each participant validates its sub-view (this is how we enforce sub-transaction privacy) and responds to the mediator (via the sequencer). Once all have responded, the overall transaction is committed back to the set of participants (or the transaction fails if one or more participants reject it).
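A rough way to picture the sub-view projection and what the mediator expects is the Python sketch below. It is not Canton code: the `Action` type, the participant names `P_alice`, `P_bob` and `P_bank`, and both functions are hypothetical, real views are encrypted, hierarchical sub-transactions rather than flat strings, and the toy model assumes that every participant receiving a view must confirm, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical flat model: a real Canton transaction is a tree of views
    # with encrypted payloads, not a list of labelled actions.
    description: str
    stakeholder_participants: frozenset[str]


def project_views(actions: list[Action]) -> dict[str, list[str]]:
    """For each participant, keep only the actions it hosts a stakeholder of."""
    views: dict[str, list[str]] = {}
    for action in actions:
        for participant in action.stakeholder_participants:
            views.setdefault(participant, []).append(action.description)
    return views


def expected_confirmers(views: dict[str, list[str]]) -> set[str]:
    """Participants the mediator waits for, in this simplified model."""
    return set(views)


tx = [
    Action("Alice transfers IOU to Bob", frozenset({"P_alice", "P_bob"})),
    Action("Bob pays fee to Bank", frozenset({"P_bob", "P_bank"})),
]
views = project_views(tx)
print(views["P_bank"])                     # ['Bob pays fee to Bank']
print(sorted(expected_confirmers(views)))  # ['P_alice', 'P_bank', 'P_bob']
```

`P_bank` never sees the IOU transfer, which is the sub-transaction privacy point, while the mediator only needs the set of expected confirmers, not the payloads.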

I believe that retry logic is in place in the event of a failure to send to a participant, and the transaction will eventually time out and be rejected if the participant never responds. If the node reconnects within the timeout period, it should get a replay of the request; if it reconnects after the timeout, the whole transaction will need to be resubmitted by the original application.
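The timeout behaviour described here could be modelled roughly as follows. This is a hypothetical Python sketch, not Canton’s mediator: `MediatorRequest`, `Verdict`, the numeric `deadline` and the explicit `now` timestamps are assumptions made for illustration, and the retry/replay itself is reduced to the reconnected participant simply calling `on_response`.

```python
from enum import Enum


class Verdict(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    TIMED_OUT = "timed_out"


class MediatorRequest:
    """Hypothetical aggregation of confirmations for one request."""

    def __init__(self, expected: set[str], deadline: float) -> None:
        self.expected = set(expected)  # participants that must respond
        self.deadline = deadline       # simplified: one timeout per request
        self.responses: dict[str, bool] = {}

    def on_response(self, participant: str, approve: bool, now: float) -> None:
        # Responses after the deadline or from unexpected participants are ignored.
        if now > self.deadline or participant not in self.expected:
            return
        self.responses[participant] = approve

    def verdict(self, now: float) -> Verdict:
        if any(not ok for ok in self.responses.values()):
            return Verdict.REJECTED   # a single rejection rejects the transaction
        if set(self.responses) == self.expected:
            return Verdict.APPROVED   # everyone approved in time
        if now > self.deadline:
            return Verdict.TIMED_OUT  # the application must resubmit
        return Verdict.PENDING


# P_bob is offline at first but reconnects before the deadline and responds.
req = MediatorRequest({"P_alice", "P_bob"}, deadline=10.0)
req.on_response("P_alice", True, now=1.0)
print(req.verdict(now=2.0))  # Verdict.PENDING
req.on_response("P_bob", True, now=5.0)
print(req.verdict(now=5.0))  # Verdict.APPROVED

# P_bob reconnects only after the deadline, so its response no longer counts.
late = MediatorRequest({"P_alice", "P_bob"}, deadline=10.0)
late.on_response("P_alice", True, now=1.0)
late.on_response("P_bob", True, now=12.0)
print(late.verdict(now=12.0))  # Verdict.TIMED_OUT
```

Passing `now` explicitly instead of reading a clock keeps the sketch deterministic and makes the "within the timeout" versus "after the timeout" cases easy to compare.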

Others may have more detail on the above.

jaypeeda · April 11, 2023, 5:05am

Thanks for the clarification Edward.

May I confirm with you that this means that, in step 5, the sequencer:

Records the latest result of the transaction on the underlying ledger
Passes this result to the involved participants

Regardless of whether the participants receive the transaction or get disconnected, the transaction has already been recorded on the ledger, making it the immutable source of truth for all stakeholder nodes.

Hence, if a participant node gets disconnected during step 5, will it have to reconnect to the sequencer in order to retrieve the latest transaction record?

Ratko_Veprek · April 11, 2023, 12:52pm

Yes, that’s correct. The sequencer provides “total order guaranteed multi-cast”. A participant that crashes will just resume from the last known offset and catch up from there.

Generally, the participant node will reconnect to the domain on restart and start to load all the data that it has missed. Therefore, nothing needs to be done manually.
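The "resume from the last known offset" behaviour can be sketched as follows. Again, this is a hypothetical model rather than Canton’s API: `Sequencer`, `Participant`, `read_from`, `catch_up` and `next_offset` are illustrative names, the totally ordered log is an in-memory list, and `next_offset` stands in for state a real node would persist in its database.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Hypothetical totally ordered log; each entry is (recipients, payload)."""
    log: list[tuple[frozenset[str], str]] = field(default_factory=list)

    def send(self, recipients: set[str], payload: str) -> int:
        self.log.append((frozenset(recipients), payload))
        return len(self.log) - 1  # offset assigned to this event

    def read_from(self, member: str, offset: int) -> list[tuple[int, str]]:
        """Events addressed to `member` at or after `offset`, in log order."""
        return [
            (i, payload)
            for i, (recipients, payload) in enumerate(self.log[offset:], start=offset)
            if member in recipients
        ]


@dataclass
class Participant:
    name: str
    next_offset: int = 0  # stands in for a value persisted in the node's database
    events: list[str] = field(default_factory=list)

    def catch_up(self, seq: Sequencer) -> None:
        # Resume from the last known offset and apply missed events in order.
        for offset, payload in seq.read_from(self.name, self.next_offset):
            self.events.append(payload)
            self.next_offset = offset + 1


seq = Sequencer()
bob = Participant("P_bob")
seq.send({"P_alice", "P_bob"}, "tx1")
bob.catch_up(seq)

# P_bob is down while more events are sequenced.
seq.send({"P_alice", "P_bob"}, "tx2")
seq.send({"P_alice"}, "tx3")  # not addressed to P_bob
seq.send({"P_bob"}, "tx4")

bob.catch_up(seq)       # on reconnect: resume from offset 1
print(bob.events)       # ['tx1', 'tx2', 'tx4']
print(bob.next_offset)  # 4
```

Because every recipient reads the same totally ordered log, a participant that was down sees exactly the events it missed, in the same order as everyone else. In a real node the offset update and the processing of the event would need to be persisted atomically; otherwise a crash between the two could cause an event to be skipped or applied twice.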
