Handling failures in Daml bots

We are currently investigating possible solutions to failures in Daml bots. What is the best way to ensure a contract is processed exactly once?
For instance, if we are using event callbacks, a contract is processed at most once (0 or 1 times), because events emitted while the bot is down are never received. If we are using streams and the bot goes down, we would have to restart the stream from the point where it stopped. This would be a great use for boundaries, but they are not stored. That brings a problem: if the bot goes down and contracts are created in the meantime, the next run starts from an updated boundary and those contracts are skipped.
What would be the best practice to solve this problem?
Should we store the boundary on a contract on the ledger and update it every time a contract is processed, so that after a restart we can resume from the same point?

Store the boundary on some contract of the ledger and updating it every time a contract is processed? So that if we restart we can start from the same point?

This, or make the response to the contract consuming. E.g., if your bot is responding to a “Request” contract, have it look for all open “Request” contracts regardless of when they were opened; this way, if the bot goes down or is offline, no workload will be missed.
As a bit of an aside: in distributed systems such as these, baking in this kind of resiliency is strictly necessary. However much uptime e.g. Daml Hub might like to provide, if there’s network segmentation at the cloud provider level, the only way to ensure proper behavior is to bake in this kind of resiliency.
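
To make the consuming approach concrete, here is a minimal sketch in Python. It is only a toy in-memory simulation, not the Daml Ledger API or any real client library: `Ledger`, `create_request`, `exercise_respond` and `run_bot_once` are hypothetical names invented for illustration. The idea it tries to show is that if responding archives the Request, the set of open Requests is itself the to-do list, so no boundary needs to be stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ledger:
    """Toy stand-in for a ledger's active contract set (hypothetical, not a real API)."""
    active_requests: Dict[int, str] = field(default_factory=dict)
    responses: List[str] = field(default_factory=list)
    _next_id: int = 0

    def create_request(self, payload: str) -> int:
        """Create a new open Request and return its contract id."""
        cid = self._next_id
        self._next_id += 1
        self.active_requests[cid] = payload
        return cid

    def exercise_respond(self, cid: int) -> bool:
        """Simulated consuming choice: archive the Request and record the response.

        Returns False if the Request is no longer active, so it cannot be
        handled a second time.
        """
        payload = self.active_requests.pop(cid, None)
        if payload is None:
            return False
        self.responses.append(f"handled:{payload}")
        return True


def run_bot_once(ledger: Ledger) -> int:
    """Handle every Request that is still open, regardless of when it was created."""
    handled = 0
    for cid in list(ledger.active_requests):  # copy: we mutate while iterating
        if ledger.exercise_respond(cid):
            handled += 1
    return handled


def demo() -> List[str]:
    ledger = Ledger()
    ledger.create_request("a")
    run_bot_once(ledger)        # bot is running
    ledger.create_request("b")  # bot is down while these two are created
    ledger.create_request("c")
    run_bot_once(ledger)        # bot restarts and picks up the backlog
    run_bot_once(ledger)        # another pass finds nothing left to do
    return ledger.responses
```

In this sketch, each Request ends up handled once because handling and archiving happen in the same step. On a real ledger, that would correspond to the bot exercising a consuming choice on the Request, assuming the response is created in the same transaction.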