Deduplication with batching

Let’s say a client sends requests one by one, each carrying a requestId. I want to use the deduplication mechanism to automatically deduplicate requests that share a requestId.

If I were submitting these requests as Daml commands to the ledger:

1. One-by-one
I can use the requestId as the commandId, thereby relying on Canton’s built-in deduplication mechanism to eliminate duplicate commands.

2. As a CommandsSubmission batch
The whole CommandsSubmission batch has a single commandId, so how can duplicate requests be eliminated in this case?

@bernhard Any ideas on this please? As per this discussion, there isn’t an identifier that uniquely identifies a single Command within a CommandsSubmission batch.

Is there a way to implement the ledger’s automatic deduplication mechanism manually? One approach we were considering is a single contract that stores the requestIds (commandIds) successfully executed over the last week (assuming a one-week deduplication period). The obvious issue is that this contract would contain a List or Set of a few million values; I’m guessing that would exceed some maximum contract payload size or gRPC message size limit?

Commands in a single submission batch either all succeed or all fail, so the need to deduplicate single commands cannot arise through “partial failures” of a submission.
You can, however, allow a single submission to partially fail through exception handling: you’d use commands that call a choice containing a try/catch block. In that case, your problem is no longer deduplicating commands, but deduplicating individual ledger actions, i.e. the thing you are trying to do in the try block.
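As a sketch of that try/catch pattern (the `Processor`, `ProcessOne`, and `ErrorLog` names are illustrative, not from this thread), a choice can catch the exception thrown by the business action, roll that action back, and record the failure instead of aborting the whole submission:

```
module BatchErrorHandling where

import DA.Exception (AssertionFailed(..))

-- Records a failed request instead of failing the whole batch.
template ErrorLog
  with
    operator : Party
    requestId : Text
    reason : Text
  where
    signatory operator

template Processor
  with
    operator : Party
  where
    signatory operator

    nonconsuming choice ProcessOne : ()
      with
        requestId : Text
      controller operator
      do
        try do
          -- stand-in for the real business action that may fail
          assertMsg ("bad request " <> requestId) (requestId /= "")
        catch
          (AssertionFailed msg) -> do
            -- the failed action is rolled back; log it and continue
            create ErrorLog with operator, requestId, reason = msg
            pure ()
```

The ledger actions inside the `try` block are rolled back when the exception is caught, so only the `ErrorLog` create survives for the failed request while the rest of the batch proceeds.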

Your best bet in general is to make your commands idempotent; that way you don’t need command deduplication at all. Say your request involves, somewhere in the transaction, writing a contract of type Foo to the ledger via a create:

template Foo
  with
    creator : Party
    payload: Text
  where
    signatory creator

template SomeOther
  ...
    nonconsuming choice Bar : ContractId Foo
      controller baz
      do
        ...
        create Foo with ..

As written, calling Bar is not guaranteed to be idempotent. One way to make it idempotent is to add the requestId to Foo and create a contract key on it.

template Foo
  with
    creator : Party
    requestId : Text
    payload: Text
  where
    signatory creator
    key (creator, requestId) : (Party, Text)
    maintainer key._1


template SomeOther
  ...
    nonconsuming choice Bar : ContractId Foo
      with
        ...
        requestId : Text
      controller baz
      do
        ...
        create Foo with ..

Now calling Bar twice with the same parameters will only succeed once. But if you are prone to submitting the same command twice, the second submission will fail with a contention error (DUPLICATE_KEY) and thus fail the whole batch. You are even better off putting a guard in place:

template Foo
  with
    creator : Party
    requestId : Text
    payload: Text
  where
    signatory creator
    key (creator, requestId) : (Party, Text)
    maintainer key._1

template SomeOther
  ...
    nonconsuming choice Bar : ContractId Foo
      with
        ...
        requestId : Text
      controller baz
      do
        ...
        oFoo <- lookupByKey @Foo (creator, requestId)
        case oFoo of
          None -> create Foo with ..
          Some cid -> do
            debug ("Ignoring requestId " <> requestId <> ". Foo already exists.")
            return cid

Now your command is idempotent and duplication is handled gracefully.
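A quick Daml Script check of that idempotency could look like the sketch below. Note the assumptions: the original elides the fields of `SomeOther` and part of Bar’s parameters, so here `SomeOther` is assumed to have a single signatory field `baz`, and Bar is assumed to take `creator` alongside `requestId`, with `creator` set to the same party.

```
module IdempotencyTest where

import Daml.Script

test : Script ()
test = script do
  baz <- allocateParty "Baz"
  other <- submit baz do
    createCmd SomeOther with baz
  -- Exercise Bar twice with the same requestId.
  _ <- submit baz do
    exerciseCmd other Bar with creator = baz; requestId = "req-1"
  _ <- submit baz do
    exerciseCmd other Bar with creator = baz; requestId = "req-1"
  -- Only one Foo exists: the second call hit the lookupByKey guard.
  foos <- query @Foo baz
  assert (length foos == 1)
```

Without the guard, the second `submit` would have to be wrapped in `submitMustFail` instead, since the duplicate key would abort the transaction.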

  1. So we’re currently using DA.Validation - do you recommend using try/catch exception handling instead, to be able to leverage rollbacks via the catch block?

  2. I assume there is absolutely no way to prevent a batch from failing if a single command hits an error unrelated to business logic, such as a failed fetch. In such cases, is there any way to know which command in the batch failed, given that there is no commandId at the command level, only at the batch level? When a command in a batch fails, the app needs to resubmit the rest of the commands with the erroneous one filtered out. Without knowing which specific command failed, a recovery process like the one suggested here is quite cumbersome and, more importantly, will not meet our SLAs. Even when all commands in a batch succeed, a client app often needs to know which event resulted from which command. Is there a reason the commandId is not part of the Event payload itself, which would allow easier correlation between the two?

  3. Thank you for the great explanation and code samples on deduplication using contract keys. The reason we didn’t want to go that way is that it only works for active contracts, so the contracts whose existence decides whether a request is a duplicate would need to stay active for the entire deduplication period, which is at least a week. That would mean an ACS 2-4 times larger than it would otherwise be, which would likely impact performance. We will likely try this out though to get actual numbers. I’m assuming you recommend against the approach of a single contract storing a List/Set of requestIds for a week because of its size, and because searching a list of millions of requestIds would be too slow for latency-sensitive (1.5-2s) workflows?

  1. Yes, if you want to be able to handle individual business errors, use try/catch, not DA.Validation on its own. The latter will abort the transaction.
  2. You are correct. You cannot handle contention errors in a single submission.
  3. Having an ACS that’s 2-4 times larger than strictly needed may have a performance impact, but I don’t expect that impact to be all that significant. I definitely recommend against the single-contract tracking approach: you’ll have too much contention on that contract.
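To limit ACS growth under the keyed-contract approach, one possible sketch (template, field, and choice names here are illustrative, and the 7-day window is just the example period from the question) is a small per-request marker contract with a self-archiving choice, so markers older than the deduplication window can be cleaned up in bulk:

```
module RequestMarker where

import DA.Time (days, addRelTime)

-- One small marker per request instead of one giant contract.
template RequestMarker
  with
    operator : Party
    requestId : Text
    createdAt : Time
  where
    signatory operator
    key (operator, requestId) : (Party, Text)
    maintainer key._1

    -- Consuming choice: archives the marker once the
    -- deduplication window has passed.
    choice Expire : ()
      controller operator
      do
        now <- getTime
        assertMsg "deduplication window still open"
          (addRelTime createdAt (days 7) <= now)
```

An off-ledger automation could then periodically query for expired markers and exercise Expire in batches, keeping roughly one week of markers on the ledger. Because each requestId lives on its own contract, this also avoids the contention problem of a single tracking contract.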

Cheers, thank you @bernhard!