DAML Ledger API as a Message Queue?

Internally, DABL relies heavily on Request contracts; a good example is the creation of a ledger:

template LedgerRequest
  with
    user       : Party
    operator   : Party
    ledgerName : Text
    projectId  : Text
  where
    signatory user

    controller user can
      LedgerRequestCancel : ()
        do return ()

    controller operator can
      LedgerRequestAccept : ContractId Ledger
        with
          acceptTime : Time
          ledgerId   : Text
        do
          create Ledger with
            createTime = acceptTime
            owner      = user
            ledgerData = LedgerData with
              metadata = empty
              ..
            ..

      LedgerRequestReject : ()
        do return ()

template Ledger
  with
    operator   : Party
    owner      : Party
    ledgerData : LedgerData
    createTime : Time
  where
    signatory operator, owner

    key (operator, ledgerData.ledgerId) : (Party, Text)
    maintainer key._1

    controller operator can
      LedgerUpdateMetadata : ContractId Ledger
        with
          newMetadata : [(Text, Text)]
        do
          create this with
            ledgerData = ledgerData with
              metadata = fromList newMetadata
      LedgerOperatorArchive : ContractId ArchivedLedger
        with
          archiveTime : Time
        do
          create ArchivedLedger with ..

    controller owner can
      LedgerOwnerArchive : ContractId ArchivedLedger
        with
          archiveTime : Time
        do
          create ArchivedLedger with ..

The UI issues an HTTP POST call to a web service, which in turn uses the gRPC Ledger API to create a LedgerRequest as the party that corresponds to the user. A bot in a separate process sees the request and exercises either LedgerRequestAccept or LedgerRequestReject. The pattern works well as long as you stay on the happy path.

The Unhappy Path: Some failure modes that we are starting to encounter

Bot not running when a Request contract is created

Not all of our handlers consider both the ACS and the transaction event stream as sources of Request contracts. The gRPC Ledger API has no single primitive that delivers both as one stream; the HTTP JSON WebSocket API does implement semantics that let the client abstract away whether a Request contract already existed or was created later. However, that approach is not without problems of its own…
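A minimal Python sketch of feeding both sources to one handler. It does not call the Ledger API: `snapshot`, `snapshot_offset`, and the `(offset, kind, contract_id, payload)` event tuples are hypothetical stand-ins for whatever the client library returns, and integer offsets are an assumption made only for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical shapes: a contract is (contract_id, payload); a stream event is
# (offset, kind, contract_id, payload) where kind is "created" or "archived".
Contract = Tuple[str, dict]
Event = Tuple[int, str, str, dict]


def replay_requests(
    snapshot: Iterable[Contract],
    snapshot_offset: int,
    events: Iterable[Event],
    handle: Callable[[str, dict], None],
) -> Dict[str, dict]:
    """Feed pre-existing and newly created requests to the same handler."""
    pending: Dict[str, dict] = {}

    # 1. Requests that were already active before the bot started.
    for cid, payload in snapshot:
        pending[cid] = payload
        handle(cid, payload)

    # 2. Events from the stream, skipping anything the snapshot already covers.
    for offset, kind, cid, payload in events:
        if offset <= snapshot_offset:
            continue
        if kind == "created" and cid not in pending:
            pending[cid] = payload
            handle(cid, payload)
        elif kind == "archived":
            pending.pop(cid, None)

    # Whatever is left is still active and unanswered.
    return pending
```

With this shape the handler never needs to know whether a request predates the bot, which is roughly the abstraction the WebSocket API provides on the client's behalf.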

Poison pill problem

A bot receives a contract that confuses it badly enough to crash, hard.
If the bot ignores all requests that were issued while it was not running, then the bot recovers, but an unacknowledged request remains sitting on the ledger forever.
If the bot retroactively acts on requests that already exist on the ledger at startup, it runs the risk of crashing again on the same payload (poison pill), which prevents it from ever starting again.
This can be mitigated somewhat if all clients follow a pattern like the one below, with affordances in the models to support it:

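def onContract(event):
    # Assumes the Request template carries an allowedRetryCount field and that
    # process, decrementAllowedRetryCount and killRequest are supplied by the bot.
    if event.cdata.allowedRetryCount > 0:
        try:
            process(event)
        except Exception:
            # Processing failed: spend one retry so the request cannot loop forever.
            decrementAllowedRetryCount(event.cid)
    else:
        # Retry budget exhausted: kill the request instead of attempting it again.
        killRequest(event.cid)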

However, a poison pill can still occur easily. A mistake in the models or the bot may give the bot visibility into a contract that it can’t actually act on, while the bot is coded on the expectation that it can, for example by tracking the retry count as a field in a template. In that case the bot can neither decrement the count nor kill the request.
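One way to avoid depending on the ledger for the retry count is to keep it on the bot's side, in the spirit of the dead-letter approach in the RabbitMQ reference below. The following Python sketch is an assumption, not something DABL does: `AttemptLog`, `guarded`, and the JSON file are hypothetical. The attempt is recorded before processing starts, so even a hard crash leaves a trace that the next startup can read.

```python
import json
import os
from typing import Callable


class AttemptLog:
    """Counts processing attempts per contract id in a local JSON file."""

    def __init__(self, path: str, max_attempts: int = 3) -> None:
        self.path = path
        self.max_attempts = max_attempts
        self.counts = {}
        if os.path.exists(path):
            with open(path) as f:
                self.counts = json.load(f)

    def begin(self, cid: str) -> bool:
        """Record an attempt BEFORE processing; False means quarantine."""
        if self.counts.get(cid, 0) >= self.max_attempts:
            return False
        self.counts[cid] = self.counts.get(cid, 0) + 1
        self._flush()
        return True

    def done(self, cid: str) -> None:
        """Forget a contract once it has been handled successfully."""
        self.counts.pop(cid, None)
        self._flush()

    def _flush(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.counts, f)


def guarded(log: AttemptLog, cid: str, process: Callable[[str], None]) -> str:
    if not log.begin(cid):
        return "quarantined"  # skip it; a human or another tool must look at it
    process(cid)  # may raise or crash the process; the attempt is already recorded
    log.done(cid)
    return "processed"
```

The trade-off is that the count now lives outside the ledger, so it is invisible to other parties and is lost if the file is.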

Temporary failure to process problem

If a request does not crash the process but merely fails to process temporarily (we observe this with bots that make decisions based on the result of an external service call), neither the gRPC API nor the HTTP JSON API offers a way to “rewind” the tape and re-process failed requests. In all of DABL’s current systems, this leaves requests “stuck” until the processes are restarted (and even then, only if the ACS is considered). Overzealous retries are no cure either: the process may get stuck endlessly retrying a request that is doomed to fail forever.
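A middle ground between never retrying and retrying forever is a bounded retry with exponential backoff. The Python sketch below is illustrative only: `TransientError` is a hypothetical marker the bot would raise for failures worth retrying (an external service timeout, say), and the attempt limit and delays are arbitrary.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a service timeout)."""


def process_with_backoff(
    process: Callable[[], None],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on success, False once the retry budget is spent."""
    for attempt in range(max_attempts):
        try:
            process()
            return True
        except TransientError:
            if attempt == max_attempts - 1:
                break  # budget spent; let the caller decide what to do next
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return False
```

Any other exception propagates unchanged, so a permanent failure is not retried at all. What to do on `False` (reject the request, park it, alert someone) is still a modelling decision.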

There isn’t a question that needs to be answered in this post; it is merely our current thinking on the pros and cons of different approaches to building applications over the Ledger API, particularly when using it in the way message queues have traditionally been used.

References
http://zguide.zeromq.org/php:chapter4
https://www.rabbitmq.com/dlx.html

Great post, this captures the whole problem in detail.

I’d like to add one more category of problem to the unhappy path, one with more nuance: the scenario where the Request is rejected for a valid reason, a non-error case. Here we’d want the reason for the rejection to be made available to the requester in a form that they can process and make sense of. For instance, in the example above, assume there is a quota of three ledgers per user; the fourth request for a ledger should fail with a quota-related reason. If it fails but the user is unaware of the quota, how might they be made aware of why their request was not completed? Is there a way to do this without additional template design to account for the visibility of the resulting rejection reason?

I go back and forth on whether this is a good idea. On the one hand, it would be nice to observe how contracts get archived via a simple primitive, akin to a process exit status. On the other hand, if you do care about the end result, it makes sense to model it explicitly in DAML via templates. I think the latter is probably the better approach because it keeps DAML simpler and consequently easier to deploy to different persistence layers.
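On the bot side, modelling the outcome explicitly might look like the Python sketch below, using the three-ledgers-per-user quota from the question. Everything here is hypothetical: `Accept`, `Reject`, and `decide` are illustrative names, and the `reason` field assumes the models are extended to carry one, since LedgerRequestReject as shown takes no arguments.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Accept:
    ledger_id: str


@dataclass(frozen=True)
class Reject:
    reason: str  # hypothetical field; LedgerRequestReject above takes no arguments


MAX_LEDGERS_PER_USER = 3  # quota from the example in this thread


def decide(user: str, existing_owners: List[str], new_ledger_id: str) -> Union[Accept, Reject]:
    """Turn a request into an explicit outcome instead of a bare archive."""
    owned = sum(1 for owner in existing_owners if owner == user)
    if owned >= MAX_LEDGERS_PER_USER:
        return Reject(f"quota exceeded: {owned} of {MAX_LEDGERS_PER_USER} ledgers in use")
    return Accept(new_ledger_id)
```

The bot would then write the `Reject` reason into whatever contract the requester can see, which is the template design work the question hoped to avoid.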

@dtanabe Could you give more examples of your poison pill contracts?

Hey @Max_DeLiso, we did something similar to this when building our third-party external API tool. We use tool-metagen (n.b. the ‘swagger’ branch) to convert Swagger JSON specifications into DAML RequestXXX / ResponseXXX contracts, and within the response we use a sum type to distinguish successful from unsuccessful replies.

The possible replies need to be specified in the input Swagger file, so the input JSON might look like this:

       "responses": {
          "200": {
            "description": "Returned if the request is successful.",
            "examples": {
              "application/json": "{\"key\":\"jira-software\",\"groups\":[\"jira-software-users\",\"jira-testers\"],\"name\":\"Jira Soft
            },
            "schema": {
              "$ref": "#/definitions/ApplicationRole"
            }
          },
          "401": {
            "description": "Returned if the authentication credentials are incorrect or missing."
          },
          "403": {
            "description": "Returned if the user is not an administrator."
          },
          "404": {
            "description": "Returned if the role is not found."
          }
        },

And the output DAML type will look like this:


template AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponse with
    requestor : Party
    requestId : ContractId AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGet
    body : AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody
  where
    signatory requestor

data AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody
    = AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody_401 ()
    | AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody_403 ()
    | AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody_200 [ApplicationRole]
    deriving (Eq, Ord, Show)

You then get a type-safe way of checking the response at runtime.

You can learn more about this integration piece, ‘dagger’, here (internal to DA employees only).
