Avoid contract upgrades with a JSON text field

Do you foresee any problem with replacing the following template design…

template Asset with
    owner : Party
    others : [Party]
    id : Text
    amt : Decimal
    --        :
    --        :
    -- a hundred other fields
    -- (some nested data structures)
    -- that the client wants "on the blockchain"
    -- but are not needed for exercising choices
    --        :
    --        :
  where
    signatory owner
    observer others

… with the following design which stores most of the data in a JSON text field…

template Asset with
    owner : Party
    others : [Party]
    id : Text
    amt : Decimal
    doc : Text -- JSON containing lots of data
    docVersion : Text
  where
    signatory owner
    observer others

Note: none of the data in the doc field are required in choice logic.

The motivation is to reduce how often we must perform contract upgrades as the data model changes.

My thoughts…

  1. I cannot foresee any technical problems. It’s basically the same amount of data. Perhaps we should “minify” the JSON. Am I wrong?
  2. This doesn’t avoid having to handle the evolution of the data models. It merely shifts the responsibility from the ledger to the consumer.
  3. Contract upgrades are inevitable on any real-world Canton project. Perhaps it is better to just embrace them as standard operating procedure, instead of avoiding them.
  4. Another option is described by @Andrae here: “unless the contents of the data are necessary to manage the control-flow in a choice you don’t need to store it on the contract.” I agree in principle. In practice that means a project must integrate with another data distribution mechanism when Canton is so conveniently available, with distribution and authorization built in. And Canton users often like the idea of having the data “on the blockchain.”

Any other comments? Words of wisdom?

One of our founders, who will remain anonymous (cc: @Shaul), pointed me at this related post.

1 Like

This sounds like a good design choice. It makes it explicit that the ledger is used to distribute opaque payloads and that distribution is governed by the fields that are not opaque.

This also has precedent in systems like k8s, where every object can carry annotations in its metadata that are not interpreted by k8s directly, but only by the tools and clients interacting with k8s: Annotations | Kubernetes

Side-note: you can also consider using something like https://json-schema.org/ instead of or in addition to the doc version field. I’d expect the need for that to depend on how schema evolution for the JSON docs is managed in the project.

Distributing the schema itself can then again be done off-ledger (e.g. referenced by a hash of its canonical representation, cf. RFC 8785: JSON Canonicalization Scheme (JCS)) or on-ledger. The latter is particularly interesting if there’s a well-defined business workflow for introducing new doc schemas that is worth encoding to improve business efficiency.
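A minimal sketch of what the on-ledger variant could look like, assuming a hypothetical SchemaRegistration template; the Asset’s docVersion field would then be replaced by (or accompanied with) a schemaHash field pointing at one of these. All names here are illustrative, not from the original post.

-- Hypothetical sketch: registering a doc schema on-ledger and referencing it by hash.
template SchemaRegistration with
    registrar : Party
    readers : [Party]
    schemaHash : Text -- e.g. SHA-256 over the RFC 8785 (JCS) canonical form of the JSON Schema
    schemaText : Text -- the JSON Schema document itself, stored as opaque text
  where
    signatory registrar
    observer readers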

1 Like

I’m not a fan of storing opaque blobs on the ledger. Store a reference (I like the idea of storing a hash of some canonical form as @Simon_Meier suggested). Blobs can’t partake in Daml’s authorization semantics or benefit from Daml’s type-safety, so why store them?

As far as contract migrations go, I suggest using interfaces. If, for example, you’re modeling some external data interchange standard, you can represent each version of the standard as an interface and use view-type projections to avoid migrations wherever possible.
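A minimal sketch of that idea, with hypothetical AssetV1 names: clients are written against the interface’s view, which projects only the fields they need, so the implementing template can gain fields without those clients changing.

-- Hypothetical sketch: one interface per version of the external standard.
data AssetV1View = AssetV1View with
    owner : Party
    id : Text
    amt : Decimal
  deriving (Eq, Show)

interface AssetV1 where
  viewtype AssetV1View

template Asset with
    owner : Party
    others : [Party]
    id : Text
    amt : Decimal
  where
    signatory owner
    observer others

    -- Clients written against AssetV1 keep working if Asset evolves.
    interface instance AssetV1 for Asset where
      view = AssetV1View with
        owner = owner
        id = id
        amt = amt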

Sorry it has taken me so long to respond.

In principle there is nothing wrong with this, and for very small JSON fragments it should work fine. As the JSON blob gets larger, you will start to run into operational issues:

  1. At some point the JSON blob has to pass over the Ledger API; if it gets too large, you will run into problems with the (configurable) maximum gRPC message size.

  2. Every time you fetch or exercise a choice on this contract, the entire contract has to be loaded from the database into memory by the Daml interpreter. This happens both on the submitting participant node (PN) and on every validating PN. Even a moderately sized JSON document can consume a non-trivial share of your I/O bandwidth (PN network, DB cache, DB disk, etc.).

  3. Every time you update this contract, you store another copy of the JSON blob in every stakeholder PN’s private contract store. This can rapidly consume a lot of disk space, on top of the I/O bandwidth costs.

As you are not using this blob in any Daml code, the only benefit to offset these costs is that you don’t have to deploy and maintain a basic key-value store and a corresponding authenticated access service. As you say, that may well be enough to justify the costs if they are small enough (i.e. minimal contract churn, small-to-medium JSON blobs, etc.).

For large JSON blobs you definitely want to bite the bullet and store them off-ledger. For medium-sized blobs you can still store them on the ledger, but you will want them on a separate contract: have template Asset and template AssetPayload as separate templates. This is one situation where you might consider storing the contract ID of the AssetPayload in the Asset contract, since these are really one logical contract separated for operational reasons. I personally refer to this pattern of decomposing a single logical contract into multiple templates for engineering reasons as a “contract constellation”; see the sketch below.
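A rough sketch of that split, with hypothetical field names: the heavy payload lives on its own contract, and the lean Asset contract only carries a reference to it.

-- Hypothetical sketch of a "contract constellation": lean Asset + heavy AssetPayload.
template AssetPayload with
    owner : Party
    others : [Party]
    doc : Text        -- the large JSON blob, opaque to Daml
    docVersion : Text
  where
    signatory owner
    observer others

template Asset with
    owner : Party
    others : [Party]
    id : Text
    amt : Decimal
    payloadCid : ContractId AssetPayload -- reference to the payload contract
  where
    signatory owner
    observer others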

If you do need a larger JSON blob then it can’t go on the ledger, but you can still avoid implementing your own authorization and privacy framework for it. Build a simple web service that takes a JWT and an Asset contract ID, uses the JWT to exercise a “validatePayloadAccess” choice on that contract to obtain a retrieval key, and then returns the JSON payload from a key-value store. This lets you leverage the full power of Daml authorization and privacy while storing and managing payloads of arbitrary size. There is no reason why you couldn’t share and manage TB-sized data transfers via the Daml ledger using this technique.
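A sketch of what the on-ledger side of that could look like; the web service itself is not shown, and the payloadKey field, the stakeholder check, and the capitalized choice name (Daml choice names must start with an upper-case letter) are assumptions for illustration.

-- Hypothetical sketch: the web service exercises this choice (as the party resolved
-- from the JWT) before serving the payload from the off-ledger key-value store.
template Asset with
    owner : Party
    others : [Party]
    id : Text
    amt : Decimal
    payloadKey : Text -- retrieval key into the off-ledger store; not the payload itself
  where
    signatory owner
    observer others

    nonconsuming choice ValidatePayloadAccess : Text
      with
        reader : Party
      controller reader
      do
        -- Daml authorization applies: only stakeholders can see the contract at all,
        -- and this choice only hands out the retrieval key to parties allowed here.
        assertMsg "reader is not a stakeholder" (reader == owner || reader `elem` others)
        pure payloadKey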

2 Likes