I am often asked whether a given piece of data belongs on the ledger, so here are some thoughts on the matter. Please discuss and add your own.
Some reasons not to store blobs on ledger
I would not recommend storing large blobs of data on the ledger because:
- There is no binary data type (to some degree, this is to discourage doing so in the first place).
- Every time the contract storing the data is used (e.g. fetched), the data is loaded and sent around, even if the actual attachment is not needed. This generates a lot of network load.
- The attachment is served through the APIs wherever the containing contract is served. This generates a lot of load on the API server and its consumers.
State vs Assets
In general terms, I’d make the split between application state and assets. State is any information which changes the behaviour of the application. I would consider a tag on a Tweet part of state, because it determines where in the application the tweet appears. Similarly, I would consider the search index for Google’s picture search part of the application state. It determines which images appear in which searches.
On the flip side, I would not consider the text of a Tweet part of the state. The text could be different or completely missing and the only thing that would change is the text on the end user’s screen. Similarly, the exact image is not part of the state of an image search. If an image of a cat got switched with one of a dog without being reanalysed or reindexed, the only thing that changes is the image being shown on the final consumer’s screen.
With DLT applications, I would keep the on-/off-ledger split close to the state/asset split. However, for convenience and lower complexity, it does make sense to store small assets like labels or short paragraphs of text on ledger. Similarly, for a simple chat app, it makes sense to store the text content of the messages on ledger.
How about cases where there are large assets?
A simple design is to just store them in an object store and reference them by hash.
template Attachment
  with
    owner : Party
    receivers : [Party]
    url : Text
    hash : Text
  where
    signatory owner
    observer receivers
The sender of a file simply sends a link and a hash that can be used to verify that the correct file was downloaded. Security could be improved by encrypting the file and adding the decryption key to the template, so that only parties that can see the Attachment contract can access the file.
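As a minimal sketch of the encrypted variant, the template could simply carry the key next to the link and hash. The template name EncryptedAttachment and the field decryptionKey are hypothetical, as is the choice to encode the key as Text. Encryption, decryption and hashing all happen off ledger.

```daml
module EncryptedAttachment where

-- Hypothetical variant of the Attachment template: the file behind `url`
-- is stored encrypted, and the key travels on ledger with the reference.
template EncryptedAttachment
  with
    owner : Party
    receivers : [Party]
    url : Text           -- location of the encrypted file
    hash : Text          -- hash of the encrypted file (text encoding assumed)
    decryptionKey : Text -- key encoded as text (encoding is an assumption)
  where
    signatory owner
    observer receivers
```

Because only the signatory and the observers can see the contract, only they learn the key, even if the URL itself leaks.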
What if I need strong availability or integrity guarantees?
There are some weaknesses to this, however. The owner can take the file down, or even create Attachment contracts for which there is no file at all. DAML’s value proposition is to give a common view with strong integrity and consistency guarantees. Indeed, observers are guaranteed to see the same data as the signatories. When sending a blob on ledger, the receiver cannot claim not to have received the data, nor does the sender have a way to “take it away” again.
Availability can be solved with aggressive mirroring. That is, every receiver of a file immediately downloads it, verifies it by calculating the hash, and then mirrors it for all other receivers.
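One way a receiver could announce its mirror on ledger is sketched below. The template AttachmentMirror and its field names are hypothetical. The download and hash check still happen off ledger before the contract is created.

```daml
module AttachmentMirror where

-- Hypothetical template: a receiver who has downloaded and verified the
-- file advertises its own copy to the other receivers.
template AttachmentMirror
  with
    host : Party        -- the receiver now hosting a copy
    receivers : [Party] -- the other receivers of the original attachment
    url : Text          -- location of the mirrored copy
    hash : Text         -- same hash as on the original attachment
  where
    signatory host
    observer receivers
```

Receivers do not need to trust the mirror host, since any copy they fetch is checked against the hash published by the original owner.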
Non-repudiation can be achieved by separating the download and decryption steps. That is, the receiver of the file only requests the decryption key once they have downloaded the file and verified the hash. By requesting the key, they confirm receipt. In case of conflict, the sender can prove correctness of file and key by decrypting a file with the given hash with the given key. Faking this would require a hash collision.
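The two-step handover could be modelled roughly as follows. This is a sketch under assumptions, not a hardened design: all template, choice and field names (SealedAttachment, RequestKey, KeyRequest, ProvideKey, KeyDelivery, decryptionKey) are hypothetical, and the hash check and decryption are assumed to happen off ledger.

```daml
module AttachmentHandover where

-- Step 1: the owner publishes the location and hash of the ENCRYPTED file.
-- The key is deliberately not part of this contract.
template SealedAttachment
  with
    owner : Party
    receivers : [Party]
    url : Text
    hash : Text -- hash of the encrypted file
  where
    signatory owner
    observer receivers

    -- Step 2: a receiver who has downloaded the file and verified its hash
    -- asks for the key. Exercising this choice is the on-ledger receipt.
    nonconsuming choice RequestKey : ContractId KeyRequest
      with
        receiver : Party
      controller receiver
      do
        assertMsg "only listed receivers may request the key" (receiver `elem` receivers)
        create KeyRequest with
          owner = owner
          receiver = receiver
          hash = hash

template KeyRequest
  with
    owner : Party
    receiver : Party
    hash : Text
  where
    signatory owner, receiver

    -- Step 3: the owner answers with the key, which is recorded next to
    -- the hash on a contract signed by both parties.
    choice ProvideKey : ContractId KeyDelivery
      with
        decryptionKey : Text
      controller owner
      do
        create KeyDelivery with
          owner = owner
          receiver = receiver
          hash = hash
          decryptionKey = decryptionKey

template KeyDelivery
  with
    owner : Party
    receiver : Party
    hash : Text
    decryptionKey : Text
  where
    signatory owner, receiver
```

In a dispute, the KeyDelivery contract ties a specific hash to a specific key with both parties as signatories, so either side can show what was delivered by decrypting the file with that hash using that key.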