Given a desired access pattern, there are many ways to design a template, specifically its signatories, observers, and controllers, to represent an access or permission model. In particular, with Multi-Party Submissions we have more choices than before. Given two equally suitable designs, we could use performance to decide which is better. So, broadly, how should we think about the number of signatories, observers, and controllers when writing our contracts?
For example, is fetching a contract with 3 signatories 3x slower than fetching one with only 1? What about when we exercise a choice: is that just a linear cost based on all of the stakeholders of a contract (including choice observers)?
That is a really good question. I was also thinking about performance after I realised that, due to the privacy model enforced by Daml, records are not appended; they are archived and regenerated each time there is a change.
On its own, a simple contract would be almost incalculable, load-wise, from the perspective of cloud and bare-metal server CPU architecture. Has Digital Asset done any benchmarking on the performance of executing Daml smart contracts at scale?
Every signatory, observer, or controller is computable from the template arguments, and every choice observer must be computable from the template arguments and the choice argument. This naturally means that every additional such party must somehow be included in those arguments, so the size of a stored contract or Exercise event grows linearly with the number of these parties. More data must therefore be written to and read from the database of the participant node and the underlying Daml ledger. This certainly has a performance cost, but I do not expect the amount of data to be the dominating factor for persistence unless the number of involved parties of a contract gets really large. I’m not aware of any systematic measurements on that front, though; there are just too many variables.
In a distributed setup, the message complexity may be the bigger problem. By the Ledger API guarantees, all signatories, observers, and controllers see the creation and the archival of a contract. So in the extreme case of them being hosted on one participant node each, the contract instance must be communicated between all the participant nodes involved. Roughly speaking, this adds another linear factor, so the overall message complexity (= the amount of data that must be shipped around) can become quadratic in the number of involved parties for a single contract, even though it remains linear at every node in the network.
@quidagis raises another important point: the more information is stored in a single contract, the more expensive updates of individual fields of the contract become. Conversely, splitting a large contract up into many individual ones also has a performance penalty if most of the data is typically needed anyway, because we’d replace a single fetch of a big contract with several fetches of smaller contracts. Each fetch adds a Fetch node to the Daml transaction, and there is a linear relationship between the number of nodes and the costs in terms of computation, storage, bandwidth, etc.
@Andreas_Lochbihler Thank you for that excellent deep dive. I had not considered the mathematical vectors of complexity; I was only thinking about CPU/MEM utilisation and TCP traffic.
Just trying to see how I can do this on a per-Single Board Computer (SBC) instance as a learning exercise.
Can you describe explicitly the two factors that make this quadratic? Is it the case that every stakeholder’s node needs to acknowledge receipt of the message to all the other stakeholders’ nodes?
What about from an execution perspective? In order for a contract to be created on the ledger, is there also a linear cost in the number of signatories?
The quadratic message complexity refers to the total amount of data that needs to be shipped around. If there are n stakeholders of a contract on n participant nodes, then the contract instance itself requires Omega(n) bytes because it must contain all the stakeholders’ party names, and this contract instance must be distributed to the n participant nodes, which itself requires at least n messages. So overall, you ship Omega(n*n) bytes around. This calculation is asymptotically correct for all the Daml ledgers that I’m currently aware of, although one could imagine more complicated protocols with lower trust assumptions. In the case @Leonid_Rozenberg sketched, if every stakeholder confirms to everyone that they have received the message, the overall message complexity would probably still be just quadratic: we’d have Omega(n*n) bytes for distributing the message and then O(n*n) bytes for the confirmations from everyone to everyone, assuming that each confirmation is only constant-size (e.g., by just including a cryptographic hash of the original message).
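To make the two linear factors concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch of the counting argument above, not a measurement of any Daml ledger: the byte sizes (`PARTY_NAME_BYTES`, `HASH_BYTES`) and the function names are made-up assumptions for illustration.

```python
# Back-of-the-envelope model of the counting argument.
# All byte sizes are illustrative assumptions, not real Daml numbers.
PARTY_NAME_BYTES = 32  # assumed size of one party identifier
HASH_BYTES = 32        # assumed size of a constant-size confirmation


def contract_bytes(n, payload_bytes=0):
    """Size of one contract instance naming n stakeholders (first linear factor)."""
    return payload_bytes + n * PARTY_NAME_BYTES


def distribution_bytes(n, payload_bytes=0):
    """Bytes shipped when the instance goes to n participant nodes (second linear factor)."""
    return n * contract_bytes(n, payload_bytes)


def confirmation_bytes(n):
    """Bytes shipped if every node confirms receipt to every other node."""
    return n * (n - 1) * HASH_BYTES


for n in (10, 20, 40):
    print(n, distribution_bytes(n), confirmation_bytes(n))
```

With an empty payload, doubling n quadruples the distribution bytes in this model, and the everyone-to-everyone confirmations add a term of the same quadratic order rather than a higher one.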
A contract with a single signatory and m observers can be created in a single Daml transaction, so such a contract doesn’t have additional overhead. However, if you have k signatories, then you need to get the authorization from those k signatories. If you start from a clean slate, they are hosted on different participants, and you follow the usual one-by-one onboarding, you need k transactions. That adds another linear factor overall. If k is large, you can think about a divide-and-conquer strategy: first create two contracts with k/2 signatories each and then merge the authority in a single transaction. Overall you still need Omega(k) transactions, but there is only a single one with k parties involved, two with k/2 parties, four with k/4, and so on. This gives you a much smaller overhead than onboarding one signatory after the other, but requires more coordination.
Moreover, if you need to create many contracts with the same set of signatories, it might be worth creating one “service agreement contract” with them as signatories, from which a single party can create any desired contract in a single transaction.
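As a rough illustration of why divide-and-conquer helps, the following Python sketch counts the transactions and the number of parties involved in each one under both strategies. It is a simplified model under assumptions of my own: the i-th sequential onboarding transaction is taken to involve i parties, and the recursion bottoms out in one single-party transaction per signatory. The function names are hypothetical.

```python
def sequential_onboarding(k):
    """Parties involved in each transaction when signatories join one by one.

    Assumption: the i-th transaction involves the i signatories gathered so far.
    """
    return list(range(1, k + 1))


def divide_and_conquer(k):
    """Parties involved in each transaction when authority is merged pairwise.

    Assumption: each half is built recursively, then one transaction
    involving all k parties merges the two halves.
    """
    if k == 1:
        return [1]
    left = k // 2
    right = k - left
    return divide_and_conquer(left) + divide_and_conquer(right) + [k]


for k in (8, 64):
    seq = sequential_onboarding(k)
    dnc = divide_and_conquer(k)
    # (k, transactions, total party involvements) for each strategy
    print(k, len(seq), sum(seq), len(dnc), sum(dnc))
```

In this model, k = 64 gives 64 sequential transactions with 2080 party involvements in total, versus 127 divide-and-conquer transactions with only 448. The transaction count stays linear in k either way, but the total involvement drops from roughly k*k/2 to roughly k*log(k).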
That’s an interesting thought. I can assume that the line between the need for a service agreement of (1…20) Individual Contracts for Same Signatories vs (21…100+) Individual Contracts for Same Signatories would be a tradeoff of contract function, potential message complexity and Security?
Or is the Daml stack so safe that security (100+ signatories in one file) is a non-issue, and I need to move past current technology-based biases?
@quidagis I’m not sure I fully understand your security concerns. Can you elaborate?
There’s a privacy aspect here: if a contract has many signatories, then those signatories will know of each other. If that’s a problem, then you can’t put them on the same contract. Also, all signatories will learn whenever you use such a service agreement. So if you have a service agreement with 100 signatories and want to use it to create a contract with only 10 of them as stakeholders, then that’s bad for privacy (all the 90 bystander signatories will see the created contract) and for performance (those 90 bystander signatories receive a transaction about the service agreement being used).
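The bystander count in that example is just a set difference. A tiny Python sketch, with made-up party names and a hypothetical `bystanders` helper, spells it out:

```python
def bystanders(agreement_signatories, contract_stakeholders):
    """Signatories of the service agreement that are not stakeholders of the new contract."""
    return set(agreement_signatories) - set(contract_stakeholders)


# Illustrative party names only.
signatories = [f"party{i}" for i in range(100)]
stakeholders = signatories[:10]

print(len(bystanders(signatories, stakeholders)))  # 90
```

Every party in that set both sees the created contract and receives a transaction it has no business interest in, so the privacy and the performance cost grow with the same number.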
From a trust perspective, the contract with the 100 signatories needs to live on one Daml ledger where all those 100 signatories are present. So they all must have accepted this Daml ledger’s trust model, which varies from Daml ledger to Daml ledger. For example, on a Daml ledger with a centralized committer (such as Daml on the VMware blockchain), everyone trusts the centralized committer to perform all relevant checks and inform all involved participants. In contrast, a decentralized Daml implementation, e.g., using Canton, could be configured to request confirmation from all signatories of a contract whenever it is used. This removes the need for everyone to trust one entity, but creates availability risks because the service agreement contract is usable only if all signatories’ participants respond to such confirmation requests.