There is no one-size-fits-all answer to this question. You are asking for a non-existence check: “no subscriber exists”. That is always tricky in a distributed system. In Daml there is currently only one type of non-existence check: negative key lookups.
In all cases, proving non-existence contends with creations. So you do have to lock the newspaper before you change the price. Otherwise, getting the price change through cleanly is a nightmare.
To see the possible tradeoffs we can make, let’s write down clear requirements:
- Non-existence: We can check within a transaction - ie with on-ledger guarantees - that no subscriber is active
- Open App: The space of possible subscribers is open in the sense that you can’t efficiently enumerate it. Let’s say that means it’s >1000, as transactions with >1000 ledger events are problematic…
- Independent subscriptions: There is no off-ledger process that can coordinate subscription requests to manage contention.
- Parallelism: No matter how many parallel subscription requests there are, they do not contend with each other.
Let’s give these properties up one by one:
No Non-Existence
If we don’t need any guarantees within the transaction, your Attempt #2 works. You are trusting the off-ledger integration here to correctly perform the checks.
Closed subscriber space
Imagine you can enumerate the S<1000 possible subscribers. Just put a contract key on your subscriber contracts and do S lookupByKeys to check they are either inactive or don’t have a subscription at all.
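As a rough model of that check (sketched in Python rather than Daml; the dict stands in for the ledger’s contract-key index, and the `active` field is a hypothetical name):

```python
from typing import Dict, List, Optional

# Toy model of the ledger's key index: maps a subscriber party to its
# subscription payload, if one exists. In Daml this loop would be S
# lookupByKey calls inside one transaction.
Ledger = Dict[str, dict]

def no_active_subscriber(ledger: Ledger, possible_subscribers: List[str]) -> bool:
    """For a closed set of possible subscribers, check that each one
    either has no subscription at all or an inactive one."""
    for party in possible_subscribers:
        sub: Optional[dict] = ledger.get(party)  # negative key lookup
        if sub is not None and sub["active"]:
            return False
    return True

# 'alice' has an inactive subscription, 'bob' has none: the check passes.
assert no_active_subscriber({"alice": {"active": False}}, ["alice", "bob"])
```

Because every lookup happens inside the same transaction, a concurrent subscription creation for any of the S parties would invalidate one of the negative lookups and abort the check, which is exactly the on-ledger guarantee we want here.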
External Subscription Management
Let’s say I have only a known fixed set of external processes that can add subscriptions - eg by virtue of managing subscriptions through a request-accept pattern so that the actual subscription is always submitted by the newspaper issuer.
In that case the off-ledger process can sequence/coordinate the parallel subscription requests. The easiest way to do this would be to pre-allocate subscription slots, each with a sequential integer ID as key. Say you start with N slots and, whenever t*N of them are filled for some threshold 0 < t < 1, you allocate another ceiling(i*N) for some increment i > 0. The slots are empty at this point.
When the subscription requests come in, the Java process matches them with slots, thus avoiding any contention. If there are more requests than free slots, there is still no contention, only a short latency until more slots are allocated.
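The slot-pool growth rule above can be sketched as follows (in Python for brevity, though the coordinating process would be the Java one; `SlotPool` and its parameter names are made up for illustration):

```python
import math

class SlotPool:
    """Off-ledger slot manager: start with n slots; whenever t*allocated
    slots are filled, allocate ceiling(i*allocated) more."""

    def __init__(self, n: int, t: float, i: float):
        self.t, self.i = t, i
        self.free = list(range(n))   # free slot IDs (sequential integer keys)
        self.allocated = n           # total slots ever created
        self.filled = 0

    def claim(self) -> int:
        """Match an incoming subscription request with a free slot."""
        slot = self.free.pop(0)
        self.filled += 1
        # Grow the pool before it runs dry, so requests rarely have to wait.
        if self.filled >= self.t * self.allocated:
            extra = math.ceil(self.i * self.allocated)
            self.free.extend(range(self.allocated, self.allocated + extra))
            self.allocated += extra
        return slot

# Start with 10 slots; grow by 50% once 80% are filled.
pool = SlotPool(n=10, t=0.8, i=0.5)
claimed = [pool.claim() for _ in range(12)]  # never runs out of slots
```

Each claimed slot ID would then be used as the contract key for the exercise that fills that slot, so parallel requests touch disjoint contracts and never contend.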
Limitations to Parallelism
You can prove that with the other requirements in place, you cannot get full parallelism. Outline of the proof: Imagine subscription S is created at offset O. Let C_1 be the command for the correct non-existence transaction TX_1 at offset O-1. Having on-ledger guarantees means TX_1 must be accepted at O-1, but rejected at O+1. That means TX_1 either does a negative lookup on a key occupied by S, or it consumes something that the creation of S requires. But because we have an open universe of possible subscribers, the negative lookup cannot depend on the subscriber in S. And because we cannot rely on an off-ledger component for coordination, any artificial key (like a sequential integer), or anything the creation of S would consume, would also have to be consumable by another subscription S’, whose creation would thus contend with S.
And that last part gives us the clue for how to think about this: We cannot prevent there being an S’ that could contend with S, but we can make contention more or less likely by consuming (or looking up) from a greater or smaller pool of possible things. In other words, we can shard the creation of subscriptions more or less finely.
A simple strategy is to define a number N of shards. For each shard, maintain a counter in a separate contract that you increment when you create a subscription in that shard and decrement when you remove it. To get a party’s shard, hash it, eg sha256 modulo N. Now your non-existence check is N fetches, checking that all counters are 0. To estimate your probability of collision, assume you generally have P commands in flight at a time. That means a given command overlaps with ~2P other commands. The probability of one of those landing in the same shard is 2P/N. So if you want a 99% success rate on first try and have constant parallelism P = 10, you need 2000 shards, which is pretty expensive for the non-existence check.
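The shard assignment and the back-of-the-envelope shard count can be sketched like this (a Python illustration; the function names are made up, and the 2P/N figure is the rough overlap estimate from above, not an exact probability):

```python
import hashlib
import math

def shard_of(party: str, n_shards: int) -> int:
    # Deterministic shard assignment: sha256 of the party identifier, modulo N.
    digest = hashlib.sha256(party.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_shards

def shards_needed(parallelism: int, first_try_success: float) -> int:
    # A command overlaps with ~2P others; collision probability ~ 2P/N.
    # Solve 2P/N <= 1 - first_try_success for N.
    return math.ceil(2 * parallelism / (1 - first_try_success))

print(shards_needed(10, 0.99))  # 2000 shards for 99% success at parallelism 10
```

Note that `shard_of` must be computed identically by every submitter (same hash, same N), since the shard counter it points at is what the subscription creation consumes and the non-existence check fetches.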
You can make your sharding a bit more dynamic by building something like a binary search tree of all subscribers, so that the chance of collisions decreases as the number of already existing subscribers grows.
But no matter your strategy, if your number of shards gets really large, you have to start decomposing your non-existence proof. For example, if you worked with the simple strategy and 10000 shards, I’d first run 10 transactions in parallel, each checking that there are no active subscriptions in 1000 of the shards, and then one extra transaction to combine those results.
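That decomposition amounts to partitioning the shard IDs into batches, checking each batch in its own transaction, and asserting the conjunction in one final transaction. A minimal Python sketch of the bookkeeping (in practice each batch check would produce a result contract that the combining transaction fetches; all names here are illustrative):

```python
N_SHARDS = 10_000
BATCH = 1_000
counters = [0] * N_SHARDS  # per-shard active-subscription counts

def batch_ids(n_shards: int, size: int) -> list:
    """Partition shard IDs 0..n_shards-1 into consecutive batches."""
    return [list(range(k, min(k + size, n_shards)))
            for k in range(0, n_shards, size)]

def check_batch(counters: list, shard_ids: list) -> bool:
    # One transaction: fetch these shard counters, assert all are zero.
    return all(counters[s] == 0 for s in shard_ids)

parts = batch_ids(N_SHARDS, BATCH)          # 10 batches of 1000 shards
results = [check_batch(counters, b) for b in parts]  # 10 parallel transactions
overall = all(results)                      # the final combining transaction
```

The caveat is that the combined proof is only as strong as its weakest window: a subscription created between a batch check and the combining transaction invalidates the result contract it consumed, so the result contracts have to be consumable (or time-bounded) for the conjunction to remain an on-ledger guarantee.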