Verifying that the transaction stream is sending all transactions a Party is entitled to see?

Piyush_Bedi · July 21, 2020, 1:19pm

Is there any means for a ledger api client application to verify that the Participant hasn’t - perhaps accidentally - missed sending a transaction for the subscribed Party?

I’m often asked how a client application or participant node operator can verify that they have all the transactions that a party is entitled to.

stefanobaghino-da · July 21, 2020, 1:27pm

You should never test your upstream components. The question is akin to “how do I test that the database really sent me all records that match my where clause”.

If you think you spotted a bug, you should try to create a minimal repro and open a PR (or at least raise the issue on GitHub).

Leonid_Shlyapnikov · July 21, 2020, 6:36pm

I am thinking about @Piyush_Bedi’s question. I think a service similar to TransactionService but returning current ledger state checksum instead of (stream GetTransactionsResponse) and (stream GetTransactionTreesResponse) would be useful for clients.

E.g. if I have a long running TransactionTree stream on the client, I would want to have some guarantees, checks in place that assure me that I did not accidentally drop a transaction because of runtime exception or some other bug on my side.

If there were:

a documented deterministic algorithm to calculate transaction stream checksum and
a gRPC endpoint to request this checksum from the ledger

this would have provided a way to validate the current state on the client-side without re-subscribing from the ledger begin.

bernhard · July 22, 2020, 8:05am

There are three angles to this I see from the question and the comments so far:

1. Can you verify that you have a complete set of data for a party Alice?

Short answer: No. Imagine a topology where Alice is hosted on two nodes P1 and P2, and your ledger is running on an infrastructure in which a fork is possible. If you have a network partition separating P1 and P2 and there is a fork, P1 and P2 have no mutual knowledge of what’s going on on the other side and the term “complete set of data for a party Alice” loses meaning.

Less drastically, the Partitioned Ledger Topology allows for partitioned ledgers without forking. In that scenario it’s possible to have two participants P1 and P2 both hosting Alice, but part of different sets of partitions. A single participant does not have enough information to ascertain whether its data for Alice is complete. It can only ensure it’s complete for the partitions it’s a member of.

So: The best you can ask for is that you can verify completeness of Alice’s data on the partition/fork that a given participant is a member of.

2. Can I verify that a Participant is giving me a complete set of data for the partition/fork it belongs to?

The best you can hope for is to verify completeness up to a point, since delays or latencies may mean you have not yet caught up to the latest state. But even “a point” is imprecise. DAML Ledgers are not linearly ordered; they only enforce causality ordering. There’s a PR open documenting this in great detail.

Furthermore, a single Party (and thus a single Participant) only has partial knowledge of the Ledger. Eg if Alice and Bob alternately create Foo contracts which only the creator is a stakeholder on, Alice doesn’t have any knowledge of Bob’s activity and vice versa.
In the same vein, if Alice, Bob and Charlie keep moving an IOU issued by Doris around in a circle, each single party only sees disconnected inward and outward transfers. They can’t correlate the events. If you add the whole partitioning topic, you can imagine this sequence:

Alice on P1 sends to Bob on P2
Bob on P2 sends to Charlie on P3
Charlie on P3 sends to Alice on P4 (on a different partition)
Alice on P4 sends to Bob on P2
Bob on P2 sends to Alice on P1

What does Alice on P1 see? Just an outward transfer to Bob and an inward transfer from Bob. P1 has no information about the fact that any IOU passed through Alice on P4.

What the DAML Ledger model does allow you to verify is that given an event/action, you can verify that the subgraph that led up to that event/action is valid, which includes a degree of completeness. Ie Alice on P1 in the above can verify that because both Doris and Bob gave appropriate authority the outward and inward transfers are valid and there is no information missing that Alice is entitled to.

This kind of verification is enabled by the TransactionService in tree mode.

3. Can my client application check that it didn’t miss a transaction that the Participant does know?

The scenario here would be that a client application subscribes to the transaction service, and crashes at offset 7. It restarts, re-subscribes, gets transactions from offset 17 and misses information in between.

Rather than going for verification, the idea of the DAML Ledger API is that the client keeps track of the last offset it has seen. Ie after completing the processing of the transaction at offset 7, it should write 7 to its own persistence. If it now crashes, it knows to resubscribe from 7.
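A minimal sketch of that checkpointing loop, under stated assumptions: `subscribe` is a hypothetical stand-in for a transaction stream (not the real Ledger API client), offsets are modelled as integers as in the example above, and the checkpoint is a small JSON file.

```python
import json
import os


def load_offset(path):
    """Return the last persisted offset, or None if nothing was persisted yet."""
    try:
        with open(path) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return None


def save_offset(path, offset):
    """Persist the offset via write-then-rename so the file is never half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def subscribe(ledger, after):
    """Hypothetical stream: yields (offset, tx) pairs strictly after `after`."""
    for offset, tx in ledger:
        if after is None or offset > after:
            yield offset, tx


def run(ledger, path, handle, crash_after=None):
    """Process transactions, checkpointing the offset after each one."""
    for offset, tx in subscribe(ledger, load_offset(path)):
        handle(tx)
        save_offset(path, offset)
        if offset == crash_after:
            raise RuntimeError("simulated crash")
```

Note that a crash between `handle` and `save_offset` would replay that one transaction on restart, so in this sketch `handle` needs to be idempotent, or the two steps need to be made atomic.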

Clients without persistence need to be able to restart from current state. Ie they need to subscribe to the Active Contract Service first, which gives the offset at which that contract set was valid. Then the client subscribes from that offset.
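The snapshot-then-subscribe pattern can be sketched as follows. This is a toy in-memory model, not the Ledger API: `snapshot` and `stream_after` are hypothetical stand-ins for the Active Contract Service and the transaction stream, and events are simplified to `(offset, kind, contract_id)` tuples.

```python
def apply_event(active, event):
    """Apply one create/archive event to the set of active contract ids."""
    _offset, kind, contract_id = event
    if kind == "create":
        active.add(contract_id)
    elif kind == "archive":
        active.discard(contract_id)


def snapshot(ledger):
    """Hypothetical active-contracts query: the active set plus the offset it is valid at."""
    active = set()
    for event in ledger:
        apply_event(active, event)
    offset = ledger[-1][0] if ledger else None
    return active, offset


def stream_after(ledger, offset):
    """Hypothetical transaction stream: all events strictly after `offset`."""
    return [e for e in ledger if offset is None or e[0] > offset]


# A client without persistence starts from the snapshot...
ledger = [(7, "create", "c1"), (11, "create", "c2")]
active, offset = snapshot(ledger)

# ...and then follows the stream from the snapshot's offset onwards.
ledger += [(17, "archive", "c1"), (20, "create", "c3")]
for event in stream_after(ledger, offset):
    apply_event(active, event)
```

Because the snapshot carries the offset at which it was valid, no event falls into a gap between the two calls, and none is applied twice.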

The Ledger API gives the guarantee that the data returned in those usages is complete. As @stefanobaghino-da said, that’s something you need to trust.

Technically we could emit some hashes, to make it look fancy, but it wouldn’t add any guarantees. Eg imagine we kept a table which for each Party and offset kept a hash #(Party, Offset) in such a way that if for Alice offset 11 follows offset 7, the #(Alice, 11) is the hash of [#(Alice, 7), 11]. The client app could now “verify” that the hashes all match nicely, but since they are Participant generated, you are still just trusting the participant to do its job.
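To make the hash-table idea concrete, here is a sketch under assumptions that are not in the proposal itself: SHA-256 as the hash, 8-byte big-endian offsets, and a chain seeded from the party name. All function names are hypothetical.

```python
import hashlib


def chain_hash(prev_hash: bytes, offset: int) -> bytes:
    """#(Party, offset) = hash of [previous hash, offset]."""
    return hashlib.sha256(prev_hash + offset.to_bytes(8, "big")).digest()


def seed(party: str) -> bytes:
    """Assumed starting value of the chain for a party."""
    return hashlib.sha256(party.encode()).digest()


def build_table(party, offsets):
    """What the participant would publish: one chained hash per offset."""
    table, prev = {}, seed(party)
    for offset in offsets:
        prev = chain_hash(prev, offset)
        table[offset] = prev
    return table


def verify(party, received_offsets, table):
    """Client-side check: recompute the chain and compare with the published table."""
    prev = seed(party)
    for offset in received_offsets:
        prev = chain_hash(prev, offset)
        if table.get(offset) != prev:
            return False
    return True


honest_table = build_table("Alice", [7, 11, 17])
# The client dropped offset 11 on its own side: the chain no longer matches.
client_dropped = verify("Alice", [7, 17], honest_table)
# The participant omitted offset 11 from both stream and table: the check passes.
participant_omitted = verify("Alice", [7, 17], build_table("Alice", [7, 17]))
```

In this sketch the check catches a transaction dropped by the client, but it passes when the participant itself leaves the transaction out, which is the sense in which the hashes add no guarantee about the participant.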

Leonid_Shlyapnikov · July 22, 2020, 2:47pm

Regarding point 3:

After the crash restart, how would you know for sure that your application managed to “persist” the last processed offset? What if offset persistence and transaction processing is not an atomic operation? You might end up processing all events up to offset 17, but crash on persisting offset 17 and end up with offset 7 persisted as the last seen offset.

I would want to be able to run the client-side integrity check after every client application crash, unless my client application is designed so that it guarantees atomic operations.

bernhard · July 22, 2020, 7:30pm

Yes, but is that something the Ledger API could actually help with in any way? I don’t see how.

Leonid_Shlyapnikov · July 22, 2020, 8:40pm

The Ledger API could provide a service to simplify integrity checks on the client side, so that the client does not have to consume all transactions from the ledger begin when it is not 100% sure that the client state is consistent.

We should also document Ledger API best practices, explaining why it is important that ledger transaction processing and ledger offset update/persistence are handled as one atomic operation.

SamirTalwar · July 23, 2020, 12:32pm

I think this would be down to the user. In an ACID model, such as most SQL databases, one could typically expect the user to trust that their database wouldn’t COMMIT half a DB transaction, in which they processed a DAML transaction and then updated their offset.
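As an illustration of the ACID approach, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the `apply` function are assumptions made for this example; the point is only that the processed data and the offset are written in one DB transaction.

```python
import sqlite3


def init(conn):
    """Create a table for processed data and a single-row checkpoint table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (ledger_offset INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint (id INTEGER PRIMARY KEY CHECK (id = 1), ledger_offset INTEGER)"
    )
    conn.commit()


def last_offset(conn):
    """Return the checkpointed offset, or None if nothing has been processed."""
    row = conn.execute("SELECT ledger_offset FROM checkpoint WHERE id = 1").fetchone()
    return row[0] if row else None


def apply(conn, offset, payload, fail_before_commit=False):
    """Store the processed transaction and its offset in ONE DB transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO events (ledger_offset, payload) VALUES (?, ?)", (offset, payload)
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash")
        conn.execute(
            "INSERT OR REPLACE INTO checkpoint (id, ledger_offset) VALUES (1, ?)", (offset,)
        )
```

If the process fails part-way through `apply`, both writes are rolled back together, so the stored data and the stored offset cannot disagree after a restart.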

In a looser model, you still tend to have certain kinds of atomic guarantees. For example, in many document databases (CouchDB, Elasticsearch, MongoDB, etc.), you can expect each document to be atomic. This means you can write the latest offset into each document that is changed, as well as writing to a “latest offset” document. If you do this, you can then check the offset in each document that would be manipulated and discard any transactions with an older offset.
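A rough sketch of the per-document offset check, with a plain dict standing in for the document store. It assumes each single-document write is atomic; the `last_offset` field name and the `apply_update` helper are made up for this example.

```python
def apply_update(store, doc_id, offset, mutate):
    """Apply `mutate` to a document unless it already reflects this offset or a later one."""
    doc = store[doc_id]
    if offset <= doc["last_offset"]:
        return False  # replayed transaction: discard it
    new_doc = mutate(dict(doc))
    new_doc["last_offset"] = offset
    store[doc_id] = new_doc  # one document write, assumed atomic
    return True


store = {"iou-1": {"balance": 100, "last_offset": 7}}


def debit_30(doc):
    """Example mutation: reduce the balance by 30."""
    doc["balance"] -= 30
    return doc


first = apply_update(store, "iou-1", 11, debit_30)   # applied
replay = apply_update(store, "iou-1", 11, debit_30)  # discarded on replay
```

Because the offset travels inside the same document as the data it describes, replaying a transaction after a crash leaves that document unchanged.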

Of course, this only works for mutations, not creations or deletions in the general case. You might be able to mitigate this by using the transaction ID as the document ID, meaning that duplicate creations will fail and duplicate deletions will no-op, but this really depends on what you’re storing and for which purpose.

At the end of the day, if you’re trusting to a non-atomic data store with eventual consistency, you’re always going to have issues with inconsistent data at some point. These issues usually need to be addressed based on your domain model; there’s no right answer.

Well, maybe the right answer is “use PostgreSQL”, but I don’t want to dictate that.
