What is the semantics of the ledger ID? The Ledger Identity Service documentation says that every ledger has a unique ID and that applications can use the ID to check whether they are connected to the right ledger. I’d like to better understand the semantics of this ledger ID. Here are some questions that may help to clarify the picture:
If two applications of the same party connect to the same participant, will they receive the same ledger ID?
If two parties connect to the same participant, will they receive the same ledger ID?
Suppose two parties are hosted on two different participants and they run a joint workflow over the two participants. Will they receive the same ledger ID?
Suppose that the same DAML party runs two participants. Will they report different ledger IDs?
I understand that different ledger deployments may behave differently. So the answers to the questions may range from “yes always” to “no never” with various degrees of “maybe” in between. If possible, I’d be interested in understanding these maybe cases as well.
I do not have all answers, so I will try to answer only what I know:
If two applications of the same party connect to the same participant, will they receive the same ledger ID?
When two client applications connect to the same ledger participant process, both clients have to use the same Ledger ID.
The client/party has to know that ID ahead of time, and the JWT (authentication/authorization token) that was issued for the client should be allowed to connect to the ledger process started with that Ledger ID.
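To make the token side of this concrete, here is a minimal sketch of checking whether a token is scoped to a given ledger ID. It assumes the custom Daml claims layout, with a `ledgerId` field under the `https://daml.com/ledger-api` key, and it assumes that a token without a `ledgerId` is not restricted to a particular ledger. The helper names are hypothetical, and the signature is deliberately not verified here; a real participant would verify it first.

```python
import base64
import json

# Assumption: claim namespace used by the custom Daml access-token format.
DAML_CLAIMS_KEY = "https://daml.com/ledger-api"


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padding = "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def token_allows_ledger(token: str, participant_ledger_id: str) -> bool:
    """Return True if the token may be used against the given ledger ID."""
    claims = jwt_payload(token).get(DAML_CLAIMS_KEY, {})
    token_ledger_id = claims.get("ledgerId")
    # Assumption: a missing or null ledgerId means "no ledger restriction".
    return token_ledger_id is None or token_ledger_id == participant_ledger_id


def make_demo_token(claims: dict) -> str:
    """Build an unsigned demo token (hypothetical helper, for illustration only)."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    return ".".join([b64({"alg": "none", "typ": "JWT"}), b64(claims), ""])


demo = make_demo_token({DAML_CLAIMS_KEY: {"ledgerId": "my-ledger", "actAs": ["Alice"]}})
print(token_allows_ledger(demo, "my-ledger"))     # token matches the participant
print(token_allows_ledger(demo, "other-ledger"))  # token was issued for another ledger
```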
If two parties connect to the same participant, will they receive the same ledger ID?
They will not “receive” the ledger ID; they must know it before connecting to the ledger participant process. Sharing the Ledger ID is an offline process right now, and should be done as part of ledger onboarding.
So if the application must know the ledger ID in advance to connect, what is the Ledger Identity Service needed for then? Can I assume that it simply mirrors what the client has specified in the connection?
This still leaves open the question of what the ledger ID represents.
As far as I remember, the ledger-id was originally (three years ago) part of a plan to introduce “ledger virtualisation” as a method to expose different “segregated” ledgers on the same system. Two use cases I remember people claiming were testing (running segregated integration tests) and “low cost ledgers through virtualisation”. However, not everyone agreed with the latter idea as “segregation” was kinda the opposite of “composability”.
In that sense, dropping the ledger-id might be a good idea for ledger-api v2.0.
As @Ratko_Veprek already stated, the field ledger_id itself has gone through a few iterations of what it means. Ledger virtualization was one purpose. Another one was to have a safeguard ensuring that your application connects to the ledger it expects to connect to (classic production vs testing environment “confusion”).
At this moment, you can view it as something like a participant_data_generation marker. As long as you receive the same ledger_id from the LedgerIdentityService, your application should continue to work with the participant, at least from the point of view of receiving consistent data. So before deploying an application, find out the ledger_id upfront and start the application with this configured value. Whenever the ledger data changes in a fundamental, non-compatible way (e.g. after invoking a ledger reset via ResetService on Sandbox, or potentially also pruning), the application's requests will be rejected and the application operator knows that they should probably wipe all local state and restart the app.
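The startup check described above could look roughly like the following sketch. The `fetch_ledger_id` callable stands in for a call to the LedgerIdentityService and is a hypothetical placeholder, as are the exception and function names; no real Ledger API client is used here.

```python
from typing import Callable


class LedgerIdMismatch(Exception):
    """Raised when the participant reports a different ledger_id than configured."""


def ensure_expected_ledger(
    expected_ledger_id: str,
    fetch_ledger_id: Callable[[], str],
) -> str:
    """Compare the configured ledger_id with the one the participant reports.

    `fetch_ledger_id` is a hypothetical stand-in for querying the
    LedgerIdentityService; it takes no arguments and returns a string.
    """
    reported = fetch_ledger_id()
    if reported != expected_ledger_id:
        # The data generation changed: local state can no longer be trusted.
        raise LedgerIdMismatch(
            f"configured ledger_id {expected_ledger_id!r}, "
            f"participant reports {reported!r}; wipe local state and restart"
        )
    return reported


# Usage: the application was deployed against "sandbox-1".
ensure_expected_ledger("sandbox-1", lambda: "sandbox-1")  # passes silently

try:
    # After a reset, the participant reports a fresh ledger_id.
    ensure_expected_ledger("sandbox-1", lambda: "sandbox-2")
except LedgerIdMismatch as err:
    print(f"refusing to continue: {err}")
```

Failing loudly rather than continuing keeps the decision to wipe local state with the application operator, which matches the workflow described above.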
Having said that, there are a few open questions that we need to clarify/specify:
Who owns the ledger_id? The participant operator who can also trigger participant-level pruning? The operator of the ledger itself?
What are all the scenarios that trigger a ledger_id change to ensure applications continue to work as expected in the face of pruning and possibly other data changes?
Now, to your questions:
The ledger_id is independent of the requesting party or application.
I would say that applications should not expect to receive the same ledger id from two different participants, even if the same party is hosted on both.
The LedgerIdentityService returns the ledger_id configured for the participant and does not mirror what the client specifies.
As far as I can see, the LedgerIdentityService doesn’t require you to specify the ledger_id? So (minus the authorization issues), the gRPC client would not need to know the ledger_id in advance?
Should we offer a participant_id instead, then? This seems like it would satisfy your remaining use case of distinguishing production from testing environments, and would have clearer semantics. With ledger_id, things get fuzzy in Canton, as any two ledgers are part of a single logical ledger (hand-waving compatibility issues away).
The participant_data_generation_id semantics of the ledger_id you mention has me wondering about the JWT token; it seems unnecessary to ask me to provide a different token if, say, the participant operator prunes the participant node.
As long as you receive the same ledger_id from the LedgerIdentityService, your application should continue to work with the participant, at least from the point of view of receiving consistent data.
In the end, nothing prevents me from restarting a participant node and reusing the ledger-id from a previous run. In that sense, the ledger-id gives you as many guarantees as you get from “localhost:1234”.
The only proper way to detect if it is still the same ledger is to replay already received messages and verify that they are still the same.
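As a rough illustration of that replay check, the sketch below keeps a running digest of the messages an application has already consumed and compares it against a replay of the same prefix. Representing messages as raw bytes and using SHA-256 are assumptions made for the example; the function names are hypothetical.

```python
import hashlib
from itertools import islice
from typing import Iterable


def stream_digest(messages: Iterable[bytes]) -> str:
    """Digest a sequence of messages, length-prefixing each to keep boundaries."""
    h = hashlib.sha256()
    for message in messages:
        h.update(len(message).to_bytes(8, "big"))
        h.update(message)
    return h.hexdigest()


def is_same_ledger(
    stored_digest: str, stored_count: int, replayed: Iterable[bytes]
) -> bool:
    """Check that the first `stored_count` replayed messages match what was seen."""
    prefix = list(islice(replayed, stored_count))
    if len(prefix) < stored_count:
        # The ledger now has fewer messages than we previously received.
        return False
    return stream_digest(prefix) == stored_digest


# Usage: the application previously received two messages.
seen = [b"tx-1", b"tx-2"]
digest, count = stream_digest(seen), len(seen)

print(is_same_ledger(digest, count, [b"tx-1", b"tx-2", b"tx-3"]))  # same history, extended
print(is_same_ledger(digest, count, [b"tx-1", b"tx-X", b"tx-3"]))  # history diverged
```

Note that such a check would also flag a pruned participant, since the pruned prefix can no longer be replayed.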
Sure, that’s the reason why sandbox by default auto-generates a random ledger id. And sure, nobody can prevent you from resetting the participant and starting it with the same ledger id without restarting or resetting your applications. But then it’s your responsibility to debug it as well.
It is more of a safety measure than anything else. Maybe it turns out not to be useful, but it’s clear that it needs some work, especially in the context of multi-domain ledgers.