What sort of guarantees are made by the ledger API (and underlying implementations) regarding access to historical ledger events?
Specifically:
Are historical ledger events generally available? (The gRPC API has a begin offset that appears to let a caller request events starting from a point in the past; see the Ledger API Reference in the Daml SDK 1.14.0 documentation.)
The HTTP-JSON API has an offset parameter on stream queries that appears intended for resuming a stream where it left off. How long should an offset be considered valid? (Can offsets be retained for hours/days/weeks/months and still be used?)
Is there a ledger API (or JSON API) mechanism for fetching the ACS as it was at a specific ledger offset?
Is there a retention period after which a ledger event might not be available via the ledger API? (Can this be configured?)
At the top of this question, I’m deliberately vague about whether or not I am referring to the ledger API or a specific implementation. I’d be interested in thoughts regarding how much of the answer is driven by the ledger API specification and how much of it might vary from one implementation to another.
(I’m asking all of this in the context of historical reporting. Imagine it’s July, and I’m interested in generating a report summarizing ledger activity that occurred in June. If I know the begin and end offsets for June, access to historical events from the ledger API would make it possible to stream just a view of what occurred in that month.)
The guarantees the JSON API and Ledger API give are that historical Ledger events are available from the last pruning offset, and that you can always get your hands on the current ledger state.
So to answer your questions one by one:
Historical ledger events are available from the last Pruning offset. If you never prune your ledger, they are available from the very beginning.
The JSON API should be thought of as a queryable cache of the Ledger API. So the only guarantee you have there is also that offsets after the last pruning offset will work.
No, that functionality doesn’t exist at the moment, but it is under consideration for the Daml Ledger API. What would you use such a feature for?
This is configured exactly via pruning. If you prune every day with an offset 7 days back, you have a 7-day retention period. But note that as long as a contract is active, it cannot get pruned. So create events are pruned only after the created contract is archived.
You can do this without problems as long as you didn’t prune your ledger in the meantime with an offset after the beginning of June.
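To make the "prune every day with an offset 7 days back" idea a bit more concrete, here is a minimal sketch of how an application might pick the offset to prune up to. It assumes the application itself records (offset, timestamp) checkpoints as it consumes the ledger; the `Checkpoint` type, the `pick_prune_offset` helper and the offset strings are all hypothetical, and the actual prune request to the participant is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical bookkeeping kept by the application, oldest first:
# (ledger offset, time at which the application observed that offset).
Checkpoint = Tuple[str, datetime]


def pick_prune_offset(
    checkpoints: List[Checkpoint], now: datetime, retention: timedelta
) -> Optional[str]:
    """Return the newest recorded offset that is at least `retention` old.

    Returns None when no checkpoint is old enough, i.e. nothing should be pruned.
    Offsets are treated as opaque strings and are never compared to each other.
    """
    cutoff = now - retention
    candidate = None
    for offset, observed_at in checkpoints:
        if observed_at <= cutoff:
            candidate = offset
        else:
            break  # checkpoints are ordered, so later ones are newer still
    return candidate


now = datetime(2021, 7, 15, tzinfo=timezone.utc)
checkpoints = [
    ("0000a1", datetime(2021, 7, 1, tzinfo=timezone.utc)),
    ("0000b7", datetime(2021, 7, 8, tzinfo=timezone.utc)),
    ("0000c3", datetime(2021, 7, 14, tzinfo=timezone.utc)),
]
prune_up_to = pick_prune_offset(checkpoints, now, timedelta(days=7))
print(prune_up_to)
```

With a 7-day retention, anything you still want to report on has to be read (or snapshotted) before the daily prune moves past it.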
For the JSON API I think it’s worth adding that the streaming endpoints do not provide guarantees that they expose individual transactions. The right way to think about the streaming API is that it’s a more efficient way than repeatedly polling the ACS. The JSON API is free to batch transactions together.
The only guarantee is that after applying the updates in a streaming block, you have a consistent ACS snapshot again at the respective offset.
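As a rough illustration of what "applying the updates in a streaming block" means, here is a small sketch that folds one block into an in-memory ACS. The block shape (`events` with `created`/`archived` entries plus an `offset`) is a simplified assumption for this example rather than a definitive description of the wire format, and `apply_block` is a hypothetical helper.

```python
from typing import Any, Dict


def apply_block(acs: Dict[str, Any], block: Dict[str, Any]) -> str:
    """Apply one streaming block to `acs` in place and return the block's offset.

    After the whole block is applied, `acs` is a consistent snapshot at that
    offset. States in between individual events are not meaningful, because
    several transactions may have been batched into the same block.
    """
    for event in block["events"]:
        if "created" in event:
            contract = event["created"]
            acs[contract["contractId"]] = contract["payload"]
        elif "archived" in event:
            # pop with a default: tolerate archives of contracts we never saw
            acs.pop(event["archived"]["contractId"], None)
    return block["offset"]


acs: Dict[str, Any] = {"c1": {"owner": "Alice", "amount": 100}}
block = {
    "events": [
        {"archived": {"contractId": "c1"}},
        {"created": {"contractId": "c2", "payload": {"owner": "Bob", "amount": 100}}},
    ],
    "offset": "42",
}
offset = apply_block(acs, block)
print(offset, acs)
```

Note that nothing here tells you how many transactions produced the block, which is exactly why this view is not a substitute for the transaction history.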
Is there a deeper underlying design philosophy that underpins this choice?
My going-in assumption would’ve been that someone would pick an append-only, immutable data structure at least in part because they value the fact that historical data is preserved in a stable and verifiable form, with a history that can be reconstructed after the fact. In other words, there would be some intrinsic value in being able to get to the intermediate consistent states along the way from offsets A to B to C.
However, if I read this right, the JSON API might get you from A to C in a way that skips over the (potentially interesting) intermediate state B. Is the assumption that you just need to use gRPC to get to the data if you really care about B?
The JSON API is centered around providing access to the ACS, not around providing history. The streaming endpoints are a convenience and a performance optimization over repeated polling, but they’re not intended to provide access to history.
For audit-like use cases, you are better off accessing the gRPC API.
No, that functionality doesn’t exist at the moment, but it is under consideration for the Daml Ledger API. What would you use such a feature for?
The more I think about it, the more I think it’s useful for the historical reporting scenario. The stream of events from June isn’t enough to be able to fully report on the activity during the month. (To my understanding, the actual data for a contract from May that was archived in June would not be included in the event stream for June.) So the June report would have to know the ACS at the beginning of the month to be able to fully interpret everything that occurred during the month.
With a little planning, though, scenarios like monthly reports could currently be handled by saving off an ACS snapshot as part of a month-end close process. This could be done even without direct API support for as-of ACS snapshots, and potentially integrated with a job that issues ledger prune requests. Where API support for as-of ACS snapshots might come in handy is when more flexibility is needed.
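For what it’s worth, here is a minimal sketch of how a month-end snapshot and the month’s event stream might be combined in a report. It assumes the opening ACS was saved at the previous month-end close and that the events were read from the transaction stream for the month; the event shape, the contract IDs and the `summarize_month` helper are hypothetical.

```python
from typing import Any, Dict, List


def summarize_month(
    opening_acs: Dict[str, Any], events: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Summarize a month's activity given the ACS saved at the start of the month."""
    # Payloads we can resolve: the opening ACS plus every create seen so far.
    known = dict(opening_acs)
    created: List[str] = []
    archived: List[Dict[str, Any]] = []
    for event in events:
        if "created" in event:
            contract = event["created"]
            known[contract["contractId"]] = contract["payload"]
            created.append(contract["contractId"])
        elif "archived" in event:
            cid = event["archived"]["contractId"]
            # The archive event carries no contract data; look it up instead.
            archived.append({"contractId": cid, "payload": known.get(cid)})
    archived_ids = {entry["contractId"] for entry in archived}
    closing_acs = {cid: p for cid, p in known.items() if cid not in archived_ids}
    return {"created": created, "archived": archived, "closing_acs": closing_acs}


opening_acs = {"may-1": {"owner": "Alice", "amount": 100}}
june_events = [
    {"created": {"contractId": "june-1", "payload": {"owner": "Bob", "amount": 50}}},
    {"archived": {"contractId": "may-1"}},
]
report = summarize_month(opening_acs, june_events)
print(report)
```

The `closing_acs` that falls out of this can be saved as the opening snapshot for the next month, so the close process and the report share the same bookkeeping.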