Can you manage and retrieve archived contracts in Daml when you need historical state?

Hi @drsk, thanks a lot for your guidance!

I have been going through the DAML docs, especially the Ledger API part, and I have the following questions that I would really appreciate your help with.

As I understand it, the HTTP JSON API can only fetch contracts that are active. In my use case, however, I need to keep all transactions so that I can track everything and query it afterwards. For example, suppose a batch of raw material is used to produce product_A and product_B. Some time later, when all transactions are finished and all contracts are archived, the customer using product_B informs me that product_B has defects. I would then need to trace back which batch of raw material was used, so that I can notify the other affected customers, in this case the customer using product_A. Since all transactions are already finished and no contract is active, how should I keep track of all historical transactions?

By “join”, I mean the following: taking the example above, once I have found the batch numbers of the defective raw material, I can join this information with a “table” like <batch_number, product_number, customer_number> to find the affected customers.

I found that there is a tool still under development, the “extractor”, which can be used to “dump” contracts and transactions to a PostgreSQL database. Is this the right way to go? If so, we would need to run this extractor job at regular intervals. Is my assumption correct here?

In summary, my two questions are:

  1. How can I fetch/query historical transactions that are no longer active?
  2. To achieve the “join” goal, is there a nice way of doing this in DAML? I can only think of dumping all transactions into a relational database and running SQL queries against that.
    3*. I’m thinking of the dependency tree data model that you suggested. Do you have any sample code that I can refer to?

Thanks a lot for helping!



@GuisongFu your use-case is one I’d describe as a classic “provenance” use-case, by which I mean that the ledger history is an essential part of your application state.

You have correctly identified that the JSON API is geared towards applications in which application state corresponds to ledger state, which is the set of active contracts. In fact, most of the Daml Connect stack is geared towards such applications. The JavaScript libraries and Daml Triggers also only give you access to ledger state.

The reason is that in practice ledger history requires careful management. The kinds of applications Daml is used for can generate thousands of transactions per second, generating vast amounts of data very quickly. It is not feasible to keep all ledger data around in a queryable datastore indefinitely.

At the moment, we require applications that do need queryable ledger history to maintain their own operational data store, as you describe in your question 2. This is not difficult to do. The Extractor you referred to is a prototype tool, part of Daml Connect, that does this in a generic fashion.
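As a rough illustration of such an operational data store, the sketch below keeps every contract it has ever seen in a SQLite table and marks archived contracts instead of deleting them. The event dictionaries, the table layout, and the function names are assumptions made for this example; they are not the Ledger API's or the Extractor's actual schema. A real ingester would feed it the create and archive events read from the ledger's transaction stream.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) contract history table."""
    db = sqlite3.connect(path)
    # archived_at stays NULL for as long as the contract is active.
    db.execute(
        """CREATE TABLE IF NOT EXISTS contracts (
               contract_id TEXT PRIMARY KEY,
               template    TEXT NOT NULL,
               payload     TEXT NOT NULL,
               created_at  INTEGER NOT NULL,
               archived_at INTEGER
           )"""
    )
    return db


def apply_event(db, event):
    """Apply one ledger event (assumed shape: a dict with a 'kind' key)."""
    if event["kind"] == "created":
        db.execute(
            "INSERT OR IGNORE INTO contracts VALUES (?, ?, ?, ?, NULL)",
            (
                event["contract_id"],
                event["template"],
                json.dumps(event["payload"]),
                event["offset"],
            ),
        )
    elif event["kind"] == "archived":
        # Keep the row and only record when the contract was archived.
        db.execute(
            "UPDATE contracts SET archived_at = ? WHERE contract_id = ?",
            (event["offset"], event["contract_id"]),
        )
    else:
        raise ValueError(f"unknown event kind: {event['kind']!r}")
    db.commit()


def history(db, template):
    """Return (contract_id, payload, is_archived) for every contract of a template."""
    rows = db.execute(
        "SELECT contract_id, payload, archived_at FROM contracts "
        "WHERE template = ? ORDER BY created_at",
        (template,),
    ).fetchall()
    return [(cid, json.loads(payload), archived is not None)
            for cid, payload, archived in rows]
```

Because archived rows are never deleted, `history(db, "Batch")` still returns a batch contract after its archive event has been applied, which is the property the JSON API's active contract set does not give you.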

In the future, we do want to provide better out-of-the-box support for these kinds of use-cases in two ways:

  1. Queryable ledger history back to the last pruning. Pruning here means ledger truncation. In the high-volume use-cases I described above, that may mean that you only have a few days of history available to you. The query capabilities will also be much more limited than a SQL database. You’ll likely be able to fetch historic contracts and transactions more easily, and follow “causality” in the ledger model, but we are not yet planning HTTP API-like query capabilities for historic data.
  2. We are going to revisit the Extractor topic with a slightly broader mindset. I’d like developers (like you in this case) to be able to specify “custom views” of the ledger, which are then maintained in an extractor-like component. A bit like materialized views for Daml Ledgers.
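Until something like that exists, a “custom view” for the traceability question can be sketched by hand. The example below assumes the history has already been flattened into a <batch_number, product_number, customer_number> table, as proposed in the question, and uses a self-join to find every customer who received a product made from the same batch as a defective one. The table name `batch_usage` and the function names are hypothetical, and this is not an existing Daml feature.

```python
import sqlite3


def build_view(rows):
    """Load (batch_number, product_number, customer_number) rows into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE batch_usage ("
        "batch_number TEXT, product_number TEXT, customer_number TEXT)"
    )
    db.executemany("INSERT INTO batch_usage VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def affected_customers(db, defective_product):
    """Customers whose product shares a raw-material batch with the defective one."""
    query = """
        SELECT DISTINCT peer.customer_number
        FROM batch_usage AS bad
        JOIN batch_usage AS peer ON peer.batch_number = bad.batch_number
        WHERE bad.product_number = ?
        ORDER BY peer.customer_number
    """
    return [row[0] for row in db.execute(query, (defective_product,))]


if __name__ == "__main__":
    db = build_view([
        ("batch_1", "product_A", "customer_1"),
        ("batch_1", "product_B", "customer_2"),
        ("batch_2", "product_C", "customer_3"),
    ])
    # product_B is reported as defective; batch_1 also went into product_A.
    print(affected_customers(db, "product_B"))
```

The result includes the customer who reported the defect as well as the customer of product_A; customers whose products came from unrelated batches are left out.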

I hope this helps you realize your use-case in Daml. I think it’s a good one.

As for data models, I think the closest example we have, though it’s not a great match by any means, is the supply chain one: digital-asset/ex-supply-chain on GitHub, a reference DAML application demonstrating a supply chain use case. Maybe it’s worth having a look at how that’s written.


Excellent question, moved to its own thread.