This question came up during a presentation to Hyperledger Sweden at their Tech Study Circle. Essentially, Fabric can store some data in off-chain databases and commit only hashes to the main chain, which allows the detailed off-chain data to be deleted in the future.
What’s the current status of this type of feature in Daml and/or Canton?
And will it use native Fabric facilities or its own?
Canton itself does support pruning; however, this hasn’t yet been extended to our Fabric integration. The Fabric integration is currently only at alpha level, so support may be added in the future - perhaps using private data collections, or by regularly truncating all data once it has been distributed, using the upcoming Fabric channel checkpointing support (there’s no need to retain all data in a domain once it has been distributed to participants, as they maintain their own view of the virtual Daml ledger).
FWIW the payloads exchanged over the Fabric integration are encrypted and cannot be decrypted by the domain operators, so the transaction data itself is private. However, addressing information (which participants are involved in a transaction) is visible.
A general note with regards to GDPR: it’s a nuanced topic, with the regulatory guidance around particular technologies still being developed, and DLT application authors have to take the time to understand the underlying technology in order to work out how to make their application GDPR-compliant. No DLT, whether vanilla Fabric or Canton on Fabric, can do all of it for you. Nevertheless, your choice of DLT can still make a huge difference to your ability to comply with the GDPR.
For example, drawing on the EU Parliament research report, hashes (even if peppered!) may remain personal data. Thus, while Fabric’s private data collections may help you achieve GDPR compliance, that doesn’t mean you’re automatically compliant just because you use them. Similarly, we believe that the automatic encryption of all payloads that Canton uses (over any ledger) can help you achieve GDPR compliance, but it might need to be accompanied by other measures (e.g., regular key rotation).
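To make the point about peppered hashes concrete, here is a minimal Python sketch of one way a hash can stay linkable to a person. It is only an illustration under stated assumptions: the pepper is modelled as an HMAC-SHA256 key, and the names `peppered_hash`, `reidentify` and the sample values are hypothetical, not taken from Fabric, Canton or the report.

```python
import hashlib
import hmac
from typing import List, Optional


def peppered_hash(value: str, pepper: bytes) -> str:
    """Keyed hash of a value; the pepper is a secret that is kept off the ledger."""
    return hmac.new(pepper, value.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(digest: str, candidates: List[str], pepper: bytes) -> Optional[str]:
    """Return the candidate whose peppered hash equals the digest, if there is one."""
    for candidate in candidates:
        if hmac.compare_digest(peppered_hash(candidate, pepper), digest):
            return candidate
    return None


# Hypothetical values, for illustration only.
pepper = b"hypothetical-secret-pepper"
on_chain_digest = peppered_hash("alice@example.com", pepper)

# Whoever holds the pepper can test guesses against the stored digest.
known_people = ["bob@example.com", "alice@example.com"]
match = reidentify(on_chain_digest, known_people, pepper)

# Without the right pepper the same guesses do not match.
no_match = reidentify(on_chain_digest, known_people, b"some-other-pepper")
```

In this sketch the digest is deterministic, so as long as the pepper exists somewhere, the party holding it can still connect the stored hash to a known individual.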
Finally, do bear in mind that - at least AFAIK - none of these techniques (hashing, encryption, ZKPs or what have you) have been tested in court yet. This might be especially interesting in cases where other regulation requires you to store data that might be personal (e.g., for an audit log).
The mechanism you are referring to is private data collections. They do indeed work exactly the way I suggest treating contract attachments in the article you referenced. Implementing the same functionality in Daml is pretty straightforward. I have an implementation sitting in a private GitHub repo from a two-day hackathon a bit over a year ago.
If anyone wanted to spend another couple of days to clean it up and open source it, I’d be more than happy to share the code.
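For readers unfamiliar with the general pattern discussed in this thread (payload kept off-ledger, only its hash committed), a rough Python sketch follows. It is not the hackathon implementation mentioned above, nor Fabric's or Daml's API; the class `HashAnchoredStore` and its methods are hypothetical names, and the "ledger" is just an in-memory list.

```python
import hashlib
from typing import Dict, List, Optional


def digest(payload: bytes) -> str:
    """SHA-256 hex digest of a payload."""
    return hashlib.sha256(payload).hexdigest()


class HashAnchoredStore:
    """Toy model: an append-only list of hashes plus a deletable payload store."""

    def __init__(self) -> None:
        self.ledger: List[str] = []            # stands in for the shared ledger
        self.off_chain: Dict[str, bytes] = {}  # stands in for the off-chain database

    def commit(self, payload: bytes) -> str:
        """Store the payload off-chain and record only its hash on the ledger."""
        h = digest(payload)
        self.ledger.append(h)
        self.off_chain[h] = payload
        return h

    def fetch(self, h: str) -> Optional[bytes]:
        """Return the payload if it still exists, checking it against its hash."""
        payload = self.off_chain.get(h)
        if payload is not None and digest(payload) != h:
            raise ValueError("off-chain payload does not match committed hash")
        return payload

    def erase(self, h: str) -> bool:
        """Delete the off-chain payload; the hash on the ledger stays."""
        return self.off_chain.pop(h, None) is not None


store = HashAnchoredStore()
ref = store.commit(b"attachment: passport scan for customer 42")
before = store.fetch(ref)   # payload is available and matches its hash
erased = store.erase(ref)   # detailed data is deleted on request
after = store.fetch(ref)    # payload is gone, only the hash remains
```

After `erase`, the ledger still holds the hash, which is why the earlier caveat about hashes possibly remaining personal data still applies to this pattern.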