Loading a large amount of data into DAML

Hello DAML Community, I have a question. I want to do a project where I extract data from a CSV file and then load it into DAML. I see that DAML has an integration with PostgreSQL, and I'm wondering whether it is a good idea to load all the data there (these data sets can be quite large and high frequency), or whether I should use DAML only for the rules and save the data somewhere else.


Hi David,
welcome to the DAML Community!
That is an excellent question with many possible answers. The quickest way to get you started is to use DAML Script and its ability to load data from a JSON file: You could convert your data from CSV to JSON and write a short DAML Script that takes this JSON file as input and creates a contract for each JSON object encoding a row from the original CSV file.
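Purely as an illustration, a minimal sketch of such a script could look like the following. The template CsvRow, the ImportArgs and CsvRowData records, and all their fields are hypothetical placeholders for whatever your real columns turn out to be; the JSON file is passed in via daml script's --input-file option.

module ImportCsv where

import Daml.Script

-- Hypothetical template holding one row of the original CSV file.
template CsvRow
  with
    owner : Party
    name : Text
    value : Text
  where
    signatory owner

-- Hypothetical shape of the JSON input (one object per CSV row).
data CsvRowData = CsvRowData with
    name : Text
    value : Text

data ImportArgs = ImportArgs with
    owner : Party
    rows : [CsvRowData]

-- Creates one CsvRow contract per JSON object in the input file.
importRows : ImportArgs -> Script ()
importRows args = do
  _ <- forA args.rows $ \r ->
    submit args.owner do
      createCmd CsvRow with
        owner = args.owner
        name = r.name
        value = r.value
  pure ()

Each row becomes its own submission here; for very large files you would probably want to batch several creates into a single submission.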
Does that help? Please don’t hesitate to ask if you have any further questions.
Cheers, Martin.


Depending on the nature of the CSV file, and how it is used, it may also be worth considering not storing the file itself on the ledger. If the file has to be considered “as a whole”, so to speak, and is quite large, it may make more sense to store the CSV file externally and only record, say, a URL and a hash on the ledger.

In that configuration, you could use the same underlying PostgreSQL process to store both the ledger data and this “off-ledger” binary store (using PostgreSQL’s support for binary data types such as bytea). If you do go down that path, you should definitely use a separate “database namespace” (“schema” in PostgreSQL nomenclature) for the ledger, and not touch that schema with anything else. I would only consider this if the CSV file (a) really needs to be taken as a whole, and (b) does not change very often, and only in big chunks at a time. If individual entries change often, then you’re probably better off following Martin’s advice: turn the CSV file into JSON first, then import it through DAML Script.
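Sketching that idea in DAML (the template and field names here are entirely illustrative), the on-ledger part could be as small as:

module CsvFileReference where

-- Hypothetical template recording only a pointer to an off-ledger CSV file:
-- the location where the file is stored, plus a hash so that stakeholders
-- can verify that the file they fetch is the one that was recorded.
template CsvFileReference
  with
    owner : Party
    observers : [Party]
    url : Text
    sha256 : Text
  where
    signatory owner
    observer observers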


Hi @DavidBelinchon, may I ask what the exact nature of this data is?

Is your concern mainly about performance?


Thank you very much for the answers. The problem is that I don’t know what those CSV files are going to look like; the headers may be different, but they want the data to be saved inside DAML. My question now is: can I have a dynamic template where I can save different sets of data, or is that impossible in this case?


DAML is very much a static language, so dynamicity is somewhat limited by design. It’s a bit hard to give you further direction without a better understanding of what you are trying to achieve. You could represent an arbitrary CSV file as:

[TextMap Text]

i.e. each CSV file could be represented as a list of maps, where each line corresponds to a map, which has textual keys and values. This is a flexible, but therefore pretty uninformative, structure, and you would probably end up having to do a lot of manual parsing and error handling in your DAML code. (Whereas if you do know the structure in advance you can express much more through the use of DAML types, and your code could end up being much cleaner.)
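As a purely illustrative sketch (the template, its fields, and the "amount" column are made-up names, not anything prescribed by DAML), that representation might look like:

module GenericCsv where

import DA.TextMap (TextMap)
import qualified DA.TextMap as TextMap

-- Hypothetical template storing an arbitrary CSV file as a list of rows,
-- each row being a map from column header to cell value.
template GenericCsvFile
  with
    owner : Party
    fileName : Text
    rows : [TextMap Text]
  where
    signatory owner

-- Example of the manual work this representation forces on you:
-- a column you care about may or may not be present in a given row.
lookupAmount : TextMap Text -> Optional Text
lookupAmount row = TextMap.lookup "amount" row

Every consumer of such a contract has to deal with missing columns and unparsed text values, which is exactly the error handling a properly typed template would let you avoid.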

I think it’s probably worth taking a bit of a step back at this point and giving us a bit more context as to what it is you are trying to achieve, at a slightly higher level.
