Scalability and Replication of a DAML Production System on AWS

Team,

I read the standard documentation on this topic, “Daml Ledger Topologies” in the Daml SDK 1.15.0 documentation.

This raised a few basic questions:

  1. What would be a typical TPS for running a DAML environment on a mid-tier AWS server? Obviously this depends on transaction size and the complexity of the Daml model, but what would be a reasonable expectation?

  2. What would be a basic topology for scalability and replication (AWS Cloud with VPCs and PostgreSQL)?
    - Would it be possible, or advisable, to run two DAML environments, each in its own VPC on a separate VM?
    - Is only one PostgreSQL database allowed, or can the database be scaled as well? I assume it could be replicated by the cloud service provider to guarantee data replication.

Here is a simple diagram:
[Flowchart Maker & Online Diagram Software]

A “Daml environment” will vary quite a bit depending on which driver you’re using. If we’re talking about the Daml Driver for PostgreSQL, there is no support for distribution at the moment, so your only scaling option for both the database and the driver is vertical: run them on bigger machines.

You can run as many JSON API servers in parallel as you want, though if you run them with a PostgreSQL backend you need to give each one a different schema, or they’ll step on each other. There is ongoing work to support table prefixes in the JSON API; once that lands, you’ll be able to point multiple JSON APIs at the same PostgreSQL schema, as long as you give each its own prefix. I don’t know whether it will make it into 1.16.0, but it should arrive “soon” either way.
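As an illustration of the “different schemas” advice, here is a minimal Python sketch that only builds command lines for two JSON API servers sharing one PostgreSQL database. `--ledger-host`, `--ledger-port`, `--http-port` and `--query-store-jdbc-config` are JSON API options in the 1.x SDK, but isolating the instances through pgJDBC’s `currentSchema` URL parameter is an assumption, and every host name, port, credential and schema name below is hypothetical. Check the JSON API documentation for your SDK version (including what `createSchema` does to existing tables) before relying on this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JsonApiInstance:
    """One JSON API server process; all field values are illustrative."""
    name: str
    http_port: int
    schema: str  # hypothetical: one dedicated PostgreSQL schema per instance


def jdbc_config(host: str, port: int, database: str, user: str,
                password: str, schema: str, create_schema: bool) -> str:
    """Build a --query-store-jdbc-config value pointing at a single schema.

    Assumption: the query store honours pgJDBC's `currentSchema` URL parameter.
    """
    url = f"jdbc:postgresql://{host}:{port}/{database}?currentSchema={schema}"
    return ",".join([
        "driver=org.postgresql.Driver",
        f"url={url}",
        f"user={user}",
        f"password={password}",
        f"createSchema={'true' if create_schema else 'false'}",
    ])


def json_api_command(inst: JsonApiInstance, ledger_host: str,
                     ledger_port: int, jdbc: str) -> List[str]:
    """Assemble the argv for one JSON API process (nothing is executed)."""
    return [
        "daml", "json-api",
        "--ledger-host", ledger_host,
        "--ledger-port", str(ledger_port),
        "--http-port", str(inst.http_port),
        "--query-store-jdbc-config", jdbc,
    ]


def plan(instances: List[JsonApiInstance]) -> List[List[str]]:
    # Refuse any plan in which two servers would share a schema and
    # therefore step on each other's query-store tables.
    schemas = [i.schema for i in instances]
    if len(set(schemas)) != len(schemas):
        raise ValueError(f"schemas must be unique, got {schemas}")
    return [
        json_api_command(
            inst, "ledger.internal", 6865,  # hypothetical ledger endpoint
            jdbc_config("db.internal", 5432, "jsonapi", "jsonapi", "secret",
                        inst.schema, create_schema=False),
        )
        for inst in instances
    ]


if __name__ == "__main__":
    for argv in plan([JsonApiInstance("a", 7575, "json_api_a"),
                      JsonApiInstance("b", 7576, "json_api_b")]):
        print(" ".join(argv))
```

Once table prefixes are available, the same launcher could keep one shared schema and vary the prefix per instance instead; the uniqueness check would then move to the prefix.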

I am less familiar with other Daml drivers and Connect components, so I will let others chime in.

Thanks for the update, @Gary_Verhaegen.

Are there any performance benchmarks available for the PostgreSQL connector?

@anthony Any thoughts?

I’ve discussed this internally with people who know more, and the current situation seems to be that we do not have publishable numbers. The reason is that there is a lot of variance in possible setups, and we just haven’t put in the work of creating reliable, generally relevant benchmarks yet. It’s also not clear whether that’s even possible.

The best recommendation I can give you, unfortunately, is to build your own benchmark based on Daml code that vaguely outlines the shape of the solution you have in mind.
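As a starting point for such a benchmark, here is a minimal, hypothetical client-side harness in Python. `run_benchmark` and its result keys are illustrative names, and `submit` is a placeholder you would wire to your real submission path (for example, a blocking HTTP POST to the JSON API’s `/v1/create` endpoint, or a Ledger API client). It measures end-to-end TPS as seen by the client, which includes network and client overhead, not ledger-internal throughput.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_benchmark(submit: Callable[[int], None], total: int,
                  concurrency: int) -> Dict[str, float]:
    """Call `submit` `total` times from `concurrency` threads and report
    throughput (transactions per second) and client-side latency.

    `submit` is a placeholder: it should block until the ledger has accepted
    (or rejected) command number i, however your application submits commands.
    """
    if total < 1 or concurrency < 1:
        raise ValueError("total and concurrency must be positive")

    def timed(i: int) -> float:
        start = time.perf_counter()
        submit(i)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total)))
    wall = time.perf_counter() - wall_start

    return {
        "tps": total / wall,
        "median_s": statistics.median(latencies),
        # Nearest-rank 95th percentile of per-command latency.
        "p95_s": latencies[math.ceil(0.95 * total) - 1],
    }


if __name__ == "__main__":
    # Stand-in for a real submission: pretend each command takes ~10 ms.
    fake_submit = lambda i: time.sleep(0.01)
    print(run_benchmark(fake_submit, total=200, concurrency=20))
```

Run it against a ledger deployed the way you intend to run production, with Daml templates shaped like your real workflow, and increase `concurrency` step by step: the point where TPS stops growing while latency keeps climbing is a rough indication of the capacity of that particular setup.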

Regarding your flowchart specifically, I should point out that the database behind a Daml Driver for PostgreSQL (as well as the databases for Connect components) is considered an implementation detail and we do not support any kind of direct access. Therefore, it’s hard for me to understand what you might want to do with the second, duplicated instance of PostgreSQL. I would suggest giving each Daml system its own DB instead, if you’re going for two independent Daml systems and two database instances.
