Hi, I am posting a follow-up question to my previous post.
We have not yet launched our commercial service in the company, but the results of a recent performance test have caused a serious crisis in the project.
To summarize the test results: even when testing under various conditions, including scale-up tests on both laptops and servers, daml-on-sql does not exceed roughly 130 tps, and that ceiling is essentially fixed. (Our team's conclusion is that 130 tps is the maximum performance of daml-on-sql itself.)
Because of this performance issue we have to report to our management, so I would like the Daml team's official answers to the following five items.
1. Is the result of our PostgreSQL-based test the maximum achievable speed? If not, what is the official maximum performance figure for daml-on-sql on PostgreSQL?
2. The previous answer mentioned that infrastructure has an impact on performance. Please tell us in detail which infrastructure needs to be upgraded. (We are not seeing much effect from scaling up or from using the Ledger API.)
3. The previous post answered that a performance-improved version will be released at some point. We are now in a desperate situation where we have no choice but to wait for it, so we would like to know the proposed release date and how much improvement it will bring.
4. Can a commercial service contract solve this problem? If so, please let us know how.
5. Even if VMware, Corda, or Fabric is used as the ledger database, I do not think it can be faster than PostgreSQL in the same domain, but is there any official performance test data on this?
We tested both JSON-RPC and the Ledger API, but the Ledger API does not have a big impact on TPS either.
Our project focuses on using JSON-RPC rather than the Ledger API due to security and operational concerns.
Our service is highly dependent on Daml code, so there is no alternative to Daml, and we want to find a solution to this problem quickly.
The test environment contains no elements other than JSON-RPC and daml-on-sql. (There is no blockchain, user UI, or other API connections.)
Please be sure to answer each of the five items above individually.
We have measured a throughput of hundreds of megabytes of transactions every second using a very complex Daml model on a single server. There is no reason why 130 transactions per second would be the maximum, but this number alone does not give us enough information to make a meaningful analysis of where the problem may lie. The exact number can vary wildly depending on a variety of factors, including the physical resources assigned to the participant and the database, the topology of the components, the size of the transactions and the efficiency and complexity of the Daml model itself.
A few questions that may help us understand the situation a bit better:
What is the size distribution of the transactions generated by your tests? What is the throughput you are observing, measured in B/s? Have you observed a specific bottleneck?
Are the participant and the database (including its disk) physically co-located?
What are the resources (CPU, memory and disk) assigned to the participant and the database? Are those resources assigned exclusively to those processes or are they shared?
How have you assessed the efficiency of your Daml code? Have you used some abstraction that could have added execution cost to the evaluation of commands? Have you relied on laziness in parts of your program (note that Daml code is evaluated strictly)?
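To illustrate that last point with a minimal, made-up sketch (the module and names below are hypothetical, not from your project): code written as if the language were lazy can end up doing far more work than intended, because Daml evaluates everything strictly.

```daml
module StrictnessExample where

-- Hypothetical helper producing n derived values. In a lazy language
-- only the demanded prefix of the list would be computed; Daml is
-- strict, so the whole list is built before anything else happens.
expensiveValues : Int -> [Int]
expensiveValues n = map costlyStep [1..n]
  where
    -- stand-in for any per-element work (validation, hashing, ...)
    costlyStep i = i * i + i

-- Although only the head of the list is used, all n elements are
-- computed when this runs during command evaluation.
firstValue : Int -> Optional Int
firstValue n = case expensiveValues n of
  [] -> None
  (x :: _) -> Some x
```

If your model contains patterns like this, the cost is paid on every command evaluation and directly limits throughput.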
@yongtaek as @stefanobaghino-da indicates, there is definitely no hard limit. I will DM you to set up a call to help you diagnose the problem, get an idea of your requirements, and discuss options to unblock you.
To add an answer to at least your question 4: a commercial version of Digital Asset's products might help you in several ways. If inefficiencies in the Daml code are to blame, the new Daml Profiler in the Enterprise Edition of Daml Connect would be the ideal tool to diagnose that. And of course the commercial versions of the Drivers (and Connect) include commercial support for issues like the one you are experiencing here.
What is the size distribution of the transactions generated by your tests? What is the throughput you are observing, measured in B/s? Have you observed a specific bottleneck? => I ran a test calling "/v1/create" through the load test tool. If you call "/v1/create" with curl from another laptop while the test is in progress, the result comes back after 2~3 seconds.
Are the participant and the database (including its disk) physically co-located? => We tested both with the components physically co-located and with them separated but on the same network; the results were the same.
What are the resources (CPU, memory and disk) assigned to the participant and the database? Are those resources assigned exclusively to those processes or are they shared? => We use a Google Cloud server, and the resources are allocated independently. The specifications are 8 cores, 32 GB of memory, and a 100 GB SSD.
How have you assessed the efficiency of your Daml code? Have you used some abstraction that could have added execution cost to the evaluation of commands? Have you relied on laziness in parts of your program (note that Daml code is evaluated strictly)? => Only the "/v1/create" call was performed, using JSON-RPC.
When running the load test from two laptops, each one measures 60~65 tps.
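For reference, one way to narrow this down further would be to take the JSON API out of the path entirely and drive the same kind of create directly over the Ledger API with Daml Script, then compare the rate and per-command latency against the JSON-RPC numbers. The sketch below is only illustrative: the Asset template, its fields, and the party name are hypothetical stand-ins, not part of the actual model.

```daml
module LoadCheck where

import Daml.Script

-- Hypothetical minimal template standing in for the contract
-- that the real test creates through /v1/create.
template Asset
  with
    owner : Party
    label : Text
  where
    signatory owner

-- Submits 1,000 create commands sequentially over the Ledger API,
-- bypassing the JSON API. The wall-clock time of the run gives a
-- rough per-command latency to compare against the 2~3 second
-- responses seen through /v1/create under load.
createMany : Script ()
createMany = do
  owner <- allocateParty "LoadTester"
  _ <- forA [1..1000] $ \i ->
    submit owner do
      createCmd Asset with
        owner = owner
        label = "asset-" <> show i
  pure ()
```

A script like this can be run against a live daml-on-sql ledger with `daml script --dar <your-dar> --script-name LoadCheck:createMany --ledger-host <host> --ledger-port <port>`. If creates are fast here but slow through /v1/create, the JSON API layer or the load generator itself is the more likely bottleneck; if they are equally slow, the limit sits below the JSON API.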