Daml-on-fabric, deadline exceeded during daml deploy

I’m using daml-on-fabric with version 1.2 of the SDK. I’m starting the server with

sbt "run --port 6865 --role provision,time,ledger" -J-DfabricConfigFile=/app/build/config.yaml -Xss2M -XX:MaxMetaspaceSize=1024M

and it seems to come up fine. I then try

daml deploy --host localhost --port 6865

and things initially look promising with

Deploying to localhost:6865
Checking party allocation at localhost:6865
Allocated 'hauler' for 'hauler' at localhost:6865
Allocated 'pumper' for 'pumper' at localhost:6865
Allocated 'pricer' for 'pricer' at localhost:6865
Allocated 'bisonManager' for 'bisonManager' at localhost:6865
Allocated 'opManager' for 'opManager' at localhost:6865

but I then get a failure trying to upload the dar:

daml-helper: GRPCIOBadStatusCode StatusDeadlineExceeded (StatusDetails {unStatusDetails = "Deadline Exceeded"})

On the server side I see:

Invoke chaincode - going to call: RawBatchWrite on the chaincode daml_on_fabric
Received 1 tx proposal responses. Successful+verified: 1 . Failed: 0  - Fcn: RawBatchWrite
15:27:03.530 ERROR c.d.p.a.s.a.ApiPackageManagementService - Unhandled internal error
java.util.concurrent.TimeoutException: The stream has not been completed in 30 seconds.
	at akka.stream.impl.Timers$Completion$$anon$2.onTimer(Timers.scala:83)
	at akka.stream.stage.TimerGraphStageLogic.onInternalTimer(GraphStage.scala:1601)
	at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1(GraphStage.scala:1590)
	at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1$adapted(GraphStage.scala:1590)
	at akka.stream.stage.TimerGraphStageLogic$$Lambda$6253/0000000028036150.apply(Unknown Source)
	at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:466)
	at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:497)
	at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:599)
	at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:768)
	at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:783)
	at akka.actor.Actor.aroundReceive(Actor.scala:533)
	at akka.actor.Actor.aroundReceive$(Actor.scala:531)
	at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:690)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
	at akka.actor.ActorCell.invoke(ActorCell.scala:543)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
	at akka.dispatch.Mailbox.run(Mailbox.scala:230)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Any ideas on how I can avoid this issue? I think I saw that version 1.4 of the SDK added a timeout option to the deploy and ledger commands but I think the max SDK supported by daml-on-fabric is 1.3, correct?

Thanks for your time!

Hi @mcook-bison and welcome to the DAML community!

It looks like you are hitting an internal timeout because things are running slowly on your local machine; this doesn’t appear to be specific to any SDK version. Have you tried dedicating at least 4 CPUs and 8 GB of memory to Docker Desktop in Preferences --> Resources?

Docker is also quite resource hungry, so it is recommended that you run as little else as possible on your machine while daml-on-fabric is running, to avoid contention and delays.

Hope this helps

Thank you sormeter, but no luck so far. I’ve increased resources to 5 CPUs and 12 GB of RAM but I’m not seeing any change. What may complicate the issue is that this is all running in Kubernetes… I’ve created a Docker image with all the daml stuff on it, and that image is being used to create a pod with a guaranteed QoS of 5 CPUs and 12 GB of RAM. I’ve then tried doing the daml deploy both from within that pod and from within others. In all cases it seems clear that it’s making a good connection: both sides show activity and I get those initial allocations, but I can’t seem to get the dar step to complete.
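
For anyone checking a similar setup: as far as I know, a pod only gets the Guaranteed QoS class when every container has both CPU and memory limits, with requests equal to those limits. Below is a minimal Python sketch of that rule. The function name and the dictionary layout are made up for illustration, and it compares quantities as plain strings, so it would not recognise "5" and "5000m" as equal the way Kubernetes does.

```python
def qos_class(containers):
    """Classify a pod from its containers' resource settings (simplified model)."""
    guaranteed = True
    any_set = False
    for container in containers:
        limits = container.get("limits", {})
        # A missing request defaults to the limit when a limit is set.
        requests = {**limits, **container.get("requests", {})}
        if limits or requests:
            any_set = True
        for resource in ("cpu", "memory"):
            # Guaranteed needs a limit for each resource and request == limit.
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if guaranteed and containers:
        return "Guaranteed"
    return "Burstable" if any_set else "BestEffort"


# Hypothetical pod matching the numbers in this post.
print(qos_class([{"limits": {"cpu": "5", "memory": "12Gi"}}]))
```

If the check reports Burstable for your real spec, the pod may be getting less CPU than expected under contention, which would be worth ruling out.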

I’ll bump up the resources again and give it another go. Does the Kubernetes aspect lead you down any other troubleshooting paths?

Hi…FWIW we use Kubernetes internally on project:DABL (a hosted offering for DAML ledgers) and haven’t hit Kubernetes-specific problems with respect to file uploading.

Out of curiosity, how large is your DAR?

Thanks dtanabe. The dar appears to be only 332k.

Fabric is known to be fairly slow here regardless of the DAR size. I think just bumping resource limits is unfortunately your best option for now. We are looking into finding a better solution for this.

@mcook-bison welcome to the DAML Community!

Is deploying to Hyperledger Fabric a strict requirement for this project? We’re currently working on a newer version of DAML for Fabric which should circumvent these kinds of problems. In the meantime, you may want to try one of the other ledgers that have a DAML integration.

Thanks Shaul, unfortunately Hyperledger Fabric is a strict requirement for me right now. I’ve tried bumping resources all the way to 12 cpus and 45G of ram so I’ve got to think it’s something other than requirements that’s tripping me up. I’d definitely appreciate any other troubleshooting avenues you could send me down. Is there a way to run the server in debug mode or something?

@sormeter any ideas?

@mcook-bison depending on the timeline you need to go into production, take a look at my response on another thread here

Hey all, we tried a few things on our end and while we haven’t been successful in solving our problem we did learn some things that we wanted to share.

We grabbed the upgrade-to-1.4.0 branch of daml-on-fabric and made a few changes to get it running. That let us try 1.4.0 and the timeout option. What we found was that if we set the timeout lower than 30 seconds, we would fail at our new, lower timeout. If we set it over 30, however, we still got the same server-side error at 30 seconds, and that failure still got sent back to the daml deploy command:

java.util.concurrent.TimeoutException: The stream has not been completed in 30 seconds.
at akka.stream.impl.Timers$Completion$$anon$2.onTimer(Timers.scala:83)
at akka.stream.stage.TimerGraphStageLogic.onInternalTimer(GraphStage.scala:1601)
at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1(GraphStage.scala:1590)
at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1$adapted(GraphStage.scala:1590)
at akka.stream.stage.TimerGraphStageLogic$$Lambda$6169/000000000400CD40.apply(Unknown Source)
at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:466)
at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:497)
at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:599)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:768)
at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:783)
at akka.actor.Actor.aroundReceive(Actor.scala:533)
at akka.actor.Actor.aroundReceive$(Actor.scala:531)
at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:690)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
at akka.actor.ActorCell.invoke(ActorCell.scala:543)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
at akka.dispatch.Mailbox.run(Mailbox.scala:230)
at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

So it appears that this error is not related to that timeout. Perhaps it’s something else, such as the daml server being unable to communicate something to the ledger? Please let us know if you have any thoughts.

Oh, another thing to mention… I watched htop while doing the daml deploy. CPU use increased slightly for the first 10 seconds or so, but came nowhere near the limits of the pod. Then it settled back down to normal levels and stayed there until, and past, the timeout.
(Both the sbt run and the daml deploy were on the same pod.)

Do you mean the --timeout flag in daml deploy? That’s a client-side timeout. Currently, most (maybe even all) ledgers also have their own 30s timeout on the server side, which the client flag does not change. We’ve recently merged a change that should increase that timeout; it will land in SDK 1.6, which will be out in November. However, I don’t know yet when Fabric will upgrade to that release.

Ah gotcha, yes we tried the --timeout flag in daml deploy.

Hey @mcook-bison one other thing you could try is to start daml-on-fabric by separating the provisioning and the ledger instance as we do in Step 6 of our live DAML deployment scenario.

sbt "run --port 6865 --role provision"
sbt "run --port 6865 --role time,ledger" -J-DfabricConfigFile=config-local.yaml -Xss2M -XX:MaxMetaspaceSize=1024M

We also use sbt 1.2.8. If you haven’t used that specific version, it may be worth a try.

Thanks Anthony, those were good ideas. Unfortunately we tried them both and saw no change. Please let us know if you have any other thoughts.

Hey @mcook-bison 1.7.0 was released yesterday and has a better default timeout of 2 minutes (thanks to @gerolf), as well as configurability on the Ledger API via ParticipantConfig.

This may help solve your problem. You can install it with daml install 1.7.0 and then bump the sdk-version of your deployed project to 1.7.0.
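
If you would rather script the version bump than edit daml.yaml by hand, a minimal Python sketch could look like the following. The helper name and the sample file contents are hypothetical; the only thing it relies on is that daml.yaml has a top-level sdk-version key.

```python
import re


def bump_sdk_version(daml_yaml_text, new_version):
    """Return daml.yaml text with its top-level sdk-version set to new_version."""
    pattern = re.compile(r"^sdk-version:[ \t]*\S+[ \t]*$", re.MULTILINE)
    updated, count = pattern.subn(
        lambda _match: f"sdk-version: {new_version}", daml_yaml_text, count=1
    )
    if count == 0:
        raise ValueError("no top-level sdk-version key found")
    return updated


# Hypothetical daml.yaml contents; a real project file will have more fields.
sample = """sdk-version: 1.2.0
name: my-project
version: 0.0.1
source: daml
"""
print(bump_sdk_version(sample, "1.7.0"))
```

To apply it to a real project you would read daml.yaml, pass its text through the function, and write the result back, after running daml install 1.7.0.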

daml-on-fabric was also updated to 1.7.0 so it’s now compatible.

Details here: https://github.com/digital-asset/daml/pull/7602