Daml test failure - large scenario (sdk 2.8.x)

Hi,

I’m upgrading the code base of a project from daml-finance 1.3.0 / sdk 2.7.0 to daml-finance 1.3.11 / sdk 2.8.0.

After doing the migration I’m getting an error when I run a large scenario using daml test or daml studio which works in the non-upgraded version.

Error:

Scenario service backend error: BErrorFail StatusCancelled

Logs

Logs
$ daml test --files src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step07.daml --debug

2024-03-08 08:47:04.71 [DEBUG]  [test ScenarioService]
Starting scenario service...

2024-03-08 08:47:05.88 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Server started.

2024-03-08 08:47:05.88 [DEBUG]  [test ScenarioService]
Scenario service backend running on port 64268

2024-03-08 08:47:06.01 [DEBUG]  [test]
Set files of interest to: [NormalizedFilePath "src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step07.daml"]

2024-03-08 08:47:06.02 [DEBUG]  [test]
Starting shakeRun (aborting the previous one took 0.00s)

2024-03-08 08:47:06.02 [DEBUG]  [test]
Finishing shakeRun (took 0.00s, exception: AsyncCancelled)

2024-03-08 08:47:06.02 [DEBUG]  [test]
Starting shakeRun (aborting the previous one took 0.00s)

2024-03-08 08:47:06.26 [DEBUG]  [test]
Finishing shakeRun (took 0.25s, completed)

2024-03-08 08:47:06.26 [DEBUG]  [test]
Starting shakeRun (aborting the previous one took 0.00s)

2024-03-08 08:47:09.19 [DEBUG]  [test]
Finishing shakeRun (took 2.93s, completed)

2024-03-08 08:47:09.19 [DEBUG]  [test]
Starting shakeRun (aborting the previous one took 0.00s)

2024-03-08 08:47:17.42 [DEBUG]  [test]
2583427606478720320 semaphore reached

2024-03-08 08:47:17.42 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:17.42 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:17.93 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.

2024-03-08 08:47:17.94 [DEBUG]  [test]
2583427606478720320 semaphore finished not cancelled

2024-03-08 08:47:17.95 [DEBUG]  [test]
6141213850222768817 semaphore reached

2024-03-08 08:47:17.95 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:17.95 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:18.11 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.

2024-03-08 08:47:18.12 [DEBUG]  [test]
6141213850222768817 semaphore finished not cancelled

2024-03-08 08:47:18.12 [DEBUG]  [test]
1857336323753457577 semaphore reached

2024-03-08 08:47:18.12 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:18.12 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:18.30 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.

2024-03-08 08:47:18.32 [DEBUG]  [test]
1857336323753457577 semaphore finished not cancelled

2024-03-08 08:47:18.32 [DEBUG]  [test]
8807773106013037112 semaphore reached

2024-03-08 08:47:18.32 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:18.32 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:18.46 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.

2024-03-08 08:47:18.48 [DEBUG]  [test]
8807773106013037112 semaphore finished not cancelled

2024-03-08 08:47:18.49 [DEBUG]  [test]
1369846223231711097 semaphore reached

2024-03-08 08:47:18.49 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:18.49 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:18.70 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.

2024-03-08 08:47:18.75 [DEBUG]  [test]
1369846223231711097 semaphore finished not cancelled

2024-03-08 08:47:18.75 [DEBUG]  [test]
5672921514192434007 semaphore reached

2024-03-08 08:47:18.75 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:18.75 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:19.09 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.

2024-03-08 08:47:19.14 [DEBUG]  [test]
5672921514192434007 semaphore finished not cancelled

2024-03-08 08:47:19.15 [DEBUG]  [test]
462593573830893787 semaphore reached

2024-03-08 08:47:19.15 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Connection started.

2024-03-08 08:47:19.15 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script started.

2024-03-08 08:47:19.78 [DEBUG]  [test ScenarioService]
SCENARIO SERVICE STDOUT: Script finished.
File:     src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step07.daml
Hidden:   no
Range:    29:1-29:12
Source:   Script
Severity: DsError
Message:
"Scenario service backend error: BErrorFail StatusCancelled"

2024-03-08 08:47:19.79 [DEBUG]  [test]
462593573830893787 semaphore finished not cancelled

2024-03-08 08:47:19.80 [DEBUG]  [test]

Finishing shakeRun (took 10.60s, completed)Test Summary


src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step07.daml:run7: failed
src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step06.daml:run6: ok, 162 active contracts, 137 transactions.
src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step05.daml:run5: ok, 196 active contracts, 123 transactions.
src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step04.daml:run4: ok, 130 active contracts, 114 transactions.
src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step03.daml:run3: ok, 124 active contracts, 111 transactions.
src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step02.daml:run2: ok, 100 active contracts, 101 transactions.
src/main/daml/Finance/Foo/Test/Demo/MultiCustodian/Step01.daml:setup: ok, 88 active contracts, 90 transactions.
Modules internal to this package:
- Internal templates
  23 defined
  20 ( 87.0%) created
- Internal template choices
  27 defined
  2 (  7.4%) exercised
- Internal interface implementations
  41 defined
    41 internal interfaces
    0 external interfaces
- Internal interface choices
  0 defined
  0 (100.0%) exercised
Modules external to this package:
- External templates
  86 defined
  18 ( 20.9%) created in any tests
  18 ( 20.9%) created in internal tests
  0 (  0.0%) created in external tests
- External template choices
  103 defined
  5 (  4.9%) exercised in any tests
  5 (  4.9%) exercised in internal tests
  0 (  0.0%) exercised in external tests
- External interface implementations
  176 defined
- External interface choices
  0 defined
  0 (100.0%) exercised in any tests
  0 (100.0%) exercised in internal tests
  0 (100.0%) exercised in external tests

Note that:

  • If I upgrade to daml sdk 2.8.3 (latest), it fails as well.
  • If I remove arbitrarily a couple of submissions, it works. (it is not a workflow issue)
  • If I run the scenario using daml finance 1.3.0 / daml sdk 2.7.0, it works.
  • If I run the scenario on Canton Sandbox, it works.

Thanks!
Jose

Hi Jose,

It looks like a timeout issue with the scenario service after upgrading your project. The StatusCancelled error typically indicates that the request was not completed within the expected 70 seconds time-frame, it’s possible that the request became too heavy to process after the upgrade of one of the component.

According the logs you provide, I guess the request in the IDE before upgrade and using canton take less than 30s. Can you confirm?

It would be very helpful if you could test by upgrading only one of the two components.

Cheers,

Remy

After investigating a bit more, I think the issue the output transaction is too big for the scenario service.
I was able to reproducing the issue with a simple program.
I created an issue.

1 Like

Hi Remy,

Timing (sdk 2.8.0 + daml-finance 1.3.11)

  • Scenario on Sandbox: 55 seconds (successful)
  • Scenario on daml test: 14 seconds (failure)

I would say it is not a timeout error.

Best
Jose

@Remy

By adding the following to daml.yaml, the scenario runs without errors.

script-service:
  grpc-max-message-size: 15000000
  grpc-timeout: 120
  jvm-options: []

Best
Jose

That makes sens, the response from the back-end is probably too large for the default GRPC message size.

I keep the issue opened, as we still need to be more explicit about the underlying issue.

Thanks for letting know you find a workaround.

1 Like