Restoring Canton ledger from database dump

Hi there,
We have Canton Enterprise 2.4.0 running on a server, and we would like to replicate the full ledger locally and run Canton on top of a copy of the ledger database.

We dumped our participant node's Postgres database with

pg_dump -d <dbname> -h <host> -U <username> > db_dump.sql

then restored locally with

psql -d <dbname> -U postgres -f db_dump.sql
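
As a side note, a dump/restore procedure along these lines (a sketch; database names, hosts, and file names are placeholders) may be more robust: the custom format (`-Fc`) preserves large objects and lets `pg_restore` recreate objects in dependency order, and the dump should be taken while the participant node is stopped so the snapshot is consistent across all of its internal stores.

```shell
# Hypothetical connection settings -- substitute your own.
DB_NAME=canton_participant
DB_HOST=remote.example.com
DB_USER=canton
DUMP_FILE=participant_dump.bak

# Custom-format dump (-Fc); run it only while the participant node is
# STOPPED so the snapshot is consistent.
DUMP_CMD="pg_dump -Fc -d $DB_NAME -h $DB_HOST -U $DB_USER -f $DUMP_FILE"

# Restore into a freshly created, empty local database.
CREATE_CMD="createdb -U postgres $DB_NAME"
RESTORE_CMD="pg_restore -U postgres -d $DB_NAME $DUMP_FILE"

# Printed rather than executed here (dry run):
echo "$DUMP_CMD"
echo "$CREATE_CMD"
echo "$RESTORE_CMD"
```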

However, running Canton Enterprise 2.4.0 locally produces the following error:

@ nodes.local.start
ERROR c.d.c.c.EnterpriseConsoleEnvironment - Command failed on 1 out of 1 instances: (exception on myParticipant: java.lang.IllegalStateException: CANNOT_VET_DUE_TO_MISSING_PACKAGES(11,0): Package vetting failed due to packages not existing on the local node; action=package-vetting, packages=86828b9843465f419db1ef8a8ee741d1eef645df02375ebf509cdc8c3ddd16cb, participant=myParticipant
        at com.digitalasset.canton.participant.admin.AdminWorkflowServices.$anonfun$handleDamlErrorDuringPackageLoading$1(AdminWorkflowServices.scala:146)
        at cats.syntax.EitherOps$.leftFlatMap$extension(either.scala:185)
        at com.digitalasset.canton.util.EitherTUtil$.$anonfun$leftSubflatMap$1(EitherTUtil.scala:66)
        at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1395)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)

  Command LocalInstancesExtensions$Impl.start invoked from cmd1.sc:1
com.digitalasset.canton.console.CommandFailure: Command execution failed.

Tracking down the missing package id shows that 86828b9843465f419db1ef8a8ee741d1eef645df02375ebf509cdc8c3ddd16cb refers to daml-prim-DA-Exception-GeneralError inside one of our project's .dar files.
The project's .dar files were built with SDK 2.3.2.
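
For anyone retracing this step: a .dar file is a ZIP archive, and each contained .dalf entry name ends in its package id, so a missing id can be located with `unzip -l my-project.dar | grep <package-id>` (the .dar file name is a placeholder). The pipeline below demonstrates the match on a simulated archive listing, since no .dar is on disk here:

```shell
PKG=86828b9843465f419db1ef8a8ee741d1eef645df02375ebf509cdc8c3ddd16cb

# Simulated entry names following the <name>-<package-id>.dalf
# convention; on a real system, feed `unzip -l my-project.dar` in
# instead of printf.
MATCH=$(printf '%s\n' \
  "my-project-1.0.0-deadbeef.dalf" \
  "daml-prim-DA-Exception-GeneralError-${PKG}.dalf" \
  | grep "$PKG")
echo "$MATCH"
```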

Did we miss something in making a copy of the remote database?

Update (Dec 7): Canton open-source 2.3.2 seems to work fine locally. This is interesting because the server runs 2.4.0, and that version has no problem starting on a non-empty database there.

Update (Dec 16): Canton Enterprise 2.4.0 throws a different error when evaluating nodes.local.start:

ERROR c.d.p.s.b.c.ParameterStorageBackendImpl - Found existing database with mismatching participantId: existing 'myledgerid::12204b2b3149badc7ac5408541c037c1481bacb125c2998239ea62e9e47d3296ab09', provided 'myledgerid::122049f17be9c1811fc111d572473c9868e27c6fc4af6bde89c5d82a0daea9ad418d', context: {participant: "myParticipant"}
ERROR c.d.p.i.RecoveringIndexer - Error while running indexer, restart scheduled after 10 seconds, context: {participant: "myParticipant"}
com.daml.platform.common.MismatchException$ParticipantId: The provided participant id does not match the existing one. Existing: "myledgerid::12204b2b3149badc7ac5408541c037c1481bacb125c2998239ea62e9e47d3296ab09", Provided: "myledgerid::122049f17be9c1811fc111d572473c9868e27c6fc4af6bde89c5d82a0daea9ad418d".
        at com.daml.platform.store.backend.common.ParameterStorageBackendImpl$.initializeParameters(ParameterStorageBackendImpl.scala:144)
        at com.daml.platform.indexer.parallel.InitializeParallelIngestion.$anonfun$apply$4(InitializeParallelIngestion.scala:50)
        at com.daml.platform.indexer.parallel.InitializeParallelIngestion.$anonfun$apply$4$adapted(InitializeParallelIngestion.scala:50)
        at com.daml.platform.store.dao.DataSourceConnectionProvider$$anon$2.$anonfun$runSQL$1(HikariJdbcConnectionProvider.scala:78)
        at com.daml.metrics.Timed$.$anonfun$value$1(Timed.scala:18)
        at com.codahale.metrics.Timer.time(Timer.java:118)
        at com.daml.metrics.Timed$.value(Timed.scala:18)
        at com.daml.platform.store.dao.DataSourceConnectionProvider$$anon$2.runSQL(HikariJdbcConnectionProvider.scala:78)
        at com.daml.platform.store.dao.DbDispatcherImpl.$anonfun$executeSql$2(DbDispatcher.scala:64)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:678)
        at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
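
One way to dig into this mismatch (a sketch; the database name is a placeholder, and the `parameters` table with its `participant_id` column is taken from the Daml 2.x index schema, so verify it against your schema version): query the id the indexer was initialised with and compare it against the id the local node derives on startup. A difference suggests the node generated a fresh identity instead of adopting the one in the restored database.

```shell
DB_NAME=canton_participant   # placeholder
SQL="SELECT participant_id FROM parameters;"

# Dry run -- printed rather than executed against a live database:
echo "psql -d $DB_NAME -U postgres -c \"$SQL\""
```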

I might be a bit late here. Generally, the instructions for backup and restore are written up here: Persistence — Daml SDK 2.6.3 documentation

We do test these in CI, and our data continuity tests are based on these operations.

Regarding the error you've observed: I think something is not right with the database backups you performed, and somehow the versions got mixed up. But without more information it's hard to diagnose.

In particular, the later issue tells us that the ledger-api server store is not aligned with the Canton sync service stores (both are internal components of Canton).
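
One sanity check worth trying (a sketch; host, user, and database names are placeholders): take schema-only dumps from the remote database and from the restored local copy and diff them. A non-empty diff would suggest the copy is incomplete or the two sides run different schema versions.

```shell
DB_NAME=canton_participant
REMOTE_CMD="pg_dump --schema-only -d $DB_NAME -h remote.example.com -U canton -f remote_schema.sql"
LOCAL_CMD="pg_dump --schema-only -d $DB_NAME -U postgres -f local_schema.sql"

# Dry run -- printed rather than executed here:
echo "$REMOTE_CMD"
echo "$LOCAL_CMD"
echo "diff remote_schema.sql local_schema.sql"
```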
