Canton DB migration timeout DEADLINE_EXCEEDED

Hi,

When starting up Canton for the first time on an empty DB, it usually runs the migrations for the public and ledger_api schemas.

Occasionally this takes a bit longer, and when the ledger_api schema migrations are almost finished, it fails with an error saying that the deadline of 53s was exceeded:

INFO  o.f.c.i.c.DbMigrate - Migrating schema "ledger_api" to version "42 - Convert hash indices"
INFO  o.f.c.i.c.DbMigrate - Migrating schema "ledger_api" to version "43 - explicit compression"
INFO  o.f.c.i.c.DbMigrate - Migrating schema "ledger_api" to version "44 - offset as text"
INFO  c.d.c.n.g.ApiRequestLogger:participant=participantIssuer tid:f26c9bb5b43388b9d57168c68fed13aa - Request c.d.c.i.a.v.InitializationService/InitId by /10.131.2.220:46390: cancelled
ERROR c.d.c.c.EnterpriseConsoleEnvironment - Request failed for participantIssuer.
  GrpcClientGaveUp: DEADLINE_EXCEEDED/deadline exceeded after 53.999618057s. [remote_addr=/0.0.0.0:5012]
  Request: InitId(di-execution,1220a6851a09e8fea4016f03caf33dc3aff2f3057614ae7c91e683852d18632e1649)
ERROR c.d.c.ServerRunner - Command execution failed.

ERROR c.d.c.ServerRunner - Unexpected error while running server:
INFO  c.d.c.ServerRunner - Exception causing error is:
com.digitalasset.canton.console.CommandFailure:

Can we somehow extend this timeout? It looks like a gRPC thing, but the documentation I found online was limited.

Thanks in advance.

Matheus

You can change timeouts either statically or at runtime.

Statically: Change the config parameter parameters.timeouts.console = 10m.

At runtime: Call console.set_command_timeout(10.minutes) to change the timeout.

If that does not work for you, please get back to me with the exact commands you are running.

Hi @MatthiasSchmalz ,

Thanks for the reply. I tried to change it statically, but I got an error complaining that parameters.timeouts.console expects an object but found a string (10m).

Then I went with the runtime approach, and added console.set_command_timeout(10.minutes) in the bootstrap script.

It seems to accept it, but since we’re talking about DB migrations, I wonder whether that call is invoked before the migration, or whether the migration runs before the participant is first started?

Hi @Matheus

Right, the config parameter should actually be:
parameters.timeouts.console.bounded = 10m

Sorry about the mixup!
The documentation of our config parameters can be found here:
https://www.canton.io/docs/dev/scaladoc/com/digitalasset/canton/config/CantonCommunityConfig.html
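
As a rough sketch, the corrected setting might look like this in a Canton config file. The enclosing canton block is an assumption about where the parameters section sits in your setup, and 10m is just the value used in this thread:

```
canton {
  parameters {
    timeouts {
      console {
        # Assumed placement: upper bound for console commands that are expected to finish
        bounded = 10m
      }
    }
  }
}
```

The nested form makes it visible why the earlier variant failed: console is an object with its own fields, so assigning 10m to it directly is rejected.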

The DB migration is applied for each node individually and it is applied when the node is started. Note that nodes may be started automatically before the bootstrap script is applied. To avoid that, you need to:

  • Start Canton with --manual-start=true.
  • Start nodes in the bootstrap script. (E.g., nodes.local.start().)
  • Reconnect participants to domains in the bootstrap script. (Something like myParticipant.domains.reconnect().)
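
Put together, a bootstrap script following the steps above could look roughly like the sketch below. It only uses the calls mentioned in this thread; myParticipant is a placeholder for your own participant reference, and the exact form of the reconnect call may differ in your Canton version, so check it against the console help.

```scala
// Sketch of a bootstrap script, assuming Canton was started with --manual-start=true.

// Raise the console command timeout first, so it already applies
// when the nodes start and run their DB migrations.
console.set_command_timeout(10.minutes)

// Start the local nodes explicitly; the DB migration runs as part of node startup.
nodes.local.start()

// Reconnect the participant to its domains.
// myParticipant is a placeholder name; the call is "something like" this per the post above.
myParticipant.domains.reconnect()
```

The ordering is the point of the sketch: the timeout is set before any node is started, so the migration no longer races against the default deadline.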

Hi,

I encountered the same issue.
Is the solution to shut down the pNode and execute a recovery on that pNode to its state?

Cheers,
Jean-Paul

Hi @jaypeeda ,
I am not sure I understand why you want to execute a recovery and how it is related to the original question (since the original question mentions an issue at startup).

In which context are you seeing the timeout?
Have you tried Matthias's solution above (extending the timeouts and disabling auto-starting of the nodes)?

Best,

Rafael

Hi Rafael,

I created a new topic so it’s more relevant:

Cheers,

Jean-Paul