Passive replica error when pruning sequencer

Hello, we are testing an HA Canton configuration using the e04-high-availability example with Canton 2.8.3.
As indicated in the documentation here, sequencer services run in active-active mode.

However, when we run pruning on the sequencers, one of the sequencers returns the error below:

ERROR c.d.c.c.EnterpriseConsoleEnvironment - Request failed for sequencer_a. Is the server running? Did you configure the server address as 0.0.0.0? Are you using the right TLS settings? (details logged as DEBUG)
  GrpcServiceUnavailable: UNAVAILABLE/Command clear_schedule sent to passive replica: cannot modify the pruning schedule. Try to submit the command to another replica.
  Request: ClearScheduleCommand()
  Trailers: Metadata(content-type=application/grpc)
  Command PruningSchedulerAdministration.clear_schedule invoked from cmd17.sc:1
com.digitalasset.canton.console.CommandFailure: Command execution failed.

This seems to indicate that one of the sequencers is a passive replica. But aren’t sequencer services supposed to run in active-active mode? Does that imply that we are somehow not in active-active mode?

In the e04-high-availability example, we have two sequencers: sequencer_a and sequencer_b.
During local testing, when we try to set a pruning schedule on sequencer_a, the above error is returned.

However, for sequencer_b, there is no issue setting the pruning schedule.

The health status for both sequencers seems to show that both are active:

@ sequencer_a.health.status
res16: com.digitalasset.canton.health.admin.data.NodeStatus[sequencer_a.Status] = Sequencer id: domain_manager_a::12203a51788d2f249eb44dec51bbd8d8e30cdb9d7d145512cb8cbcc30aa0349604bf
Domain id: domain_manager_a::12203a51788d2f249eb44dec51bbd8d8e30cdb9d7d145512cb8cbcc30aa0349604bf
Uptime: 4h 55m 48.141367s
Ports:
	public: 3010
	admin: 3011
Connected Participants:
	PAR::participant_a::122008c43b24...
Sequencer: SequencerHealthStatus(active = true)
details-extra: None
Components:
	db-storage : Ok()
	sequencer : Ok()
@ sequencer_b.health.status
res18: com.digitalasset.canton.health.admin.data.NodeStatus[sequencer_b.Status] = Sequencer id: domain_manager_a::12203a51788d2f249eb44dec51bbd8d8e30cdb9d7d145512cb8cbcc30aa0349604bf
Domain id: domain_manager_a::12203a51788d2f249eb44dec51bbd8d8e30cdb9d7d145512cb8cbcc30aa0349604bf
Uptime: 5h 10m 26.349092s
Ports:
	public: 3020
	admin: 3021
Connected Participants: None
Sequencer: SequencerHealthStatus(active = true)
details-extra: None
Components:
	db-storage : Ok()
	sequencer : Ok()

This also prevents us from setting the pruning schedule on all nodes at once with the direct command

pruning.set_schedule(.....)

=> returns the same error saying sequencer_a is a passive replica
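
For reference, the full call we make on an individual sequencer looks roughly like the following (the cron expression, maximum duration, and retention are placeholder values here, not our actual schedule):

sequencer_a.pruning.set_schedule(
  "0 0 8 ? * SAT", // cron: when pruning is allowed to run
  8.hours,         // max_duration: how long each pruning window may last
  90.days          // retention: how much history to keep
)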

I wonder if anyone can help demystify the above error, and whether there is a way to set the pruning schedule via pruning.set_schedule when we have multiple sequencers for scaling and HA.

Thanks so much in advance!
Judy

Thank you, Judy.

In addition to the active-active mode, each HA sequencer also maintains an “active-passive” mode for administrative operations, but that status is not yet exposed. Finding the active sequencer for administrative commands is therefore currently a trial-and-error exercise, which is in fact why the error asks you to “Try to submit the command to another replica.” The only workaround at the moment is to invoke set_schedule on all sequencer instances until one of them does not return an error.
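
As a minimal sketch, that trial-and-error loop could look like the following in a Canton console script (assuming the two sequencers from the e04 example, placeholder schedule values, and that CommandFailure can be caught in your script):

import com.digitalasset.canton.console.CommandFailure

// Try each sequencer in turn; only the replica that is active for
// administrative operations will accept the schedule change.
val accepted = Seq(sequencer_a, sequencer_b).exists { seq =>
  try {
    seq.pruning.set_schedule("0 0 8 ? * SAT", 8.hours, 90.days)
    true // this replica accepted the admin command
  } catch {
    case _: CommandFailure =>
      false // passive replica for admin commands; try the next one
  }
}
if (!accepted) println("No sequencer accepted the pruning schedule")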

We are fixing this in the upcoming 2.8.6 release by exposing a “sequencer admin” status that identifies which HA sequencer accepts admin changes such as pruning. As part of this, pruning.set_schedule will invoke only the admin sequencer and avoid producing errors.
