Canton - Domain Manager Health Check

Hi Team,
I am trying to configure the HTTP health check for an HA domain manager setup (new 2.5.0 version). I configured it according to the release notes:

        monitoring {
            health {
                server {
                    address = 0.0.0.0
                    port = 8000
                }
                check.type = is-active
            }
        }

But after starting the domain manager I got the following error:

IsActive health check must be configured with the participant name to check as there are many participants configured for this environment

Could you please advise what might be wrong? It would be great if you could provide an example health check config for domain manager 2.5.0.
Thanks.

You probably have a config like this:

canton
{
        monitoring {
            health {
                server {
                    address = 0.0.0.0
                    port = 8000
                }
                check.type = is-active
            }
        }
  participants {
    participant1 {
    ...
    }
    participant2 {
    ...
    }
  }
}

It’s complaining because having two nodes configured means you need two health endpoints configured. So try this

canton
{
  participants {
    participant1 {
      ...
      monitoring {
        health {
          server {
            address = 0.0.0.0
            port = 8000
          }
          check.type = is-active
        }
      }
    }
    participant2 {
      ...
      monitoring {
        health {
          server {
            address = 0.0.0.0
            port = 8001
          }
          check.type = is-active
        }
      }
    }
  }
}

@bernhard thanks for the reply, but you provided configs for participants. I was asking about the Domain Manager in version 2.5.0+, where a health check mechanism was implemented for the enterprise version (according to the release notes).
Thanks.

Oh, sorry, but it’s the same deal. Rather than having the monitoring block directly under canton, move it under the domains.mydomain block. If you could share more of your domain manager config, I could be more specific.

@bernhard
we have, for example, three nodes for the domain manager (1 active, 2 passive), and each of them has the following config:

canton {
    domain-managers {
        domainmanager {
            storage = ${_shared.storage}
            storage.config.properties.databaseName = "db"
            init.domain-parameters.unique-contract-keys = false
            replication.enabled = true
            admin-api {
                port = 4001
                address = 0.0.0.0
            }
        }
    }
    monitoring {
        health {
            server {
                address = 0.0.0.0
                port = 8001
            }
            check.type = is-active
        }
    }
}

And you are saying we need to move the monitoring section into the domainmanager section, if we are talking about this example?

Yes. It should be

canton {
    domain-managers {
        domainmanager {
            storage = ${_shared.storage}
            storage.config.properties.databaseName = "db"
            init.domain-parameters.unique-contract-keys = false
            replication.enabled = true
            admin-api {
                port = 4001
                address = 0.0.0.0
            }
            monitoring {
                health {
                    server {
                        address = 0.0.0.0
                        port = 8001
                    }
                    check.type = is-active
                }
            }
        }
    }
}

@bernhard
it does not work.
I am getting this error:

Unknown key monitoring

2023-01-27 14:42:58,580 [main] INFO  c.d.canton.CantonEnterpriseApp$ - Starting Canton version 2.5.1
2023-01-27 14:42:59,976 [main] INFO  c.d.canton.CantonEnterpriseApp$ - Config field at storage.max-connections is deprecated. Please use storage.parameters.max-connections instead.
2023-01-27 14:43:00,148 [main] ERROR c.d.canton.CantonEnterpriseApp$ - GENERIC_CONFIG_ERROR(8,0): Cannot convert configuration to a config of class com.digitalasset.canton.config.CantonEnterpriseConfig. Failures are:
  at 'canton.domain-managers.domainmanager.monitoring':
    - (/canton/data/domainmanagers/domainmanager.conf: 13) Unknown key.
 err-context:{location=CantonConfig.scala:1467}
2023-01-27 14:43:00,168 [main] ERROR c.d.canton.CantonEnterpriseApp$ - An error occurred after parsing a config file that was obtained by merging multiple config files. The resulting merged-together config file, for which the error occurred, was written to '/tmp/canton-config-error-8203431665000491271.conf'.

Hi @Maksym_Zhovanyk, I was actually way off about how this works. Monitoring in general is done per process, not per node, which is why it sits directly below the canton namespace. But the is-active setting is per node, since it reflects whether the node is active or not in an HA setup.

A setup where multiple nodes of a single HA setup run in the same process is currently not well supported. You can still use the is-active check, but you have to specify which of the nodes in the process to report on, e.g.:

canton {
  monitoring {
        health {
            server {
                address = 0.0.0.0
                port = 8001
            }
            check {
                type = is-active
                node = mydomain
            }
        }
  }

  participants {
    participant1 {
      storage.type = memory
      admin-api.port = 5012
      ledger-api.port = 5011
      monitoring {
        health {
            server {
                address = 0.0.0.0
                port = 8002
            }
            check {
                type = is-active
            }
        }
      }

    }
    participant2 {
      storage.type = memory
      admin-api.port = 5022
      ledger-api.port = 5021
      monitoring {
        health {
            server {
                address = 0.0.0.0
                port = 8003
            }
            check {
                type = is-active
            }
        }
      }
    }
  }
  mediators {
    mymediator {

      admin-api {
        address = localhost
        port = 3031
      }
    }
  }
  sequencers {
    mysequencer {

      public-api {
        address = localhost
        port = 3010
      }

      admin-api {
        address = localhost
        port = 3011
      }
    }
  }

  domain-managers {
    mydomain {

      admin-api {
        address = localhost
        port = 3001
      }

    }
  }
}

Note, though, that the participants, mediator, domain manager and sequencer are each active or passive independently of one another. They can fail over as individual nodes, so a single activeness endpoint won't be that useful.

We are working to improve the health checks in general, but in the meantime, you should run one node per process for HA setups.
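
For reference, here is a rough sketch of what a single domain manager process could then look like, reusing the config you posted above (the monitoring block sits directly under canton, since it is per process; the ports and the domainmanager name are just taken from your earlier snippet, so adjust them to your setup):

canton {
    domain-managers {
        domainmanager {
            storage = ${_shared.storage}
            storage.config.properties.databaseName = "db"
            init.domain-parameters.unique-contract-keys = false
            replication.enabled = true
            admin-api {
                port = 4001
                address = 0.0.0.0
            }
        }
    }
    # Monitoring is per process, so it stays at the canton level.
    monitoring {
        health {
            server {
                address = 0.0.0.0
                port = 8001
            }
            # Assumption: with only one node in the process, is-active reports on that node;
            # if Canton still asks for a node, name it explicitly via check.node = domainmanager
            check.type = is-active
        }
    }
}

Each of the three replicas would then run such a process with its own health endpoint, reporting whether that replica is currently the active one.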

As an addendum, note that every node exposes activeness through its status API: canton/status_service.proto at d29737615cd7f8cf08ac5411f07a52f12ad3c803 · digital-asset/canton · GitHub