Canton Domain - Health Checks

Brian_Weir · January 20, 2022, 2:53pm

Hi Team,

Checking the Canton documentation here, I’ve been able to enable health checks for multiple Canton Participant nodes.

However, the documentation does not mention health checks for Canton Domains. Checking the Scaladocs for the health monitoring API (see here), it seems the only configuration available is for checking participant nodes. I’ve worked around this by using netcat to check that the public port is available on the Canton Domain node.

Still, it would be useful to expose a configurable HTTP endpoint as part of a Canton Domain setup.

Brian

Phoebe_Nichols · January 20, 2022, 5:40pm

Hi Brian,

The health check uses a “ping” Daml transaction (if configured to use a ping-based check, like in the docs). All Daml transactions run by Canton have to go through a Canton domain, so if you’re only using one domain then the ping is going to check the health of that domain.

“I’ve worked around this by utilising netcat to check that the public port is available on the Canton Domain node.”

I’d be interested to know why you need to check the public port is available. Is it for debugging, to check that a running system is still healthy, or perhaps for coordinating a deployment? There might be something else to help.

Brian_Weir · January 20, 2022, 7:23pm

Hi Phoebe,

Thanks for your reply.

In my use case we are running Canton in a Kubernetes environment, with the domain and multiple participant nodes each running in their own container (and each container in its own dedicated pod). Kubernetes needs to know the health of a container in order to maintain the target running state for a configured service. As the domain runs separately from the participants, I need a way to let Kubernetes know whether each running container is healthy. Kubernetes uses a livenessProbe configuration to check whether a container is healthy, and I’ve configured this to run netcat -z <domain_host> <domain_public_port>. I took the view that the public port is more critical to monitor than the admin port for a running Canton network, as participants connect to the public port.

This domain configuration has been running as expected so far. It’s just that the Canton solution provided for participants is very convenient and exposes a very useful HTTP endpoint:

monitoring.health {
  server {
    address = 0.0.0.0
    port = 10019
  }

  check {
    type = ping
    participant = hubparticipant1
    interval = 30s
  }
}
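
For readers without netcat in their container image, the port check described in this post can be approximated in Python. This is only a sketch of a TCP reachability test in the spirit of netcat -z; the function name port_is_open and the 3-second default timeout are illustrative assumptions, not anything Canton provides.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Hypothetical helper: it only proves that something accepts connections
    on the port, not that the domain behind it is processing requests.
    """
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (connection refused, timeout, DNS failure) if it cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A probe script could call port_is_open with the domain host and public port and exit non-zero when it returns False, which is how an exec-style liveness probe signals an unhealthy container.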

Phoebe_Nichols · January 20, 2022, 8:18pm

Hi Brian,

The health endpoint is associated with a Canton process rather than with any specific Canton node. The docs you linked show configuring a ping-based health check, which requires a participant node, but you can also configure an “always-healthy health check” that will return a 200 as long as the health check service is up (docs).

You should be able to add a health endpoint to your domain deployment, as long as you use the always-healthy check. This doesn’t give you much more information than checking whether the public API port is up, but it might make your deployment a bit simpler.

The config for the always-healthy check should look like this:

canton {
  monitoring.health {
    server {
      port = 7000
    }

    check {
      type = always-healthy
    }
  }
}

Is this the kind of thing that you’re looking for?

I don’t think there’s a more comprehensive way to check that domain components are behaving correctly in isolation at the moment – the standard check is performing a ping from a participant.
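
As a rough illustration of how a probe might consume the always-healthy endpoint, here is a minimal Python sketch that treats an HTTP 200 as healthy and anything else as unhealthy. The function name is_healthy is hypothetical, and the URL path is an assumption: check the Canton monitoring docs for the path your version actually serves.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only if a GET on `url` answers with HTTP 200.

    Hypothetical helper; `url` is supplied by the caller, for example
    something like http://<domain_host>:7000/health (the path is an
    assumption, not confirmed in this thread).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses (HTTPError), refused connections and timeouts.
        return False
```

With the always-healthy check this tells you only that the Canton process and its health server are up, which matches the caveat above about not learning much more than a port check would.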

Brian_Weir · January 21, 2022, 8:30am

Thanks a lot @Phoebe_Nichols - I think the always-healthy check will solve my problem.

I do think it would be rather useful to be able to check the health of individual components in isolation, as I can envisage production deployments of domains and participants separating the services into their own containers to improve HA and allow for horizontal scaling (e.g. the domain’s sequencer).
