Canton Participant1 Health Ping Failure on Raspberry Pi v3

Hi Canton Team,

I am working through the Canton 0.26.0 Getting Started documentation on a Raspberry Pi. So far the ledger runs stably and I can run the Sandbox. Basic tests are OK.

However, an issue arose when executing participant1.health.ping(participant2).

Prior to this the Health Status command worked as per spec:

@ health.status 
res8: CantonStatus = Status for Domain 'mydomain':
Domain id: mydomain::12205eb22b829e791330727259e6e5770c58b39d5e71089922dcf18ebfa4215ae9be
Uptime: 1h 37m 58.999439s
Ports: 
	public: 5018
	admin: 5019
Connected Participants: None

Status for Participant 'participant1':
Participant id: PAR::participant1::12205b3ea0b61e84ce053da36b3b7165b7664f158aae2e4b4622fbe552041a15852c
Uptime: 1h 37m 18.993575s
Ports: 
	ledger: 5011
	admin: 5012
Connected Domains: None
Active: true

Status for Participant 'participant2':
Participant id: PAR::participant2::122013bceeeb0ab3e3ea30fab5779b00e24804f3cf1a474b95b45307295721eccc3c
Uptime: 1h 36m 52.633865s
Ports: 
	ledger: 5021
	admin: 5022
Connected Domains: None
Active: true

When the Health Status command was executed, CPU usage was almost 300% on 4 x 1.4 GHz cores.

Ping failure output below:

@ participant1.health.ping(participant2) 
WARN  c.d.c.p.a.PingService:participant=participant1 tid:d4a306084b56a450cf193fe9bc348534 - Ping/pong with id=2d16ba26-4753-4453-a6d2-671bd4657357-ping failed with reason Failed(
  Completion(
    Status(PERMISSION_DENIED, NotConnectedToAnyDomain(CN10051-2): This participant is not connected to any domain.; participant=participant1),
    commandId = '2d16ba26-4753-4453-a6d2-671bd4657357-ping-85495b8c-96d7-4ad4-8723-0b8bd041ad47'
  )
).
ERROR c.d.c.e.CommunityConsoleEnvironment - Unable to ping PAR::participant2::122013bceeeb0ab3e3ea30fab5779b00e24804f3cf1a474b95b45307295721eccc3c within 10000ms
Command execution failed.

When the Ping command was executed, CPU utilisation was about 200% on 4 x 1.4 GHz cores, while memory utilisation jumped to 680M/973M and stayed there. Ten minutes after executing the command, memory had only dropped back to 602M/973M, and the command still failed.

Questions

  • Is this issue likely to be caused by the limited memory of the RPi v3 (1 GB), which makes the Ping command time out (T > 10000ms)?
  • If so, is there an option to extend or ignore the 10000ms limit?
  • Or can I assign a larger slice of memory for Java to access immediately?

[Screenshot: htop with the Java/Canton process tagged]


Judging from the error it looks like you are missing the following two lines from the docs you linked to, to connect the participants to the domain:

participant1.domains.connect_local(mydomain)
participant2.domains.connect_local(mydomain)

You can use the health command to see the current status

health.status

Whoops, fixed, I overlooked that:

[Screenshot: Whoops_My_Bad_2021-08-04_18-50-08]

The first connect command worked, apart from a warning about a time delay. The second command, however, caused a cascade of red output. Highlights below:

participant2.domains.connect_local(mydomain) FAIL
  scala.concurrent.Await$.$anonfun$result$1(package.scala:223)
  scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
  scala.concurrent.Await$.result(package.scala:146)
  com.digitalasset.canton.console.commands.WaitUntilPackagesReady.waitUntilPackagesReady(WaitUntilPackagesReady.scala:64)
  com.digitalasset.canton.console.commands.WaitUntilPackagesReady.waitUntilPackagesReady$(WaitUntilPackagesReady.scala:21)
  com.digitalasset.canton.console.ParticipantReference.waitUntilPackagesReady(InstanceReference.scala:347)
  com.digitalasset.canton.console.commands.ParticipantAdministration$packages$.wait_until_ready(ParticipantAdministration.scala:701)
  com.digitalasset.canton.console.ParticipantReferencesExtensions$packages$.$anonfun$wait_until_ready$1(ParticipantReferencesExtensions.scala:51)
  com.digitalasset.canton.console.ParticipantReferencesExtensions$packages$.$anonfun$wait_until_ready$1$adapted(ParticipantReferencesExtensions.scala:51)
  scala.collection.Iterator.foreach(Iterator.scala:943)
  scala.collection.Iterator.foreach$(Iterator.scala:943)
  scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  scala.collection.IterableLike.foreach(IterableLike.scala:74)
  scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  com.digitalasset.canton.console.ParticipantReferencesExtensions$packages$.wait_until_ready(ParticipantReferencesExtensions.scala:51)
  com.digitalasset.canton.console.commands.ParticipantAdministration$domains$.reconnect(ParticipantAdministration.scala:1081)
  com.digitalasset.canton.console.commands.ParticipantAdministration$domains$.connectFromConfig(ParticipantAdministration.scala:1004)
  com.digitalasset.canton.console.commands.ParticipantAdministration$domains$.connect_local(ParticipantAdministration.scala:984)
  ammonite.$sess.cmd13$.<init>(cmd13.sc:1)
  ammonite.$sess.cmd13$.<clinit>(cmd13.sc)


WARN  c.d.c.c.ExecutionContextMonitor - Execution context canton-env-execution-context is stuck or slow. My scheduled future has not been processed for at least 3 seconds (queue-size=22).
ForkJoinIdlenessExecutorService-canton-env-execution-context: java.util.concurrent.ForkJoinPool@e341b7[Running, parallelism = 4, size = 9, active = 6, running = 6, steals = 1339694, tasks = 8, submissions = 22]
WARN  c.d.c.c.ExecutionContextMonitor - Execution context canton-env-execution-context is just slow. Future got executed in the meantime.
WARN  c.d.c.c.ExecutionContextMonitor - Execution context canton-env-execution-context is just slow. Future got executed in the meantime.
@ WARN  c.d.c.c.ExecutionContextMonitor - Execution context canton-env-execution-context is just slow. Future got executed in the meantime.
WARN  c.d.c.c.ExecutionContextMonitor - Execution context canton-env-execution-context is just slow. Future got executed in the meantime.
WARN  c.z.h.p.HikariPool - daml.index.db.connection.indexer - Thread starvation or clock leap detected (housekeeper delta=51s244ms346µs804ns).

Fatal Error:

java.lang.OutOfMemoryError
# Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (debug.cpp:338), pid=1583, tid=1856
#  fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment (11.0.12+7) (build 11.0.12+7-post-Raspbian-2deb10u1)
# Java VM: OpenJDK Server VM (11.0.12+7-post-Raspbian-2deb10u1, mixed mode, serial gc, linux-)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/dietpi/opt/canton-community-0.26.0/hs_err_pid1583.log
#
# If you would like to submit a bug report, please visit:
#   Unknown
#
Aborted

So, there you have it, it’s a Java memory issue …
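
The heap-space abort above is consistent with how small the JVM's default heap is on this board. As a rough sketch, assuming no -Xmx was passed and HotSpot's default MaxRAMPercentage of 25 applies, the maximum heap is about a quarter of physical memory. The function name default_heap_mb is made up for illustration, and the JAVA_OPTS line assumes the bin/canton launcher passes that variable through to the JVM, which is worth checking against the launcher script.

```shell
#!/bin/sh
# Sketch: estimate the JVM's default max heap on a small machine.
# Assumption: no -Xmx is set and the default MaxRAMPercentage of 25 applies.

# Print 25% of the given physical memory (in MB), rounded down.
default_heap_mb() {
    echo $(( $1 * 25 / 100 ))
}

# The thread reports 973M of usable RAM on the RPi v3.
default_heap_mb 973

# To see what the JVM actually picked on the machine:
#   java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
# To ask for a larger heap (assumes the launcher honours JAVA_OPTS):
#   JAVA_OPTS="-Xmx512m" ./bin/canton -c <your config files>
```

For 973M this prints 243, so by default roughly 243M of heap would have to hold the domain and both participants. Raising -Xmx only helps while the rest of the system still fits in the remaining RAM.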


Hmm … it shouldn’t be super resource hungry, but yeah, if you run JVM & Postgres on a Raspberry Pi I guess you might run into limits.

One potential way would be to not use Postgres but use file-based H2 (examples/03-advanced-configuration/storage/h2.conf) for storage and let the JVM use the freed up resources. File-based H2 is only meant for demos / prototypes, as we focus our testing capacity on SQL based storage.

Here is how you start Canton using file-based H2:

./bin/canton -c examples/03-advanced-configuration/storage/h2.conf -c examples/03-advanced-configuration/nodes/participant1.conf -c examples/03-advanced-configuration/nodes/domain.conf

OK, good suggestion, thank you.

So I could leave PostgreSQL enabled & installed, and force Canton to use the H2 option only?

I did notice that the resources used by PostgreSQL without any DB I/O were negligible, but I will do as you suggest. Also, at the bottom of the Java heap error output, it suggests enabling core dumps using ulimit -c unlimited, which I will do after a reboot.

Might as well flush the DRAM, Cache etc and start fresh.


Rebooted. Then modified all the System limits that are allowable under the current OS:

Data, memory, nofiles, stack and corefiles should be set to unlimited.
    ulimit -d unlimited
    ulimit -m unlimited
    ulimit -n unlimited
    ulimit -s unlimited
    ulimit -c unlimited

Reference: Recommended ulimit settings for linux

Two modifications were not allowed:

dietpi@rvnmn02:~$ ulimit -n unlimited
-bash: ulimit: open files: cannot modify limit: Operation not permitted
dietpi@rvnmn02:~$ ulimit -c unlimited
-bash: ulimit: core file size: cannot modify limit: Operation not permitted
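
These two refusals look like the usual split between soft and hard limits: an unprivileged shell may move a soft limit only up to the hard limit, and "unlimited" is above it. A minimal sketch, assuming a POSIX-style shell with the ulimit builtin, that reads both values and raises the soft open-files limit as far as is permitted:

```shell
#!/bin/sh
# Sketch: raise the soft open-files limit as far as an unprivileged user may.
hard=$(ulimit -H -n)   # ceiling for this session
soft=$(ulimit -S -n)   # limit currently in effect
echo "open files: soft=$soft hard=$hard"

# The soft limit can be moved up to the hard limit, but not beyond it.
ulimit -S -n "$hard"
echo "open files now: soft=$(ulimit -S -n)"
```

Raising the hard limit itself normally needs root, for example through /etc/security/limits.conf on systems that use pam_limits. Whether DietPi is set up that way is an assumption to verify.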

Edit /etc/sysctl.conf

kernel.threads-max = 999999
kernel.pid_max = 999999

Reference: https://newbedev.com/how-to-increase-maximum-number-of-jvm-threads-linux-64bit
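
Since sysctl names map onto files under /proc/sys, one way to check whether such an edit took effect after a reboot is to read the values back. A small sketch follows. The helper name show_kernel_limit is hypothetical, and the kernel may clamp or reject values above its own maximum, so the number read back can differ from the one requested.

```shell
#!/bin/sh
# Sketch: read kernel limits back from /proc to confirm a sysctl change.

# Print "kernel.<name> = <value>" for a key under /proc/sys/kernel.
show_kernel_limit() {
    file="/proc/sys/kernel/$1"
    if [ -r "$file" ]; then
        echo "kernel.$1 = $(cat "$file")"
    else
        echo "kernel.$1 is not readable on this system"
    fi
}

show_kernel_limit threads-max
show_kernel_limit pid_max
```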

After a hard reboot I reran the sequence. This time the ping failed for participant1.health.ping(participant2) but not for participant2.health.ping(participant1), although the response time of the successful ping was 5074 milliseconds.

I’ll reboot, and this time run the H2 command line you gave above.

EDIT: None of those helpful suggestions remediate the fact that the RPiv3 has insufficient RAM, CPU and I/O speeds to be useful when running a basic Java application.

My new 8GB RPiv4 arrived yesterday, and this is now my desktop. The 8-core desktop machine is now repurposed solely as a Daml ledger (Canton/PostgreSQL) machine running Ubuntu Server 20.04.

RPis are good but have hard, functional limits.


Would love to hear once you have it up and running, if you’ve been able to get Canton running stable on it


Hi @Shaul, Canton Community version 0.27.0 is running on the former 8-core desktop (actually it was never really a desktop; it was a BTC mining rig).

I will make a simple Daml project today and then write a script to stress test the server for personal interest.

Re the 8GB RPiv4, I failed to realise that the Daml SDK will not run on ARMv7 … before I purchased it.