Hello,
I am working on a DAML project and recently switched to an M1-based Mac.
While running a docker compose configuration with the daml sandbox and multiple other services, I noticed that the daml sandbox command in the entry point of my container throws a segmentation fault.
We do not currently provide M1-ready Docker images. Your best bet at this point is to build them yourself; as a starting point, the Dockerfile for these images is here.
I am facing the same segmentation fault. I tried to build a Docker image myself: I updated the Dockerfile, basically changing the first line:
FROM eclipse-temurin:11 -> FROM arm64v8/eclipse-temurin:11
And I have the same issue with both versions of the eclipse-temurin image.
I also tried installing the SDK interactively.
Attempt 1: arm64v8/eclipse-temurin:11
ashokraj@ashokrajwhx5j3 ~ % docker run -it --entrypoint sh arm64v8/eclipse-temurin:11
#
# curl -sSL https://get.daml.com/ | sh
Determining latest SDK version...
Latest SDK version is 2.2.0
Downloading SDK 2.2.0. This may take a while.
######################################################################## 100.0%
Extracting SDK release tarball.
/tmp/tmp.SkEM23Mdu1/sdk/install.sh: line 5: 34 Segmentation fault "$DIR/daml/daml" install "$DIR" $@
FAILED TO INSTALL!
# exit
Attempt 2 with eclipse-temurin:11
ashokraj@ashokrajwhx5j3 ~ % docker run -it --entrypoint sh eclipse-temurin:11
Unable to find image 'eclipse-temurin:11' locally
11: Pulling from library/eclipse-temurin
Digest: sha256:e41c0ff25711398daa22288822866e51818707431edd6e515752951b7dd2dcad
Status: Downloaded newer image for eclipse-temurin:11
# curl -sSL https://get.daml.com/ | sh
Determining latest SDK version...
Latest SDK version is 2.2.0
Downloading SDK 2.2.0. This may take a while.
######################################################################## 100.0%
Extracting SDK release tarball.
/tmp/tmp.h21YjZqR4F/sdk/install.sh: line 5: 34 Segmentation fault "$DIR/daml/daml" install "$DIR" $@
FAILED TO INSTALL!
Am I doing something wrong here? What is the solution?
Note: The chip is an Apple M1 Pro and Docker is the version for Macs with Apple silicon.
A couple of possibilities here. The Daml SDK with VS Code integration is most likely failing because the SDK contains amd64 binaries for IDE integration. I am almost certain that Docker does not handle virtualization within a Docker image.
Following @sormeter's advice, I added this env variable to make sure the platform is always linux/amd64: export DOCKER_DEFAULT_PLATFORM=linux/amd64
My docker compose has two nodes: one for Canton and another for the SDK.
The SDK one is still throwing the segmentation fault.
Here is my docker compose file:
version: '3'
services:
  connect.node:
    image: digitalasset/canton-open-source:latest
    ports:
      - "4011:10011"
      - "4012:10012"
    tty: true # map output to terminal
    environment:
      - CANTON_ALLOCATE_PARTIES=alice;bob
      - CANTON_CONNECT_DOMAINS=mydomain#http://localhost:10018
    command: ["daemon",
              "-c" , "${CANTON_CONFIG:-config/participant.conf,config/domain.conf}",
              "--bootstrap=config/bootstrap.canton"]
    volumes:
      - ./config:/canton/config
  connect.navigator:
    # This image works ok
    image: digitalasset/daml-sdk:2.1.1
    # Navigator will not load the parties with this image
    # image: digitalasset/daml-sdk:2.1.0-snapshot.20220325.9626.0.4a483381
    # This command works with the snapshot but not the 2.0.0 release
    # command: sh -c "daml ledger navigator --host connect.node --port 10011 --port 4000"
    # We have to use this command with the 2.0.0 release but even then the navigator will not show the parties
    command: sh -c "daml navigator server connect.node 10011"
    ports:
      - "4000:4000"
    depends_on:
      - connect.node
    links:
      - connect.node
And here is the error I am getting:
⠿ Container simplestdeployment-connect.node-1 Recreated 0.1s
⠿ Container simplestdeployment-connect.navigator-1 Recreated 0.1s
Attaching to simplestdeployment-connect.navigator-1, simplestdeployment-connect.node-1
simplestdeployment-connect.navigator-1 | Segmentation fault
simplestdeployment-connect.navigator-1 exited with code 139
I’m a bit confused by this last message. It seems to say things work with 2.1.1, but not prior versions. Is that the case? If so, what is preventing you from using 2.1.1 (or better yet, 2.2.0)?
I’ve been struggling with the same issue and have tried all of the similar approaches, but for some reason it keeps segfaulting. DAML runs fine natively on osx, so what could be the issue here? Whether I use a full ARM stack or a full x86 stack with Rosetta, it should work either way. What can we do to debug this further, see where the issue is, and try to get around it?
This thread is starting to be a bit confusing. Would you mind opening a new thread that explains precisely what you have tried that you expected to work and how it failed instead? Please include all relevant version numbers if possible.
Docker on macOS uses a (hidden) Linux VM to run the containers and therefore expects a linux/ image. The daml-sdk images are only linux/amd64. So my assumption is that there is some issue with Docker and the translation from linux/amd64 to linux/arm64 that causes the seg fault. I think Docker uses QEMU, and depending on which emulation is installed this may not be 100% compatible.
I have tried on Podman as well so this seems like a fundamental issue with the QEMU emulator:
So the issue is likely that, whilst you are creating a linux/arm64 base image, the binaries pulled down via get.daml.com are really linux/amd64. You might confirm this by running your image in a bash shell and then running file on the binaries.
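If file is not available inside the image, the same check can be approximated with od, which is part of POSIX. The following is only a minimal sketch: the function name elf_arch is made up for illustration, it assumes a little-endian ELF binary, and it recognises only the two architectures discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Daml SDK): report the CPU
# architecture an ELF binary was built for, without needing `file`.
elf_arch() {
    # Bytes 0-3 of an ELF file are the magic number 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Bytes 18-19 hold the e_machine field. Assumption: the binary is
    # little-endian, which is true for both x86-64 and arm64 Linux.
    set -- $(od -An -tx1 -j18 -N2 "$1")
    case "$2$1" in
        003e) echo "x86-64" ;;   # EM_X86_64
        00b7) echo "aarch64" ;;  # EM_AARCH64
        *)    echo "unknown-0x$2$1" ;;
    esac
}
```

Run inside the container against the SDK binaries, an x86-64 result on an arm64 base image would support the explanation above.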
That could be a good idea. I guess most of us were under the impression that it would detect the architecture and pull down the right versions since it worked natively on osx. I’ll give it a shot.
So I looked at the install script at get.daml.com and it appears that it only checks the version and OS to install for (windows, linux, mac) and then pulls that from github releases (Releases · digital-asset/daml · GitHub).
Based on how the linux version is run, I checked both these files:
$ file daml/lib/ld-linux-x86-64.so.2
daml/lib/ld-linux-x86-64.so.2: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), static-pie linked, BuildID[sha1]=db50353a26600bb848b9a5541b1506e0a24cb34b, not stripped
$ file daml/lib/daml
daml/lib/daml: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter ld-undefined.so, for GNU/Linux 2.6.32, stripped
So it seems like there is no explicit ARM support even in OSX? I was expecting to see a Universal binary for osx but I guess even on native osx it must be using Rosetta…?
We currently only publish Intel versions of the SDK (amd64/x86_64).
On M1 macs, the SDK runs through Rosetta. You can easily see that in daily use by the delay there is when a new version of the SDK is run for the first time (that’s the time it takes Rosetta to “translate” the binary). Or, of course, using file on the binaries under ~/.daml.
We do not currently publish any arm64 binaries for any platform. So far our experience has been that the Intel SDK works well under Rosetta, and the use-case for an SDK Docker image remains elusive. Demand for arm64 support on Linux is also very low outside of the Docker use-case.
Although I would say the use case for docker in general is very strong.
We have existing docker-compose configurations that we’re using for all of our workflows for our other products which really unifies and streamlines our development. We use the same workflow to deploy on our shared instances, ci pipelines, as we do locally via compose. It automates and encapsulates our workflow instead of relying on local laptop state.
I think the next question to figure out is how we debug why it’s segfaulting inside of docker but not natively on osx. A lot of software has gone through a similar issue with osx’s move to arm, so it shouldn’t be too hard to figure out. Rosetta is kicking in and working for the rest of the container and all binaries inside of it, but something is failing when it comes to Daml.
Any idea how we could capture a stack trace or something along those lines to figure out exactly which instructions it’s failing on?
Thanks for the attention on this so far! I was worried we’d never get any traction on this, so I’m happy to see some dialogue here.
My understanding is that Docker runs in a stealthy Linux VM, and therefore Rosetta has nothing to do with it. I assume it is segfaulting because it is a Linux amd64 binary running on a Linux arm64 (virtual) machine.
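One way to test that understanding is to compare the architecture the container reports with the architecture of the binary. The sketch below is illustrative only: the function names docker_platform_for and needs_emulation are made up, and the mapping covers just the two architectures mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Docker or the Daml SDK.

# Map a machine name (as printed by `uname -m`) to the platform
# string Docker uses for Linux images.
docker_platform_for() {
    case "$1" in
        x86_64|amd64)  echo "linux/amd64" ;;
        aarch64|arm64) echo "linux/arm64" ;;
        *)             echo "unknown" ;;
    esac
}

# Succeed (exit 0) when the two machine names map to different
# platforms, i.e. the binary cannot run natively in the container.
needs_emulation() {
    [ "$(docker_platform_for "$1")" != "$(docker_platform_for "$2")" ]
}
```

Running uname -m inside the container (for example via docker run --rm arm64v8/eclipse-temurin:11 uname -m) should give the first argument; the second is the architecture of the SDK binary. If needs_emulation succeeds for that pair, the binary depends on emulation inside the Linux VM rather than on Rosetta on the host.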
It’s a Haskell binary and the Haskell runtime is not big on stack traces as far as I’m aware. I’m not a proficient Haskell developer myself so there may be techniques I’m not aware of. But I don’t think this is the right direction of investigation; I think what we’d want here is to build the SDK for Linux arm64. We don’t currently have that setup (we do not have Linux arm64 machines as part of our CI fleet and Haskell (GHC) does not, as far as I know, do cross compilation), so I can’t comment on how hard or easy it would be. Maybe it’s just a matter of running the normal build on an arm64 Linux machine; maybe that won’t work at all and there are lots of changes to make.
Can you elaborate on that? Specifically, why do you need the SDK as part of that workflow?