Questions Regarding Canton Tracing

Hello,
Our general goal is to track the latency of every request across all spans throughout the entire system, including Canton, and I wonder what the best approach is.
While reading through the Monitoring page of the Daml SDK 2.3.4 documentation, I was still unsure about the questions below:

  1. We can see that tracing can be done with Jaeger and Zipkin. Is it possible to integrate tracing with other tools, e.g. Datadog or AppDynamics?
  2. How can we pass a traceId into Canton so that the same trace is shared between our other applications and Canton? Or is that not possible?
  3. Is there any documentation or example project that shows which spans and events are available from Canton and which are not? Furthermore, how can we use the logs to search for the traces that we care about?

Thanks very much in advance!

Hi Judy_Wu

  1. Jaeger and Zipkin are the most common tracing backends used in the industry. We do not explicitly support other tracing tools, but most of the commercial visualisation tools also provide tooling to translate the Jaeger or Zipkin format into their own proprietary format. For example, Splunk supports the integration using a special tool described on the Jaeger gRPC page of the Splunk Observability Cloud documentation. In the future we are planning to support other backends too.
  2. It all depends on the concrete workflow you would like to trace. If the workflow originates from the Canton system itself, the trace is generated automatically by Canton. If Canton is used through its APIs, the consumer can provide tracing information through the trace context. We follow the W3C Recommendation; more information can be found here: Trace Context.
  3. Unfortunately, tracing is not well documented yet, but we are working on it. The relevant work can be tracked in this GH issue: Improve public accessibility to our trace tooling · Issue #14256 · digital-asset/daml · GitHub. Logging of the trace-id is enabled by default. Traces are visible in the Zipkin/Jaeger UI with readable span names and are generally well connected. Furthermore, a trace-id can be copied from the UI and correlated with log lines in the log file. Transaction submission is a workflow that is especially detailed, with spans and attributes across the trace.
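
For reference, the W3C Trace Context recommendation carries the trace id in a `traceparent` header of the form `version-traceid-parentid-flags`, for example `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`. Below is a minimal Java sketch of how such a value can be checked and split. It is not taken from Canton; the class and method names (`TraceParent`, `isValid`, `traceId`, `isSampled`) are made up for illustration, and it only handles version `00`.

```java
/** Illustrative helper for the W3C "traceparent" header value; not Canton code. */
final class TraceParent {
    // version "00", 32 hex chars trace-id, 16 hex chars parent-id, 2 hex chars flags
    private static final String FORMAT = "00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}";
    private static final String ZERO_TRACE_ID = "00000000000000000000000000000000";
    private static final String ZERO_PARENT_ID = "0000000000000000";

    /** True if the value is well formed and neither id is all zeros. */
    static boolean isValid(String header) {
        if (header == null || !header.matches(FORMAT)) {
            return false;
        }
        String[] parts = header.split("-");
        return !parts[1].equals(ZERO_TRACE_ID) && !parts[2].equals(ZERO_PARENT_ID);
    }

    /** Returns the 32-character trace id, the value to search for in logs and the tracing UI. */
    static String traceId(String header) {
        if (!isValid(header)) {
            throw new IllegalArgumentException("not a valid traceparent: " + header);
        }
        return header.split("-")[1];
    }

    /** True if the "sampled" bit (0x01) is set in the trace flags. */
    static boolean isSampled(String header) {
        if (!isValid(header)) {
            throw new IllegalArgumentException("not a valid traceparent: " + header);
        }
        return (Integer.parseInt(header.split("-")[3], 16) & 0x01) != 0;
    }
}
```

Calling `TraceParent.traceId(...)` on the example value above yields the `4bf92f35...` part, which is the id you would look up in Jaeger or Zipkin.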

Let me know if you have more questions - I will be happy to help.

Thanks so much @Sergey_Kisel_DA1 for the response.
After reading through the response and doing some preliminary digging, I still have the questions below. I hope someone can help me answer them :pray:

  1. It’s mentioned above that Canton follows the W3C recommendation for passing in a traceId/trace context. Is there an example of the headers for this, e.g. passing the trace id within a header?
    Also, Document how to access the distributed trace ID in the Java bindings · Issue #14258 · digital-asset/daml · GitHub seems to imply that there’s a way to access the traceId. Does that mean the Java bindings library supports passing/accessing a traceId? If so, can we have an example of how to access and pass in an external traceId with the Java bindings library? (I tried to do some digging, but didn’t find any appropriate API for traceId.)
  2. Regarding how to map a transaction to a traceId: is the traceId to commandId mapping one to one? If it is, which Canton log line is recommended for finding this mapping, and is it at INFO level?

Hello @Judy_Wu,

  1. Usually you don’t want to provide your own trace id, as trace ids are generally created by the distributed application at the start of the specific workflow/trace. From there you can simply look at the tracing UI (Jaeger or Zipkin) to see the traces that were created. It shows the trace ids, which can generally be correlated with the trace ids shown in the logs.
  2. There are different kinds of traces, and some are not related to command submission at all. But if you’re talking about command submission traces then yes, each trace (and trace id) corresponds to one command id. If you look in the tracing UI (Jaeger or Zipkin) at the span whose name starts with CantonSyncService, it will have the command id as part of its tags/attributes.

Thanks @danilofaria for jumping in and pointing out that the commandId to traceId mapping can easily be found in the Jaeger UI!
The reason we need to pass a traceId in to Canton is that Canton is viewed as one component in the whole flow. When issues happen, we need to look at the traceId for the entire flow, which includes other components (e.g. the ledger client), and see where the delay comes from.
After discussing with @Raymond_Roestenburg (thank you so much @Raymond_Roestenburg, you are the savior), I think using an interceptor along with the rx Java bindings library can be the way to go. (interceptor: opentelemetry-java-docs/HelloWorldClient.java at 5f3002e75238674ea97402c44fecf9afc03a6c9b · open-telemetry/opentelemetry-java-docs · GitHub)

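// TraceInterceptor is our own io.grpc.ClientInterceptor; it is not part of the Java bindings.
NettyChannelBuilder ncb = NettyChannelBuilder.forAddress(host, port).intercept(new TraceInterceptor());
// Build the ledger client on top of the intercepted channel.
DamlLedgerClient ledger = DamlLedgerClient.newBuilder(ncb).build();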
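
The body of the interceptor is not shown in this thread. As a rough sketch of the header logic it would need, assuming the upstream application already holds a W3C `traceparent` value: keep the trace id and flags, and replace the parent id with the span id of the ledger client's own span before the call goes out. The class and method names below (`TraceContextPropagation`, `newSpanId`, `childHeader`) are hypothetical and use only the JDK; in a real gRPC interceptor the resulting string would be attached to the call metadata under the `traceparent` key, or an OpenTelemetry propagator would do all of this for you.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative sketch of traceparent propagation; not part of Canton or the Java bindings. */
final class TraceContextPropagation {

    /** Generates a random, non-zero 16-hex-character span id. */
    static String newSpanId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L); // an all-zero span id is invalid in W3C Trace Context
        return String.format("%016x", id);
    }

    /**
     * Builds the header for an outgoing call: same version, trace id and flags as the
     * incoming header, with the parent id replaced by the caller's own span id.
     */
    static String childHeader(String incoming, String ownSpanId) {
        String[] parts = incoming.split("-");
        if (parts.length != 4) {
            throw new IllegalArgumentException("unexpected traceparent: " + incoming);
        }
        return parts[0] + "-" + parts[1] + "-" + ownSpanId + "-" + parts[3];
    }
}
```

Because the trace id is carried through unchanged, the spans produced by the other components and by Canton end up under the same trace in Jaeger or Zipkin, which is what makes the end-to-end latency visible.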