The general goal is to track latency for every request across all the different spans throughout the entire system, including Canton, and I wonder what the best approach is.
While reading through Monitoring — Daml SDK 2.3.4 documentation, I still found myself confused about the questions below:
- We can see that tracing can be done with Jaeger and Zipkin. Is it possible to integrate tracing with other tools, e.g. Datadog or AppDynamics?
- How can we pass a traceId into Canton so that the same trace is shared between other applications and Canton? Or is that not possible?
- Is there any documentation or example project that shows which spans and events are available from Canton and which are not? Furthermore, how can we use the logs to search for the traces we care about?
Thanks very much in advance!
- Jaeger and Zipkin are the most common tracing backends used in the industry. We do not explicitly support other tracing tools, but most commercial visualisation tools provide tooling to translate the Jaeger or Zipkin format into their own proprietary format. For example, Splunk supports the integration using a special tool mentioned in their documentation: Jaeger gRPC — Splunk Observability Cloud documentation. In the future we are planning to support other backends too.
- It all depends on the concrete workflow you would like to trace. If the workflow originates from the Canton system itself, the trace is auto-generated by Canton. If Canton is used through its APIs, the consumer can provide tracing information through the trace context. We use the W3C Recommendation; more information can be found here: Trace Context.
- Unfortunately, tracing is not well documented yet, but we are working on this. Relevant work can be tracked in this GH issue: Improve public accessibility to our trace tooling · Issue #14256 · digital-asset/daml · GitHub. Logging of the trace-id is enabled by default. Traces will be visible in the Zipkin/Jaeger UI with readable span names and are generally well connected. Furthermore, the trace-id can be copied from the UI and correlated with log lines from the log file. Transaction Submission is a workflow that is especially detailed, with spans and attributes across the trace.
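For reference, the W3C Trace Context recommendation carries the trace in a `traceparent` header of the form `version-traceid-parentid-flags`, for example `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` (the sample value used in the W3C specification). Below is a minimal sketch in plain Java for parsing and formatting such a value. The class name `TraceParent` is hypothetical and not part of Canton or the Daml SDK, and the sketch only handles version `00`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses and formats W3C `traceparent` values (version 00 only). */
final class TraceParent {
    // "00" version, 32 lowercase hex trace-id, 16 lowercase hex parent-id, 2 hex trace-flags
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    final String traceId;
    final String parentId;
    final boolean sampled;

    private TraceParent(String traceId, String parentId, boolean sampled) {
        this.traceId = traceId;
        this.parentId = parentId;
        this.sampled = sampled;
    }

    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a version-00 traceparent: " + header);
        }
        String traceId = m.group(1);
        String parentId = m.group(2);
        // The specification treats all-zero ids as invalid.
        if (traceId.matches("0+") || parentId.matches("0+")) {
            throw new IllegalArgumentException("All-zero trace-id or parent-id: " + header);
        }
        // Bit 0 of the flags byte is the "sampled" flag.
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return new TraceParent(traceId, parentId, sampled);
    }

    /** Re-serialises the value; only the sampled flag is preserved in the flags byte. */
    String toHeaderValue() {
        return "00-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }
}
```

The trace-id part is the value that can be searched for in the Zipkin/Jaeger UI and in the log file.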
Let me know if you have more questions - I will be happy to help.
Thanks so much @Sergey_Kisel_DA1 for the response.
After reading through the response and doing some preliminary digging, I still have the questions below and hope someone can help me answer them:
- It’s mentioned above that we follow the W3C recommendation for passing in a traceId/trace context. I wonder if there is an example of the headers for this, e.g. passing the trace ID within a header.
Also, Document how to access the distributed trace ID in the Java bindings · Issue #14258 · digital-asset/daml · GitHub seems to imply that there’s a way to access the traceId. Does it mean the Java bindings library supports passing/accessing a traceId? If so, can we have an example of how to access and pass in an external traceId with the Java bindings library? (I tried to do some digging, but didn’t find any appropriate API for traceId.)
- Regarding how to map a transaction to a traceId: is traceId to commandId a one-to-one mapping? If it is, which Canton log line is recommended to look at for such a mapping, and is it at INFO level?
Thanks @danilofaria for jumping in and pointing out that the commandId to traceId mapping can be easily found in the Jaeger UI!
The reason we need to pass a traceId into Canton is that Canton is viewed as one component in the whole flow. When issues happen, we need to look at the traceId for the entire flow, which includes other components (e.g. the ledger client), and see where the delay comes from.
After discussing with @Raymond_Roestenburg (thank you so much @Raymond_Roestenburg, you are de savior), I think using an interceptor along with the Rx Java bindings library can be the way to go. (Interceptor: opentelemetry-java-docs/HelloWorldClient.java at 5f3002e75238674ea97402c44fecf9afc03a6c9b · open-telemetry/opentelemetry-java-docs · GitHub)
NettyChannelBuilder ncb = NettyChannelBuilder.forAddress(host, port).intercept(new TraceInterceptor());
DamlLedgerClient ledger = DamlLedgerClient.newBuilder(ncb).build();
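The `TraceInterceptor` passed to `intercept(...)` above is not shown in the thread. As a rough sketch of what such a class could look like, the version below uses only the grpc-java API (`io.grpc`) and copies a W3C `traceparent` value into the outgoing gRPC metadata. It is an assumption that the value is supplied through a `Supplier<String>` constructor argument (the snippet above uses a no-argument constructor), and the linked OpenTelemetry example injects the header through an OpenTelemetry propagator instead.

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ForwardingClientCall;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;
import java.util.function.Supplier;

/** Hypothetical interceptor: adds a W3C `traceparent` header to every outgoing call. */
final class TraceInterceptor implements ClientInterceptor {
    static final Metadata.Key<String> TRACEPARENT =
            Metadata.Key.of("traceparent", Metadata.ASCII_STRING_MARSHALLER);

    // Assumption: the caller knows how to obtain the current traceparent value.
    private final Supplier<String> traceparentSupplier;

    TraceInterceptor(Supplier<String> traceparentSupplier) {
        this.traceparentSupplier = traceparentSupplier;
    }

    @Override
    public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
            MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
        return new ForwardingClientCall.SimpleForwardingClientCall<ReqT, RespT>(
                next.newCall(method, callOptions)) {
            @Override
            public void start(Listener<RespT> responseListener, Metadata headers) {
                String value = traceparentSupplier.get();
                if (value != null) {
                    // Attach the trace context before the call goes out on the wire.
                    headers.put(TRACEPARENT, value);
                }
                super.start(responseListener, headers);
            }
        };
    }
}
```

With this variant the channel would be built with `.intercept(new TraceInterceptor(() -> currentTraceparent))`, where `currentTraceparent` is a placeholder for wherever the application keeps its trace context.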
Hello there, I have a follow-up question regarding tracing again.
When we use the TransactionClient of the Java Daml ledger client to get a Transaction, it does not seem possible to get the traceId from any of the APIs.
Q1: In this case, if different systems communicate only through the ledger, how can we pass around one traceId?
Q2: Assuming we can set the commandId to the same value as the traceId, how can we get the commandId/traceId when we are using an interceptor? (We want to create a new span, under the same traceId, in reaction to the transaction we heard on the ledger.)
Many thanks to @danilofaria for pointing me in the right direction.
I will note down our offline discussion in case anyone has the same question in the future.
For Q1: There’s currently no way to get the Trace Context (e.g. traceId, parent span) from a Daml Transaction. However, it looks like this might change in the future, and the Trace Context might be added to the Transaction (@danilofaria @Ratko_Veprek if you can help link to any existing GitHub issues or related discussion, please do, thanks!).
For Q2: We can pass an existing traceId/parent span to the interceptor with the help of the OpenTelemetry context. See Manual Instrumentation | OpenTelemetry for details, or open telemetry - creating Opentelemetry Context using trace-id and span-id of remote parent - Stack Overflow for a quick example.
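As a minimal sketch of the Q2 approach, assuming the `opentelemetry-api` artifact is on the classpath and that the trace-id and span-id of the remote parent are already known as hex strings, an OpenTelemetry `Context` can be rebuilt like this. The class `RemoteParent` and its method names are hypothetical.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import io.opentelemetry.api.trace.TraceFlags;
import io.opentelemetry.api.trace.TraceState;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;

/** Hypothetical helper: rebuilds an OpenTelemetry Context from a known remote parent. */
final class RemoteParent {
    private RemoteParent() {}

    /** traceIdHex is 32 lowercase hex characters, spanIdHex is 16. */
    static Context contextFor(String traceIdHex, String spanIdHex) {
        SpanContext remote = SpanContext.createFromRemoteParent(
                traceIdHex, spanIdHex, TraceFlags.getSampled(), TraceState.getDefault());
        // Span.wrap yields a non-recording span that only carries the remote SpanContext.
        return Context.root().with(Span.wrap(remote));
    }

    /** Runs the body with the given context current, so instrumentation can pick it up. */
    static void runUnder(Context parent, Runnable body) {
        try (Scope ignored = parent.makeCurrent()) {
            body.run();
        }
    }
}
```

A new span can then be started under the same trace with `tracer.spanBuilder("...").setParent(context).startSpan()`, and ledger calls made inside `runUnder` would be seen by an interceptor that reads the current OpenTelemetry context.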
As for the trace interceptor, as suggested by @danilofaria, there is an existing one in the OpenTelemetry gRPC library: opentelemetry-java-instrumentation/TracingClientInterceptor.java at 597b2a53211633a77df229d25ee28082bd3dc559 · open-telemetry/opentelemetry-java-instrumentation · GitHub
It can be used by simply adding the library as a dependency and doing:
NettyChannelBuilder ncb = NettyChannelBuilder