Is there a way to get the AST of Daml source code?

For Haskell there are libraries for this, e.g. haskell-tools-ast ("Haskell AST for efficient tooling").

Maybe I can use such a library with some conversions?

I'm interested in how you plan to use this data if you can get it. I know that some tool vendors use it for SAST (static application security testing) scanning.

I do Daml metaprogramming, meaning rewriting and generating code.

The ideal starting point for this is code in which all imports are qualified: then we don't have to rely on external data (implicit imports) to process the code, and we don't have to deal with the different import syntaxes.

I wrote a script that converts all imports into qualified imports, but it's not perfect. For example, it cannot distinguish an imported function from a local variable with the same name (see the "Qualified name in binding position" error message).
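
To illustrate why binding structure matters here, below is a minimal sketch over a toy, hypothetical expression AST (not GHC's or Daml's). It qualifies occurrences of imported names only when no local binder shadows them; because binders themselves are never rewritten, a qualified name can never end up in binding position. The names Expr, QVar and qualifyImports are invented for the illustration.

```haskell
-- Toy expression AST (hypothetical; far simpler than GHC's HsExpr).
data Expr
  = Var String            -- unqualified occurrence, e.g. sort
  | QVar String String    -- qualified occurrence: module alias and name, e.g. L.sort
  | Lam String Expr       -- \x -> body
  | Let String Expr Expr  -- let x = rhs in body
  | App Expr Expr
  deriving (Eq, Show)

-- Rewrite free occurrences of the given imported names to alias.name,
-- leaving locally bound names (and all binders) untouched.
qualifyImports :: String -> [String] -> Expr -> Expr
qualifyImports alias imported = go []
  where
    go bound (Var x)
      | x `elem` imported && x `notElem` bound = QVar alias x
      | otherwise                              = Var x
    go _ e@(QVar _ _) = e
    go bound (Lam x body) = Lam x (go (x : bound) body)
    -- let is recursive in Haskell/Daml, so x scopes over both rhs and body.
    go bound (Let x rhs body) =
      let bound' = x : bound
      in Let x (go bound' rhs) (go bound' body)
    go bound (App f a) = App (go bound f) (go bound a)
```

For example, with imported names ["sort"] and alias "L", `App (Var "sort") (Lam "sort" (Var "sort"))` becomes `App (QVar "L" "sort") (Lam "sort" (Var "sort"))`: the outer use is qualified, while the lambda parameter and its use are left alone. A token-level rewrite has no way to make that distinction.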

I can extract some information about the package using the DAML LF API Type Signature package, but that only contains information about the record and variant types, nothing about functions and classes.

So I want to refine my script, and my guess is that the AST of the source code would help. By AST I mean the abstract syntax tree, not the parse tree; my understanding of the difference comes from the Wikipedia article "Abstract syntax tree".
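
A tiny example of that distinction, using a hypothetical arithmetic language: the parse tree keeps the parentheses the user wrote, while the AST drops them because the shape of the tree already encodes the grouping. Real compilers often sit somewhere in between; GHC's parsed AST, for instance, still records parentheses as HsPar nodes. ParseTree, Ast and toAst are names invented for this sketch.

```haskell
-- Concrete (parse) tree: keeps syntactic detail such as explicit parentheses.
data ParseTree
  = PNum Integer
  | PAdd ParseTree ParseTree
  | PMul ParseTree ParseTree
  | PParens ParseTree  -- "( ... )" as written in the source
  deriving (Eq, Show)

-- Abstract syntax tree: no parentheses; grouping is implied by the structure.
data Ast
  = Num Integer
  | Add Ast Ast
  | Mul Ast Ast
  deriving (Eq, Show)

-- Drop purely syntactic nodes while converting.
toAst :: ParseTree -> Ast
toAst (PNum n)    = Num n
toAst (PAdd a b)  = Add (toAst a) (toAst b)
toAst (PMul a b)  = Mul (toAst a) (toAst b)
toAst (PParens t) = toAst t
```

Two parse trees that differ only in redundant parentheses, such as those for `((7))` and `7`, map to the same Ast value.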

There isn’t any tooling for Daml that gives you an AST unfortunately.

As @cocreature says, there is no tooling for this at the moment. But to throw in my two cents:

There are a few different "ASTs" for Daml: the AST that loosely represents the Haskell to which Daml code desugars, and the AST of the underlying executable lambda calculus that we use (called Daml-LF). Depending on what you want, one of these might be a better fit. There is no AST for Daml code proper; Daml code is parsed more or less directly into a Haskell AST.
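
To make the two levels concrete, here is a hypothetical miniature: a surface AST with let and if, and a core lambda calculus without them. These types are illustrative stand-ins, not the actual Haskell AST or Daml-LF definitions, and the let here is non-recursive for simplicity. The point is that constructs the programmer wrote are no longer visible after lowering, which is one reason source-level rewriting is easier on the surface AST.

```haskell
-- Surface AST: a tiny, hypothetical stand-in for the Haskell-level AST.
data Surface
  = SVar String
  | SBool Bool
  | SLam String Surface
  | SApp Surface Surface
  | SLet String Surface Surface  -- non-recursive let, for simplicity
  | SIf Surface Surface Surface
  deriving (Eq, Show)

-- Core: a tiny, hypothetical lambda calculus standing in for a Daml-LF-like core.
data Core
  = CVar String
  | CBool Bool
  | CLam String Core
  | CApp Core Core
  | CCaseBool Core Core Core     -- case scrutinee of True -> t; False -> f
  deriving (Eq, Show)

-- Lower surface syntax to the core: let and if disappear.
toCore :: Surface -> Core
toCore (SVar x)       = CVar x
toCore (SBool b)      = CBool b
toCore (SLam x body)  = CLam x (toCore body)
toCore (SApp f a)     = CApp (toCore f) (toCore a)
-- let x = e1 in e2  becomes  (\x -> e2) e1
toCore (SLet x e1 e2) = CApp (CLam x (toCore e2)) (toCore e1)
toCore (SIf c t f)    = CCaseBool (toCore c) (toCore t) (toCore f)
```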

I assume that for the qualified-imports use case, the Haskell AST would be closest to what you need.

As a pointer, if you want to implement something like this yourself, there's a good deal of prior work in digital-asset/daml (the Daml smart contract language) and digital-asset/daml-ghcide (Daml's fork of haskell/ghcide), which ultimately rely on digital-asset/ghc (a fork of GHC, https://gitlab.haskell.org/ghc/ghc.git).

To understand the most basic form of parsing, look at the desugar function in compiler/damlc/daml-desugar/src/DA/Daml/Desugar.hs in the daml repo: it takes source code, parses it, and pretty-prints the parsed result. See execDesugar and cmdDesugar in compiler/damlc/lib/DA/Cli/Damlc.hs for how it is executed.
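
The parse-then-pretty-print shape of desugar can be mimicked on a very small scale. The sketch below is a hypothetical round trip restricted to single-line import declarations; ImportDecl here is an invented record, not GHC's type of the same name, and import lists, hiding clauses and the `import M qualified` form are deliberately not handled. It also shows that once imports are parsed into a structure, making them qualified is a field update rather than text surgery.

```haskell
-- Hypothetical, much-simplified import declaration.
data ImportDecl = ImportDecl
  { impModule    :: String
  , impQualified :: Bool
  , impAlias     :: Maybe String
  } deriving (Eq, Show)

-- Parse "import [qualified] M [as A]"; anything else yields Nothing.
parseImport :: String -> Maybe ImportDecl
parseImport line = case words line of
  ("import" : "qualified" : rest) -> fmap (\d -> d { impQualified = True }) (body rest)
  ("import" : rest)               -> body rest
  _                               -> Nothing
  where
    body [m]          = Just (ImportDecl m False Nothing)
    body [m, "as", a] = Just (ImportDecl m False (Just a))
    body _            = Nothing

-- Pretty-print back to a single, whitespace-normalized line.
prettyImport :: ImportDecl -> String
prettyImport (ImportDecl m q a) =
  unwords (["import"] ++ ["qualified" | q] ++ [m] ++ maybe [] (\x -> ["as", x]) a)

-- Turn an import into a qualified one, keeping any alias.
qualify :: ImportDecl -> ImportDecl
qualify d = d { impQualified = True }
```

For example, `fmap (prettyImport . qualify) (parseImport "import DA.List")` gives `Just "import qualified DA.List"`.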

To understand the later phases (typechecking, LF generation), decent entry points are execBuild in compiler/damlc/lib/DA/Cli/Damlc.hs and buildDar in compiler/damlc/daml-compiler/src/DA/Daml/Compiler/Dar.hs; that's how daml build generates the dar.

From this point on, we use rules that we've defined in a build system called Shake. buildDar uses our GeneratePackage rule, which dispatches to GenerateDalf, which then goes to GenerateRawDalf, which finally calls generateCore (the GenerateCore rule) from the daml-ghcide repo. Staying within the daml-ghcide repo, generateCore uses the Typecheck rule (which produces a TypecheckedModule), which in turn uses the GetParsedModule rule (which produces a ParsedModule). The TypecheckedModule contains much more information about names because the renaming phase has run on it; depending on what you need, either the TypecheckedModule or the ParsedModule could be appropriate.
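
The rule chain above can be written down as a toy dependency model. This only illustrates the demand-driven ordering: Rule, needs and buildOrder are hypothetical names, not Shake's API or the actual rule definitions in daml or daml-ghcide, and the real rules very likely have more inputs than this single chain.

```haskell
-- Hypothetical toy model of the rule chain; not Shake's API.
data Rule
  = GeneratePackage
  | GenerateDalf
  | GenerateRawDalf
  | GenerateCore
  | Typecheck
  | GetParsedModule
  deriving (Eq, Show)

-- Direct dependencies of each rule, following the chain described above.
needs :: Rule -> [Rule]
needs GeneratePackage = [GenerateDalf]
needs GenerateDalf    = [GenerateRawDalf]
needs GenerateRawDalf = [GenerateCore]
needs GenerateCore    = [Typecheck]
needs Typecheck       = [GetParsedModule]
needs GetParsedModule = []

-- Order in which results become available when a rule is demanded:
-- dependencies first (post-order depth-first search), each rule once.
-- Assumes the dependency graph is acyclic.
buildOrder :: Rule -> [Rule]
buildOrder root = reverse (go [] root)
  where
    -- The accumulator holds finished rules, most recently finished first.
    go done r
      | r `elem` done = done
      | otherwise     = r : foldl go done (needs r)
```

Demanding only Typecheck yields `[GetParsedModule, Typecheck]`: names resolved by the renamer are available without going on to Core or DALF generation.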

Keep in mind this is all internal-facing code; we don't guarantee that it will remain stable.

Thank you, Dylan, I will check these out. I've just watched the Deep Dive into the Daml Compiler video, and these links will take me on an even deeper dive into the topic.